  Letter

1970s and ‘Patient 0’ HIV-1 genomes illuminate early HIV/AIDS history in North America

This article has been updated


The emergence of HIV-1 group M subtype B in North American men who have sex with men was a key turning point in the HIV/AIDS pandemic. Phylogenetic studies have suggested cryptic subtype B circulation in the United States (US) throughout the 1970s (refs 1, 2) and an even older presence in the Caribbean (ref. 2). However, these temporal and geographical inferences, based upon partial HIV-1 genomes that postdate the recognition of AIDS in 1981, remain contentious (refs 3, 4) and the earliest movements of the virus within the US are unknown. We serologically screened >2,000 1970s serum samples and developed a highly sensitive approach for recovering viral RNA from degraded archival samples. Here, we report eight coding-complete genomes from US serum samples from 1978–1979, which are eight of the nine oldest HIV-1 group M genomes to date. This early, full-genome ‘snapshot’ reveals that the US HIV-1 epidemic exhibited extensive genetic diversity in the 1970s but also provides strong evidence for its emergence from a pre-existing Caribbean epidemic. Bayesian phylogenetic analyses estimate the jump to the US at around 1970 and place the ancestral US virus in New York City with 0.99 posterior probability support, strongly suggesting this was the crucial hub of early US HIV/AIDS diversification. Logistic growth coalescent models reveal epidemic doubling times of 0.86 and 1.12 years for the US and Caribbean, respectively, suggesting rapid early expansion in each location (ref. 3). Comparisons with more recent data reveal many of these insights to be unattainable without archival, full-genome sequences. We also recovered the HIV-1 genome from the individual known as ‘Patient 0’ (ref. 5) and found neither biological nor historical evidence that he was the primary case in the US or for subtype B as a whole. We discuss the genesis and persistence of this belief in the light of these evolutionary insights.
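To give a feel for what the doubling times quoted above imply, the sketch below converts a doubling time into a per-year growth rate using the standard exponential-growth relation r = ln(2) / t_d. This is only an illustration: it assumes the doubling times describe the early, near-exponential phase of the logistic model, and it does not reproduce the coalescent analysis itself. The function names are hypothetical.

```python
import math


def growth_rate_from_doubling_time(doubling_time_years: float) -> float:
    """Exponential growth rate r (per year) implied by a doubling time: r = ln(2) / t_d."""
    if doubling_time_years <= 0:
        raise ValueError("doubling time must be positive")
    return math.log(2) / doubling_time_years


def fold_increase(doubling_time_years: float, elapsed_years: float) -> float:
    """Fold increase after elapsed_years of pure exponential growth (an assumption)."""
    return 2 ** (elapsed_years / doubling_time_years)


# Doubling times as reported in the text (years).
reported = {"US": 0.86, "Caribbean": 1.12}
rates = {place: growth_rate_from_doubling_time(t) for place, t in reported.items()}

for place, rate in rates.items():
    print(f"{place}: r = {rate:.3f} per year")
```

Under these assumptions the US estimate corresponds to a growth rate of roughly 0.8 per year and the Caribbean estimate to roughly 0.6 per year.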

Figure 1: Maximum clade credibility (MCC) tree summary of the Bayesian spatio-temporal reconstruction based on complete HIV-1 genome data.
Figure 2: Demographic reconstruction based on the nested coalescent model.
Figure 3: The early patterns of HIV-1 subtype B spread in the Americas.

Change history

  02 October 2016

    In the online version of this paper the images in Figures 2 and 3 were switched; this has been corrected.


We thank C. Stevens and D. Hemmerlein for facilitating access to archival sera; G.-Z. Han, A. Bjork, W. Switzer, V. Sullivan, R. Ruboyianes and P. Sprinkle for technical assistance; T. Spira and M. Owen for geographical data on some published sequences; and the NIH AIDS Reagent program for providing reference virus samples US657 and HT599. W. W. Darrow led the initial 1982 cluster investigation and provided R.A.M. with access to his copies of archival CDC documents. This work was supported by NIH/NIAID R01AI084691 and the David and Lucile Packard Foundation (M.W.); the Wellcome Trust (080651), the University of Oxford’s Clarendon Fund, the Economic and Social Research Council (PTA-026-27-2838), and a J. Armand Bombardier Internationalist Fellowship (R.A.M.); the Research Fund KU Leuven (Onderzoeksfonds KU Leuven, Program Financing no. PF/10/018) and the ‘Fonds voor Wetenschappelijk Onderzoek Vlaanderen’ (FWO) (G066215N) (P.L.); and NSF DMS 1264153, NIH R01 HG006139 and NIH R01 AI107034 (M.A.S.).

Author information




M.W., H.W.J., P.L. and R.A.M. conceived the study. T.D.W. and M.W. designed the RNA jackhammering method. T.D.W. generated the sequences. B.A.K. provided serum samples from New York City. W.H. and T.G. acquired specimens and provided serological data. D.E.T. provided conceptual input. M.W., M.A.S. and P.L. prepared the data sets and performed the phylogenetic analyses. R.A.M. performed the historical analyses. M.W., H.W.J., P.L. and R.A.M. wrote the paper. All authors discussed the results and commented on the manuscript. The findings and conclusions in this report are those of the author(s) and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

Corresponding authors

Correspondence to Michael Worobey or Richard A. McKay.

Ethics declarations

Competing interests

A patent, ‘Methods and systems for RNA or DNA detection and sequencing’ (US patent application 62/325,320), has been filed with the United States Patent and Trademark Office. It will be used to facilitate the licensing of this methodology.

Additional information

Reviewer Information Nature thanks K. Andersen and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Extended data figures and tables

Extended Data Figure 1 Jackhammering schematic and primer panels and pools.

a–d, Detection and amplification of target RNA molecules in old, degraded, low-titre samples. For the purposes of illustration, consider a tube with 10^13 RNA molecules, but (because of the low RNA quality) only one molecule that is (i) capable of being primed by the given reverse primer(s) and (ii) long enough to form a 200-bp product. a, Conventional RT–PCR with a long amplification product, oversized for a sample with RNA less than ~200 bases in length. b, RT–PCR with a shorter amplification product. c, Use of multiple primer pairs to increase the chance of at least one PCR-positive result. d, The jackhammering approach, which overcomes the problems encountered in a–c by (i) targeting an extensive panel of short amplicons appropriately sized to the level of RNA survival in the sample, (ii) conducting reverse transcription with pools of primer pairs that amplify discrete, non-overlapping genomic regions, and (iii) employing a multiplex pre-amplification step, in the tube with the reverse transcription product, to generate sufficient DNA to ensure that each aliquot from it contains numerous template molecules for final PCR amplification. In this schematic, we show just two primer pairs per pool, but we used pools of ten pairs with our largest primer panels (shown in e, HXB2 numbering along HIV-1 genome). With a 10 primer-pair pool, and 10 final reactions, one can reliably recover 10 bands for sequencing. Five such pools (one entire panel of 50 pairs) allow complete HIV-1 genome recovery even in heavily degraded samples.
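The reasoning behind the pre-amplification step in d can be illustrated with a small calculation. The sketch below is not the authors' method; it assumes, purely for illustration, that each template copy lands independently and uniformly at random in one of the aliquots, and asks how likely a given aliquot is to contain at least one copy. The function names and the copy number of 1,000 are hypothetical.

```python
def prob_aliquot_has_template(n_templates: int, n_aliquots: int) -> float:
    """Probability that one given aliquot receives at least one template copy,
    assuming each copy is assigned independently and uniformly to an aliquot."""
    if n_templates < 0 or n_aliquots < 1:
        raise ValueError("need n_templates >= 0 and n_aliquots >= 1")
    return 1.0 - (1.0 - 1.0 / n_aliquots) ** n_templates


def panel_size(n_pools: int, pairs_per_pool: int) -> int:
    """Total number of primer pairs in a panel built from equally sized pools."""
    return n_pools * pairs_per_pool


# One usable molecule split across 10 final reactions: most aliquots are empty.
p_without_preamp = prob_aliquot_has_template(1, 10)

# After a hypothetical pre-amplification to 1,000 copies, every aliquot is
# almost certain to contain template.
p_with_preamp = prob_aliquot_has_template(1000, 10)

print(f"without pre-amplification: {p_without_preamp:.3f}")
print(f"with pre-amplification:    {p_with_preamp:.6f}")
print(f"pairs in a five-pool panel: {panel_size(5, 10)}")
```

With a single template, only about one aliquot in ten can yield a product; once the template has been copied many times in the original tube, each of the ten final reactions can be expected to start from template, which is the point of step (iii) in the legend.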

Extended Data Figure 2 Maximum clade credibility (MCC) tree summaries of Bayesian spatio-temporal reconstructions based on complete HIV-1 genome data.

a, ‘full genome 46’, b, ‘full genome 38’. The tips of the trees correspond to the year of sampling while the branch (and node) colours reflect location: the sampling location for the tip branches and the inferred location for the internal branches. AF, Africa; CB, Caribbean; US, the United States; CA, California; GA, Georgia; NY, New York. The diameters of the internal node circles reflect posterior location probability values. Thick outer circles represent internal nodes with posterior probability support >0.95. Grey bars indicate the 95% credibility intervals for the internal node ages. The tree in b represents the fully annotated version of the tree in Fig. 1 in the main text.

Extended Data Figure 3 Maximum clade credibility (MCC) tree summaries of Bayesian spatio-temporal reconstructions based on different genome region data sets.

MCC trees for the same strains are shown for a, gag, b, pol, c, env and d, the complete genome. The tips of the trees correspond to the year of sampling while the branch (and node) colours reflect location: the sampling location for the tip branches and the inferred location for the internal branches. AF, Africa; CB, Caribbean; US, the United States. Tip labels are provided for the newly obtained archival HIV-1 genomes. The diameters of the internal node circles reflect posterior location probability values. Thick outer circles represent internal nodes with posterior probability support >0.95. We also depict the posterior probability densities for the time of the introduction event from the Caribbean into the US on the time scale of the trees.

Extended Data Figure 4 Maximum likelihood phylogenies for the different genome region data sets.

a, gag, b, pol, c, env and d, the complete genome. We analysed the same data sets as in Extended Data Fig. 3. The diameters of the internal node circles reflect bootstrap support values. We manually coloured the branches in a similar way as for the Bayesian phylogeographic reconstructions.

Extended Data Figure 5 Maximum clade credibility (MCC) tree summaries of Bayesian spatio-temporal reconstructions based on different env data sets.

a, ‘env 105’, b, ‘env 74’. The tips of the trees correspond to the year of sampling while the branch (and node) colours reflect location: the sampling location for the tip branches and the inferred location for the internal branches. AF, Africa; CB, Caribbean; US, the United States; CA, California; GA, Georgia; NJ, New Jersey; NY, New York; PA, Pennsylvania. The diameters of the internal node circles reflect posterior location probability values. Thick outer circles represent internal nodes with posterior probability support >0.95. We also depict the posterior probability density for the time of the introduction event from the Caribbean into the US on the time scales of the trees. The three partial env sequences from San Francisco (SF) in 1978 (ref. 10) are highlighted with bullets.

Extended Data Figure 6 Maximum clade credibility (MCC) tree summaries of Bayesian spatio-temporal reconstruction comparing early and late strains.

a, ‘env 133’, b, only ‘late’ sequences from ‘env 133’. In a, we classified US sequences as ‘early’ or ‘late’ depending on whether they were sampled before 1985 or in or after 1985. In b, the analysis was conducted on an empirical tree distribution of ‘env 133’ from which we pruned early US sequences (in grey), but we still annotate the reconstruction on the complete phylogenies for reference. The tips of the tree correspond to the year of sampling while the branch (and node) colours reflect location: the sampling location for the tip branches and the inferred location for the internal branches. AF, Africa; CB, Caribbean; US early, the United States sampled <1985; US late, the United States sampled in or after 1985; CA, California; GA, Georgia; NC, North Carolina; NY, New York. The diameters of the internal node circles reflect posterior location probability values. Thick outer circles represent internal nodes with posterior probability support >0.95.

Extended Data Figure 7 A cluster of 40 early AIDS patients linked through sexual contact.

Reprinted from figure 1 of ref. 5 with permission from Elsevier.

Extended Data Figure 8 Jackhammering validation with reference viruses.

a, The consensus sequences for primer panels HIVM and HIVR (‘RMcon’ suffix) were included, with previously published sequences for a US (US657) virus and a Haitian (HT599) virus, in a maximum likelihood tree. The two clusters of paired sequences are highlighted by coloured boxes. b, Plot of the root-to-tip genetic distance against sampling time for the tree in a. The colours for the data points are consistent with those used for sampling locations in the phylogenies (the two African outgroup tips are not shown for clarity). The data points with black circles represent the published sequences while the data points with a target symbol represent the newly obtained sequences.

Extended Data Figure 9 Plots of the root-to-tip genetic distance against sampling time for different genome region data sets (gag, pol, env and the complete genome).

We used TempEst (ref. 27) to obtain exploratory regressions based on the maximum likelihood trees (Extended Data Fig. 4). Each data point represents a tip; colours are consistent with those used for sampling locations in the phylogenies. The US data points with black circles represent the new genomes dating back to 1978–1979. The data point with the target symbol represents the Patient 0 genome. In each plot, we provide the R^2 for the regression and the slope, reflecting the evolutionary rate (in substitutions per site per year).
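A root-to-tip regression of this kind can be sketched as an ordinary least-squares fit of genetic distance against sampling year. The code below is a minimal illustration, not the TempEst implementation: the function name is hypothetical and the four data points are made up so that the fit is exact. The slope plays the role of the evolutionary rate, and the x-intercept gives a crude estimate of the date of the root.

```python
def root_to_tip_regression(times, distances):
    """Least-squares fit of root-to-tip distance against sampling time.

    Returns (slope, intercept, r_squared, root_date), where root_date is the
    x-intercept of the fitted line (None if the slope is zero)."""
    n = len(times)
    if n != len(distances) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_t = sum(times) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    syy = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, distances))
    if sxx == 0:
        raise ValueError("all sampling times are identical")
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    root_date = -intercept / slope if slope != 0 else None
    return slope, intercept, r_squared, root_date


# Made-up example: distances generated as 0.002 * (year - 1968), so the fit
# should recover a rate of 0.002 substitutions per site per year.
years = [1978, 1983, 1990, 2000]
dists = [0.020, 0.030, 0.044, 0.064]

slope, intercept, r_squared, root_date = root_to_tip_regression(years, dists)
print(f"rate = {slope:.4f} subs/site/year, R^2 = {r_squared:.3f}, root ~ {root_date:.1f}")
```

Real data scatter around the line, so R^2 falls below 1; a low value suggests a weak temporal signal and is a reason to treat the regression as exploratory only.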

Extended Data Table 1 Molecular clock, phylogeographic and recombination estimates for the different data sets

Supplementary information

Supplementary Information

This file contains a Supplementary Discussion, Supplementary References and Supplementary Tables 1-2. (PDF 1016 kb)

Cite this article

Worobey, M., Watts, T., McKay, R. et al. 1970s and ‘Patient 0’ HIV-1 genomes illuminate early HIV/AIDS history in North America. Nature 539, 98–101 (2016). https://doi.org/10.1038/nature19827

  • DOI: https://doi.org/10.1038/nature19827



