The Genomic Formation of Human Populations in East Asia

The deep population history of East Asia remains poorly understood due to a lack of ancient DNA data and sparse sampling of present-day people. We report genome-wide data from 191 individuals from Mongolia, northern China, Taiwan, the Amur River Basin and Japan dating to 6000 BCE – 1000 CE, many from contexts never previously analyzed with ancient DNA. We also report 383 present-day individuals from 46 groups mostly from the Tibetan Plateau and southern China. We document how 6000-3600 BCE people of Mongolia and the Amur River Basin were from populations that expanded over Northeast Asia, likely dispersing the ancestors of Mongolic and Tungusic languages. In a time transect of 89 Mongolians, we reveal how Yamnaya steppe pastoralist spread from the west by 3300-2900 BCE in association with the Afanasievo culture, although we also document a boy buried in an Afanasievo barrow with ancestry entirely from local Mongolian hunter-gatherers, representing a unique case of someone of entirel...

bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 1 The Genomic Formation of Human Populations in East Asia 2 3 4 5 6 7 8 9 Chuan-Chao Wang1,2,3,4,*, Hui-Yuan Yeh5,*, Alexander N Popov6,*, Hu-Qin Zhang7,*, Hirofumi Matsumura8, Kendra Sirak2,9, Olivia Cheronet10, Alexey Kovalev11, Nadin Rohland2, Alexander M. Kim2,12, Rebecca Bernardos2, Dashtseveg Tumen13, Jing Zhao7, Yi-Chang Liu14, Jiun-Yu Liu15, Matthew Mah2,16,17, Swapan Mallick2, 9,16,17, Ke Wang3, Zhao Zhang2, Nicole Adamski2,17, Nasreen Broomandkhoshbacht2,17, Kimberly Callan2,17, Brendan J. Culleton18, Laurie Eccles19, Ann Marie Lawson2,17, Megan Michel2,17, Jonas Oppenheimer2,17, Kristin 10 11 12 13 14 15 16 17 18 19 Stewardson2,17, Shaoqing Wen20, Shi Yan21, Fatma Zalzala2,17, Richard Chuang14, Ching-Jung Huang14, Chung-Ching Shiung14, Yuri G. Nikitin22, Andrei V. Tabarev23, Alexey A. Tishkin24, Song Lin7, Zhou-Yong Sun25, Xiao-Ming Wu7, Tie-Lin Yang7, Xi Hu7, Liang Chen26, Hua Du27, Jamsranjav Bayarsaikhan28, Enkhbayar Mijiddorj29, Diimaajav Erdenebaatar29, Tumur-Ochir Iderkhangai29, Erdene Myagmar13, Hideaki Kanzawa-Kiriyama30, Msato Nishino31, Ken-ichi Shinoda30, Olga A. Shubina32, Jianxin Guo1, Qiongying Deng33, Longli Kang34, Dawei Li35, Dongna Li36, Rong Lin36, Wangwei Cai37, Rukesh Shrestha4, Ling-Xiang Wang4, Lanhai Wei1, Guangmao Xie38,39, Hongbing Yao40, Manfei Zhang4, Guanglin He1, Xiaomin Yang1, Rong Hu1, Martine Robbeets3, Stephan Schiffels3, Douglas J. Kennett41, Li Jin4, Hui 20 21 Li4, Johannes Krause3, Ron Pinhasi10, David Reich2,9,16,17 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 1. Department of Anthropology and Ethnology, Institute of Anthropology, School of Sociology and Anthropology and State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Xiamen 2. University, Xiamen 361005, China Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA 3. Max Planck Institute for the Science of Human History, 07745 Jena, Germany 4. MOE Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai 200438, China 5. 6. School of Humanities, Nanyang Technological University, Nanyang 639798, Singapore Scientific Museum, Far Eastern Federal University, 690950 Vladivostok, Russia 7. Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China 8. School of Health Science, Sapporo Medical University, S1 W17, Chuo-ku, Sapporo, 060-8556, 9. Japan Department of Human Evolutionary Biology, Harvard Unviersity, Cambridge, MA 02138, USA 10. Department of Evolutionary Anthropology, University of Vienna, 1090 Vienna, Austria 11. Institute of Archaeology, Russian Academy of Sciences, Moscow, Russia 12. Department of Anthropology, Harvard University, Cambridge, Massachusetts 02138, USA 13. Department of Anthropology and Archaeology, National University of Mongolia, Ulaanbaatar 46, Mongolia 14. Institute of Archaeology, National Cheng Kung University, Tainan 701, Taiwan bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 15. Department of Anthropology, University of Washington, 314 Denny Hall, Seattle, USA 16. Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA 17. Howard Hughes Medical Institute, Harvard Medical School, Boston, MA 02115, USA 18. Institutes of Energy and the Environment, The Pennsylvania State University, University Park, PA 16802, USA. 19. Department of Anthropology, Pennsylvania State University, University Park, PA 16802, USA 20. Institute of Archaeological Science, Fudan University, Shanghai 200433, China 21. School of Ethnology and Sociology, Minzu University of China, Beijing 100081, China 22. Museum of Archaeology and Ethnology of Institute of History, Far Eastern Branch of Russian Academic of Sciences, Vladivostok 690001, Russia 23. Institute of Archaeology and Ethnography, Siberian Branch of Russian Academy of Sciences, Novosibirsk 630090, Russia 24. Department of Archeology, Ethnography and Museology, Altai State University, Barnaul, Altaisky Kray 656049, Russia 25. Shaanxi Provincial Institute of Archaeology, Xi’an 710054, China 26. College of Cultural Heritage, Northwest University, Xi'an 710069, China 27. Xi'an AMS Center, Institute of Earth Environment, Chinese Academy of Sciences, Xi’an 710061, China 28. Research Center at the National Museum of Mongolia, Ulaanbaatar, Region of Sukhbaatar 14201, Mongolia 29. Department of Archaeology, Ulaanbaatar State University, Ulaanbaatar, Region of Bayanzurkh 13343, Mongolia 30. Department of Anthropology, National Museum of Nature and Science, Tsukuba City, Ibaraki Prefecture 305-0005, Japan 31. Archeological Center of Chiba City, Chiba 260-0814 Japan 32. Department of Archeology, Sakhalin Regional Museum, Yuzhno-Sakhalinsk, Russia 33. Department of Human Anatomy and Center for Genomics and Personalized Medicine, Guangxi Medical University, Nanning 530021, China 34. Key Laboratory for Molecular Genetic Mechanisms and Intervention Research on High Altitude Disease of Tibet Autonomous Region, Key Laboratory of High Altitude Environment and Gene Related to Disease of Tibet, Ministry of Education, School of Medicine, Xizang Minzu University, Xianyang 712082, Shaanxi, China 35. Guangxi Museum of Nationalities, Nanning 530028, Guangxi, China 36. Department of Biology, Hainan Medical University, Haikou 571199, Hainan, China 37. Department of Biochemistry and Molecular Biology, Hainan Medical University, Haikou 571199, Hainan, China 38. College of History, Culture and Tourism, Guangxi Normal University, Guilin 541001, China 39. Guangxi Institute of Cultural Relics Protection and Archaeology, Nanning 530003, Guangxi, China 40. Belt and Road Research Center for Forensic Molecular Anthropology, Key Laboratory of Evidence Science of Gansu Province, Gansu Institute of Political Science and Law, Lanzhou 730070, China 41. Department of Anthropology, University of California, Santa Barbara, CA 93106, USA * Contributed equally. Correspondence: wang@xmu.edu.cn (C-C.W), krause@shh.mpg.de (J.K.), ron.pinhasi@univie.ac.at (R.P.), and reich@genetics.med.harvard.edu (D.R.). 2 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 88 89 90 91 92 93 94 95 96 97 The deep population history of East Asia remains poorly understood due to a lack of ancient DNA data and sparse sampling of present-day people. We report genome-wide data from 191 individuals from Mongolia, northern China, Taiwan, the Amur River Basin and Japan dating to 6000 BCE - 1000 CE, many from contexts never previously analyzed with ancient DNA. We also report 383 present-day individuals from 46 groups mostly from the Tibetan Plateau and southern China. We document how 6000-3600 BCE people of Mongolia and the Amur River Basin were from populations that expanded over Northeast Asia, likely dispersing the ancestors of Mongolic and Tungusic languages. In a time transect of 89 Mongolians, we reveal how Yamnaya steppe pastoralist spread 98 99 100 101 from the west by 3300-2900 BCE in association with the Afanasievo culture, although we also document a boy buried in an Afanasievo barrow with ancestry entirely from local Mongolian hunter-gatherers, representing a unique case of someone of entirely non-Yamnaya ancestry interred in this way. The second 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 spread of Yamnaya-derived ancestry came via groups that harbored about a third of their ancestry from European farmers, which nearly completely displaced unmixed Yamnaya-related lineages in Mongolia in the second millennium BCE, but did not replace Afanasievo lineages in western China where Afanasievo ancestry persisted, plausibly acting as the source of the earlysplitting Tocharian branch of Indo-European languages. Analyzing 20 Yellow River Basin farmers dating to ~3000 BCE, we document a population that was a plausible vector for the spread of Sino-Tibetan languages both to the Tibetan Plateau and to the central plain where they mixed with southern agriculturalists to form the ancestors of Han Chinese. We show that the individuals in a time transect of 52 ancient Taiwan individuals spanning at least 1400 BCE to 600 CE were consistent with being nearly direct descendants of Yangtze Valley first farmers who likely spread Austronesian, Tai-Kadai and Austroasiatic languages across Southeast and South Asia and mixing with the people they encountered, contributing to a four-fold reduction of genetic differentiation during the emergence of complex societies. We finally report data from Jomon huntergatherers from Japan who harbored one of the earliest splitting branches of East Eurasian variation, and show an affinity among Jomon, Amur River Basin, ancient Taiwan, and Austronesian-speakers, as expected for ancestry if they all had contributions from a Late Pleistocene coastal route migration to East Asia. 123 Main text 124 East Asia, one of the oldest centers of animal and plant domestication, today harbors 125 more than a fifth of the world’s human population, with present-day groups speaking 3 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 126 languages representing eleven major families: Sino-Tibetan, Tai-Kadai, Austronesian, 127 Austroasiatic, Hmong-Mien, Indo-European, Altaic (Mongolic, Turkic, and 128 Tungusic), Koreanic, Japonic, Yukgahiric, and Chukotko-Kanchatkan1. The past 129 10,000 years have been a period of profound economic and cultural change in East 130 Asia, but our current understanding of the genetic diversity, major mixture events, and 131 population movements and turnovers during the transition from foraging to 132 agriculture remains poor due to minimal sampling of the diversity of present-day 133 people on the Tibetan Plateau and southern China2. A particular limitation has been a 134 deficiency in ancient DNA data, which has been a powerful tool for discerning the 135 deep history of populations in Western and Central Eurasia3-8. 136 137 We genotyped 383 present-day individuals from 46 populations indigenous to China 138 (n=337) and Nepal (n=46) using the Affymetrix Human Origins array (Table S1 and 139 Supplementary Information section 1). We also report genome-wide data from 191 140 ancient East Asians, many from cultural contexts for which there is no published 141 ancient DNA data. From Mongolia we report 89 individuals from 52 sites dating 142 between ~6000 BCE to ~1000 CE. From China we report 20 individuals from the 143 ~3000 BCE Neolithic site of Wuzhuangguoliang. From Japan we report 7 Jomon 144 hunter-gatherers from 3500-1500 BCE. From the Russian Far East we report 23 145 individuals: 18 from the Neolithic Boisman-2 cemetery at ~5000 BCE, 1 from the 146 Iron Age Yankovsky culture at ~1000 BCE, 3 from the Medieval Heishui Mohe and 147 Bohai Mohe culture at ~1000 CE; and 1 historic period hunter-gatherer from Sakhalin 148 Island. From archaeological sites in Eastern Taiwan—the Bilhun site at Hanben on the 149 main island and the Gongguan site on Green Island—we report 52 individuals from 150 the Late Neolithic through the Iron Age spanning at least 1400 BCE - 600 CE. 151 152 For all but the Chinese samples we enriched the ancient DNA for a targeted set of 153 about 1.2 million single nucleotide polymorphisms (SNPs)4,9, while for the 154 Wuzhuangguoliang samples from China we used exome capture (18 individuals) or 155 shotgun sequencing (2 individuals) (Figure 1, Supplementary Data files 1 and 2 and 156 Supplementary Information section 1). We performed quality control to test for 157 contamination by other human sequences, assessed by the rate of cytosine to thymine 158 substitution in the terminal nucleotide and polymorphism in mitochondrial DNA 159 sequences10 as well as X chromosome sequences in males, and restricted analysis to 4 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 160 individuals with minimal contamination11 (Online Table 1). We detected close kinship 161 between individuals at the same site, including a Boisman nuclear family with 2 162 parents and 4 children (Table S2). We merged the new data with previously reported 163 data: 4 Jomon individuals, 8 Amur River Basin Neolithic individuals from the Devil’s 164 Gate site, 72 individuals from the Neolithic to the Iron Age in Southeast Asia, and 8 165 from Nepal7,12-20. We assembled 123 radiocarbon dates using bone from the 166 individuals, of which 94 are newly reported (Online Table 3), and clustered 167 individuals based on time period and cultural associations, then further by genetic 168 cluster which in the Mongolian samples we designated by number (our group names 169 thus have the format “<Country>_<Time Period>_<Genetic Cluster>_<Cultural 170 Association If Any>”) (Supplementary Note, Table S1 and Online Table 1). We 171 merged the data with previously reported data (Online Table 4). 172 173 We carried out Principal Component Analysis (PCA) using smartpca21, projecting the 174 ancient samples onto axes computed using present-day people. The analysis shows 175 that population structure in East Asia is correlated with geographic and linguistic 176 categories, albeit with important exceptions. Groups in Northwest China, Nepal, and 177 Siberia deviate towards West Eurasians in the PCA (Supplementary Information 178 section 2, Figure 2), reflecting multiple episodes of West Eurasian-related admixture 179 that we estimate occurred 5 to 70 generations ago based on the decay of linkage 180 disequilibrium22 (Table S3 and Table S4). East Asians with minimal proportions of 181 West Eurasian-related ancestry fall along a gradient with three clusters at their poles. 182 The “Amur Basin Cluster” correlates geographically with ancient and present-day 183 populations living in the Amur River Basin, and linguistically with present-day 184 indigenous people speaking Tungusic languages and the Nivkh. The “Tibetan Plateau 185 Cluster” is most strongly represented in ancient Chokhopani, Mebrak, and Samzdong 186 individuals from Nepal15 and in present-day people speaking Tibetan-Burman 187 languages and living on the Tibetan Plateau. The “Southeast Asian Cluster” is 188 maximized in ancient Taiwan groups and present-day people in Southeast Asia and 189 southern parts of China speaking Austroasiatic, Tai-Kadai and Austronesian 190 languages (Figure S1, Figure S2). Han are intermediate among these clusters, with 191 northern Han projecting close to the Neolithic Wuzhuangguoliang individuals from 192 northern China (Figure 2). We observe two genetic clusters within Mongolia: one falls 193 closer to ancient individuals from the Amur Basin Cluster (‘East’ based on their 5 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 194 geography), and the second clusters toward ancient individuals of the Afanasievo 195 culture ( ‘West’), while a few individuals take intermediate positions between the two 196 (Supplementary Information section 2). 197 198 The three most ancient individuals of the Mongolia ‘East’ cluster are from the 199 Kherlen River region of eastern Mongolia (Tamsag-Bulag culture) and date to 6000- 200 4300 BCE (this places them in the Early Neolithic period, which in Northeast Asia is 201 defined by the use of pottery and not by agriculture23). These individuals are 202 genetically similar to previously reported Neolithic individuals from the cis-Baikal 203 region and have minimal evidence of West Eurasian-related admixture as shown in 204 PCA (Figure 2), f4-statistics and qpAdm (Table S5, Online Table 5, labeled as 205 Mongolia_East_N). The other seven Neolithic hunter-gatherers from northern 206 Mongolia (labeled as Mongolia_North_N) can be modeled as having 5.4% ± 1.1% 207 ancestry from a source related to previously reported West Siberian Hunter-gatherers 208 (WSHG)8 (Online Table 5), consistent with the PCA where they are part of an east- 209 west Neolithic admixture cline in Eurasia with increasing proximity to West Eurasians 210 in groups further west. Because of this ancestry complexity, we use the 211 Mongolia_East_N individuals without significant evidence of West Eurasian-related 212 admixture as reference points for modeling the East Asian-related ancestry in later 213 groups (Online Table 5). The two oldest individuals from the Mongolia ‘West’ cluster 214 have very different ancestry: they are from the Shatar Chuluu kurgan site associated 215 with the Afanasievo culture, with one directly dated to 3316-2918 calBCE (we quote 216 a 95% confidence interval here and in what follows whenever we mention a direct 217 date), and are indistinguishable in ancestry from previously published ancient 218 Afanasievo individuals from the Altai region of present-day Russia, who in turn are 219 similar to previously reported Yamnaya culture individuals supporting findings that 220 eastward Yamnaya migration had a major impact on people of the Afansievo 221 culture5,8. All the later Mongolian individuals in our time transect were mixtures of 222 Mongolian Neolithic groups and more western steppe-related sources, as reflected by 223 statistics of the form f3 (X, Y; Later Mongolian Groups), which resulted in 224 significantly negative Z scores (Z<−3) when Mongolia_East_N was used as X, and 225 when Yamnaya-related Steppe populations, AfontovaGora3, WSHG, or European 226 Middle/Late Neolithic or Bronze Age populations were used as Y (Table S6). 227 6 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 228 To quantify the admixture history of the later Mongolians, we again used qpAdm. A 229 large number of groups could be modeled as simple two-way admixtures of 230 Mongolia_East_N as one source (in proportions of 65-100%) and WSHG as the other 231 source (in proportions of 0-35%), with negligible contribution from Yamnaya-related 232 sources as confirmed by including Russia_Afanasievo and Russia_Sintashta groups in 233 the outgroup set (Figure 3). The groups that fit this model were not only the two 234 Neolithic groups (0-5% WSHG), but also the Early Bronze Age people from the 235 Afanasievo Kurgak govi site (15%), the Ulgii group (28%), the main grouping of 236 individuals from the Middle Bronze Age Munkhkhairkhan culture (33%), Late Bronze 237 Age burials of the Ulaanzuukh type (6%), a combined group from the Center-West 238 region (27%), the Mongun Taiga type from Khukh tolgoi (35%), and people of the 239 Iron Age Slab Grave culture (9%). A striking finding in light of previous 240 archaeological and genetic data is that the male child from Kurgak govi (individual 241 I13957, skeletal code AT_629) has no evidence of Yamnaya-related ancestry despite 242 his association with Afanasievo material culture (for example, he was buried in a 243 barrow in the form of circular platform edged by vertical stone slabs, in stretched 244 position on the back on the bottom of deep rectangular pit and with a typical 245 Afanasievo egg-shaped vessel (Supplementary Note); his late Afanasievo chronology 246 is confirmed by a direct radiocarbon date of 2858-2505 BCE24). This is the first 247 known case of an individual buried with Afanasievo cultural traditions who is not 248 overwhelmingly Yamnaya-related, and he also shows genetic continuity with an 249 individual buried at the same site Kurgak govi 2 in a square barrow (individual I6361, 250 skeletal code AT_635, direct radiocarbon date 2618-2487 BCE). We label this second 251 individuals as having an Ulgii cultural association, although a different archaeological 252 assessment associates this individual to the Afanasievo or Chemurchek cultures25, so 253 it is possible that this provides a second example of Afanasievo material culture being 254 adopted by individuals without any Yamnaya ancestry. The legacy of the Yamnaya- 255 era spread into Mongolia continued in two individuals from the Chemurchek culture 256 whose ancestry can be only modeled by using Afanasievo as one of the sources 257 (49.0%±2.6%, Online Table 5). This model fits even when ancient European farmers 258 are included in the outgroups, showning that if the long-distance transfer of West 259 European megalithic cultural traditions to people of the Chemurchek culture that has 260 been suggested in the archaeological literature occurred,26 it must have been through 261 spread of ideas rather than through movement of people. 7 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 262 263 Beginning in the Middle Bronze Age in Mongolia, there is no compelling evidence 264 for a persistence of the Yamnaya-derivd lineages originally spread into the region 265 with Afanasievo. Instead in the Late Bronze Age and Iron Age and afterward we have 266 data from multiple Mongolian groups whose Yamnaya-related ancestry can only be 267 modeled as deriving not from the initial Afanasievo migration but instead from a later 268 eastward spread into Mongolia related to people of the Middle to Late Bronze Age 269 Sintashta and Andronovo horizons who were themselves a mixture of ~2/3 Yamnaya- 270 related and 1/3 European farmer-related ancestry5,7,8. The Sintashta-related ancestry is 271 detected in proportions of 5% to 57% in individuals from the 272 Mongolia_LBA_6_Khovsgol (a culturally mixed group from the literature14), 273 Mongolia_LBA_3_MongunTaiga, Mongolia_LBA_5_CenterWest, 274 Mongolia_EIA_4_Sagly, Mongolia_EIA_6_Pazyryk, and Mongolia_Mongol groups, 275 with the most substantial proportions of Sintashta-related ancestry always coming 276 from western Mongolia (Figure 3, Online Table 5). For all these groups, the qpAdm 277 ancestry models pass when Afanasievo is included in the outgroups while models 278 with Afanasievo treated as the source with Sintashta more distantly related outgroups 279 are all rejected (Figure 3, Online Table 5). Starting from the Early Iron Age, we 280 finally detect evidence of gene flow in Mongolia from groups related to Han Chinese. 281 Specifically, when Han are included in the outgroups, our models of mixtures in 282 different proportions of Mongolia_East_N, Russia_Afanasievo, Russia_Sintashta, and 283 WSHG continue to work for all Bronze Age and Neolithic groups, but fail for an 284 Early Iron Age individual from Tsengel sum (Mongolia_EIA_5), and for Xiongnu and 285 Mongols. When we include Han Chinese as a possible source, we estimate ancestry 286 proportions of 20-40% in Xiongnu and Mongols (Online Table 5). 287 288 While the Afanasievo-derived lineages are consistent with having largely disappeared 289 in Mongolia by the Late Bronze Age when our data showed that later groups with 290 Steppe pastoralist ancestry made an impact, we confirm and strengthen previous 291 ancient DNA analysis suggesting that the legacy of this expansion persisted in 292 western China into the Iron Age Shirenzigou culture (410-190 BCE)27. The only 293 parsimonious model for this group that fits according to our criteria is a 3-way 294 mixture of groups related to Mongolia_N_East, Russia_Afanasievo, WSHG. The only 295 other remotely plausible model (although not formally a good fit) also requires 8 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 296 Russia_Afanasievo as a source (Figure 3, Online Table 5). The findings of the original 297 study that reported evidence that the Afanasievo spread was the source of Steppe 298 ancestry in the Iron Age Shirenzigou have been questioned with the proposal of 299 alternative models that use ancient Kazakh Steppe Herders from the site of Botai, 300 Wusun, Saka and ancient Tibetans from the site of Mebrak15 in present-day Nepal as 301 major sources for Steppe and East Asian-related ancestry28. However, when we fit 302 these models with Russia_Afanasievo and Mongolian_East_N added to the outgroups, 303 the proposed models are rejected (P-values between 10-7 and 10-2), except in a model 304 involving a single low coverage Saka individual from Kazakhstan as a source 305 (P=0.17, likely reflecting the limited power to reject models with this low coverage). 306 Repeating the modeling using other ancient Nepalese with very similar genetic 307 ancestry to that in Mebrak results in uniformly poor fits (Online Table 5). Thus, 308 ancestry typical of the Afanasievo culture and Mongolian Neolithic contributed to the 309 Shirenzigou individuals, supporting the theory that the Tocharian languages of the 310 Tarim Basin—from the second-oldest-known branch of the Indo-European language 311 family—spread eastward through the migration of Yamnaya steppe pastoralists to the 312 Altai Mountains and Mongolia in the guise of the Afansievo culture, from where they 313 spread further to Xinjiang5,7,8,27,29,30. These results are significant for theories of Indo- 314 European language diversification, as they increase the evidence in favor of the 315 hypothesis the branch time of the second-oldest branch in the Indo-European language 316 tree occurred at the end of the fourth millennium BCE27,29,30. 317 318 The individuals from the ~5000 BCE Neolithic Boisman culture and the ~1000 BCE 319 Iron Age Yankovsky culture together with the previously published ~6000 BCE data 320 from Devil’s Gate cave19 are genetically very similar, documenting a continuous 321 presence of this ancestry profile in the Amur River Basin stretching back at least to 322 eight thousand years ago (Figure 2 and Figure S2). The genetic continuity is also 323 evident in the prevailing Y chromosomal haplogroup C2b-F1396 and mitochondrial 324 haplogroups D4 and C5 of the Boisman individuals, which are predominant lineages 325 in present-day Tungusic, Mongolic, and some Turkic-speakers. The Neolithic 326 Boisman individuals shared an affinity with Jomon as suggested by their intermediate 327 positions between Mongolia_East_N and Jomon in the PCA and confirmed by the 328 significantly positive statistic f4 (Mongolia_East_N, Boisman; Mbuti, Jomon). 329 Statistics such as f4 (Native American, Mbuti; Test East Asian, 9 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 330 Boisman/Mongolia_East_N) show that Native Americans share more alleles with 331 Boisman and Mongolia_East_N than they do with the great majority of other East 332 Asians in our dataset (Table S5). It is unlikely that these statistics are explained by 333 back-flow from Native Americans since Boisman and other East Asians share alleles 334 at an equal rate with the ~24,000-year-old Ancient North Eurasian MA1 who was 335 from a population that contributed about 1/3 of all Native American ancestry31. A 336 plausible explanation for this observation is that the Boisman/Mongolia Neolithic 337 ancestry was linked (deeply) to the source of the East Asian-related ancestry in Native 338 Americans3,31. We can also model published data from Neolithic and Early Bronze 339 Age individuals around Lake Baikal7 as sharing substantial ancestry (77-94%) with 340 the lineage represented by Mongolia_East_N, revealing that this type of ancestry was 341 once spread over a wide region spanning across Lake Baikal, eastern Mongolia, and 342 the Amur River Basin (Table S7). Some present-day populations around the Amur 343 River Basin harbor large fractions of ancestry consistent with deriving from more 344 southern East Asian populations related to Han Chinese (but not necessarily Han 345 themselves) in proportions of 13-50%. We can show that this admixture occurred at 346 least by the Early Medieval period because one Heishui_Mohe individual (I3358, 347 directly dated to 1050-1220 CE) is estimated to have harbored more than 50% 348 ancestry from Han or related groups (Table S8). 349 350 The Tibetan Plateau, with an average elevation of more than 4,000 meters, is one of 351 the most extreme environments in which humans live. Archaeological evidence 352 suggests two main phases for modern human peopling of the Tibetan Plateau. The 353 first can be traced back to at least ~160,000 years ago probably by Denisovans32 and 354 then to 40,000-30,000 years ago as reflected in abundant blade tool assemblages33. 355 However, it is only in the last ~3,600 years that there is evidence for continuous 356 permanent occupation of this region with the advent of agriculture34. We grouped 17 357 present-day populations from the highlands into three categories based on genetic 358 clustering patterns (Figure S3): “Core Tibetans” who are closely related to the ancient 359 Nepal individuals such as Chokopani with a minimal amount of admixture with 360 groups related to West Eurasians and lowland East Asians in the last dozens of 361 generations, “northern Tibetans” who are admixed between lineages related to Core 362 Tibetans and West Eurasians, and “Tibeto-Yi Corridor” populations (the eastern edge 363 of the Tibetan Plateau connecting the highlands to the lowlands) that includes not just 10 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 364 Tibetan speakers but also Qiang and Lolo-Burmese speakers who we estimate using 365 qpAdm4,35 have 30-70% Southeast Asian Cluster-related ancestry (Table S9). We 366 computed f3 (Mbuti; Core Tibetan, non-Tibetan East Asian) to search for non-Tibetans 367 that share the most genetic drift with Tibetans. Neolithic Wuzhuangguoliang, Han and 368 Qiang appear at the top of the list (Table S10), suggesting that Tibetans harbor 369 ancestry from a population closely related to Wuzhuangguoliang that also contributed 370 more to Qiang and Han than to other present-day East Asian groups. We estimate that 371 the mixture occurred 60-80 generations ago (2240-1680 years ago assuming 28 years 372 per generation36 under a model of a single pulse of admixture (Table S11). This 373 represents an average date and so only provides a lower bound on when these two 374 populations began to mix; the start of their period of admixture could plausibly be as 375 old as the ~3,600-year-old date for the spread of agriculture onto the Tibetan plateau. 376 These findings are therefore consistent with archaeological evidence that expansions 377 of farmers from the Upper and Middle Yellow River Basin influenced populations of 378 the Tibetan Plateau from the Neolithic to the Bronze Age as they spread across the 379 China Central plain37,38, and with Y chromosome evidence that the shared common 380 haplogroup Oα-F5 between Han and Tibetans coalesced to a common ancestry less 381 than 5,800 years ago39. 382 383 In the south, we find that the ancient Taiwan Hanben and Gongguan culture 384 individuals dating from at least a span of 1400 BCE - 600 CE are genetically most 385 similar to present-day Austronesian speakers and ancient Lapita individuals from 386 Vanuatu as shown in outgroup f3-statistics and significantly positive f4-statistics 387 (Taiwan_Hanben/Gongguan, Mbuti; Ami/Atayal/Lapita, other Asians) (Table S8). 388 The similarity to Austronesian-speakers is also evident in the Iron Age dominant 389 paternal Y chromosome lineage O3a2c2-N6 and maternal mtDNA lineages E1a, 390 B4a1a, F3b1, and F4b, which are widespread lineages among Austronesian- 391 speakers40,41. We compared the present-day Austronesian-speaking Ami and Atayal of 392 Taiwan with diverse Asian populations using statistics like f4 (Taiwan Iron 393 Age/Austronesian, Mbuti; Asian1, Asian2). Ancient Taiwan groups and Austronesian- 394 speakers share significantly more alleles with Tai-Kadai speakers in southern 395 mainland China and in Hainan Island42 than they do with other East Asians (Table 396 S8), consistent with the hypothesis that ancient populations related to present-day Tai- 397 Kadai speakers are the source for the spread of agriculture to Taiwan island around 11 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 398 5000 years ago43. The Jomon share alleles at an elevated rate with ancient Taiwan 399 individuals and Ami/Atayal as measured by statistics of the form f4 (Jomon, Mbuti; 400 Ancient Taiwan/Austronesian-speaker, other Asians) compared with other East Asian 401 groups, with the exception of groups in the Amur Basin Cluster (Table S8)44. 402 403 The Han Chinese are the world’s largest ethnic group. It has been hypothesized based 404 on the archaeologically documented spread of material culture and farming 405 technology, as well as the linguistic evidence of links among Sino-Tibetan languages, 406 that one of the ancestral populations of the Han might have consisted of early farmers 407 along the Upper and Middle Yellow River in northern China, some of whose 408 descendants also may have spread to the Tibetan Plateau and contributed to present- 409 day Tibeto-Burmans45. Archaeological and historical evidence document how during 410 the past two millennia, the Han expanded south into regions inhabited by previously 411 established agriculturalists46. Analysis of genome-wide variation among present-day 412 populations has revealed that the Han Chinese are characterized by a “North-South” 413 cline47,48, which is confirmed by our analysis. The Neolithic Wuzhuangguoliang, 414 present-day Tibetans, and Amur River Basin populations, share significantly more 415 alleles with Han Chinese compared with the Southeast Asian Cluster, while the 416 Southeast Asian Cluster groups share significantly more alleles with the majority of 417 Han Chinese groups when compared with the Neolithic Wuzhuangguoliang (Table 418 S12, Table S13). These findings suggest that Han Chinese may be admixed in variable 419 proportions between groups related to Neolithic Wuzhuangguoliang and people 420 related to those of the Southeast Asian Cluster. To determine the minimum number of 421 source populations needed to explain the ancestry of the Han, we used qpWave4,49 to 422 study the matrix of all possible statistics of the form f4 (Han1, Han2; O1, O2), where 423 “O1” and “O2” are outgroups that are unlikely to have been affected by recent gene 424 flow from Han Chinese. This analysis confirms that two source populations are 425 consistent with all of the ancestry in most Han Chinese groups (with the exception of 426 some West Eurasian-related admixture that affects some northern Han Chinese in 427 proportions of 2-4% among the groups we sampled; Table S14 and Table S15). 428 Specifically, we can model almost all present-day Han Chinese as mixtures of two 429 ancestral populations, in a variety of proportions, with 77-93% related to Neolithic 430 Wuzhuangguoliang from the Yellow River basin, and the remainder from a 431 population related to ancient Taiwan that we hypothesize was closely related to the 12 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 432 rice farmers of the Yangtze River Basin. This is also consistent with our inference that 433 the Yangtze River farmer related ancestry contributed nearly all the ancestry of 434 Austronesian speakers and Tai-Kadai speakers and about 2/3 of some Austroasiatic 435 speakers17,20 (Figure 4). A caveat is that there is a modest level of modern 436 contamination in the Wuzhuangguoliang we use as a source population for this 437 analysis (Online Table 1), but this would not bias admixture estimates by more than 438 the contamination estimate of 3-4%. The average dates of West Eurasian-related 439 admixture in northern Han Chinese populations Han_NChina and Han_Shanxi are 32- 440 45 generations ago, suggesting that mixture was continuing at the time of the Tang 441 Dynasty (618-907 CE) and Song Dynasty (960-1279 BCE) during which time there 442 are historical records of integration of Han Chinese amd western ethnic groups, but 443 this date is an average so the mixture between groups could have begun earlier. 444 445 To obtain insight into the formation of present-day Japanese archipelago populations, 446 we searched for groups that contribute most strongly to present-day Japanese through 447 admixture f3-statistics. The most strongly negative signals come from mixtures of Han 448 Chinese and ancient Jomon (f3(Japanese; Han Chinese, Jomon)) (Table S16). We can 449 model present-day Japanese as two-way mixtures of 84.3% Han Chinese and 15.7% 450 Jomon or 87.6% Korean and 12.4% Jomon (we cannot distinguish statistically 451 between these two sources; Table S17 and Table S18). This analysis by no means 452 suggests that the mainland ancestry in Japan was contributed directly by the Han 453 Chinese or Koreans themselves, but does suggest that it is from an ancestral 454 population related to those that contributed in large proportion to Han Chinese as well 455 as to Koreans for which we do not yet have ancient DNA data. 456 457 We used qpGraph35 to explore models with population splits and gene flow, and 458 tested their fit to the data by computing f2-, f3- and f4- statistics measuring allele 459 sharing among pairs, triples, and quadruples of populations, evaluating fit based on 460 the maximum |Z|-score comparing predicted and observed values. We further 461 constrained the models by using estimates of the relative population split times 462 between the selected pairs of populations based on the output of the MSMC 463 software50. While admixture graph modeling based on allele frequency correlation 464 statistics is not able to reject a model in which ancient Taiwan individuals and 465 Boisman share substantial ancestry with each other more recently than either does 13 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 466 with the ancestors of Chokopani and Core Tibetans, this model cannot be correct 467 because our MSMC analysis reveals that Core Tibetans (closely related to Chokopani) 468 and Ulchi (closely related to Boisman) share ancestry more recently in time on 469 average than either does with Ami (related to Taiwan_Hanben). This MSMC-based 470 constraint allowed us to identify a parsimonious working model for the deep history 471 of key lineages discussed in this study (Supplementary Information section 3: 472 qpGraph Modeling). Our fitted model (Figure 5), suggests that much of East Asian 473 ancestry today can be modelled as derived from two ancient populations: one from the 474 same lineage as the approximately ~40,000-year-old Tianyuan individual and the 475 other more closely related to Onge, with groups today having variable proportions of 476 ancestry from these two deep sources. In this model, the Mongolia_East_N and Amur 477 River Basin Boisman related lineages derive the largest proportion of their ancestry 478 from the Tianyuan-related lineage and the least proportion of ancestry from the Onge- 479 related lineage compared with other East Asians. A sister lineage of 480 Mongolia_East_N is consistent with expanding into the Tibetan Plateau and mixing 481 with the local hunter-gatherers who represent an Onge-related branch in the tree. The 482 Taiwan Hanben are well modelled as deriving about 14% of their ancestry from a 483 lineage remotely related to Onge and the rest of their ancestry from a lineage that also 484 contributed to Jomon and Boisman on the Tianyuan side, a scenario that would 485 explain the observed affinity among Jomon, Boisman and Taiwan Hanben. We 486 estimate that Jomon individuals derived 45% of their ancestry from a deep basal 487 lineage on the Onge side. These results are consistent with the scenario a Late 488 Pleistocene coastal route of human migration linking Southeast Asia, the Japanese 489 Archipelago and the Russian Far East51. Due to the paucity of ancient genomic data 490 from Upper Paleolithic East Asians, there are limited constraints at present for 491 reconstructing the deep branching patterns of East Asian ancestral populations, and it 492 is certain that this admixture graph is an oversimplification and that additional 493 features of deep population relationships will be revealed through future work. 494 495 At the end of the last Ice Age, there were multiple highly differentiated populations in 496 East as well as West Eurasia, and it is now clear that these groups mixed in both 497 regions, instead of one population displacing the others. In West Eurasia, there were 498 at least four divergent populations each as genetically differentiated from each other 499 as Europeans and East Asians today (average FST=0.10), which mixed in the 14 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 500 Neolithic, reducing heterogeneity (average FST=0.03) and mixed further in the Bronze 501 Age and Iron Age to produce the present-relatively low differentiation that 502 characterizes modern West Eurasia (average FST=0.01)52. In East Eurasia, our study 503 suggests an analogous process, with the differentiation characteristic of the Amur 504 River Basin groups, Neolithic Yellow River farmers, and people related to those of 505 the Taiwan Iron Age (average FST=0.06 in our data) collapsing through mixture to 506 today’s relatively low differentiation (average FST=0.01-0.02) (Figure 6). A priority 507 should be to obtain ancient DNA data for the hypothesized Yangtze River population 508 (the putative source for the ancestry prevalent in the Southeast Asian Cluster of 509 present-day groups), which should, in turn, make it possible to test and further extend 510 these models, and in particular to understand if dispersals of people in Southeast Asia 511 do or do not correlate to ancient movements of people. 15 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 512 References 513 1. Cavalli-Sforza, L. L. The Chinese human genome diversity project. Proc. Natl. Acad. Sci. 514 515 516 517 518 519 520 521 522 USA 95, 11501-11503 (1998). 2. HUGO Pan-Asian SNP Consortium. Mapping human genetic diversity in Asia. Science 326, 1541-1545 (2009). 3. Lazaridis, I., et al. Ancient human genomes suggest three ancestral populations for presentday Europeans. Nature 513, 409-413 (2014). 4. Haak, W., et al. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522, 207–211 (2015). 5. Allentoft, M.E., et al.. Population genomics of Bronze Age Eurasia. Nature 522,167-172 (2015). 523 6. Fu, Q., et al.. The genetic history of ice age Europe. Nature 534, 200-205 (2016). 524 7. de Barros Damgaard, P., et al.. 137 ancient human genomes from across the Eurasian steppes. 525 Nature 557, 369-374 (2018). 526 527 8. Narasimhan, V.M., et al. The formation of human populations in South and Central Asia. 528 9. Fu, Q., et al. An early modern human from Romania with a recent Neanderthal ancestor. 529 530 531 532 533 534 535 536 537 538 539 540 Science 365, eaat7487 (2019). Nature 524, 216–219 (2015). 10. Fu, Q.,et al. DNA analysis of an early modern human from Tianyuan Cave, China. Proc. Natl Acad. Sci. USA 110, 2223–2227 (2013). 11. Korneliussen, T. S., Albrechtsen, A., & Nielsen, R. ANGSD: Analysis of Next Generation Sequencing Data. BMC Bioinformatics 15, 356 (2014). 12. Posth, C., et al. Language continuity despite population replacement in Remote Oceania. Nat Ecol Evol. 2, 731-740 (2018). 13. Sikora, M., et al. The population history of northeastern Siberia since the Pleistocene. Nature 570, 182-188 (2019). 14. Jeong, C., et al. The genetic history of admixture across inner Eurasia. Nat. Ecol. Evol. 3, 966–976 (2019). 15. Jeong, C., et al. Long-term genetic stability and a high-altitude East Asian origin for the 541 peoples of the high valleys of the Himalayan arc. Proc. Natl. Acad. Sci. USA 113, 7485–7490 542 (2016). 543 544 545 546 547 548 16. Skoglund, P., et al. Genomic insights into the peopling of the Southwest Pacific. Nature 538, 510-513 (2016). 17. Lipson, M., et al. Ancient genomes document multiple waves of migration in Southeast Asian prehistory. Science 361, 92-95 (2018). 18. Kanzawa-Kiriyama, H., et al. A partial nuclear genome of the Jomons who lived 3000 years ago in Fukushima, Japan. J. Hum. Genet 62, 213–221 (2016). 16 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 549 550 19. Siska, V., et al. Genome-wide data from two early Neolithic East Asian individuals dating to 7700 years ago. Sci Adv. 3, e1601877 (2017). 551 20. McColl, H., et al. The prehistoric peopling of Southeast Asia. Science. 361, 88-92 (2018). 552 21. Patterson, N., Price, A. L., & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, 553 554 555 556 557 558 e190 (2006). 22. Loh, P.R., et al. Inferring admixture histories of human populations using linkage disequilibrium. Genetics 193, 1233-1254 (2013). 23. Jordan, P. & Zvelebil M. ed. Ceramics Before Farming: The Dispersal of Pottery Among Prehistoric Eurasian Hunter-Gatherers (Routledge, New York, 2010). 24. Kovalev, A. A., & Erdenebaatar, D. Discovery of New Cultures of the Bronze Age in 559 Mongolia according to the Data obtained by the International Central Asian Archaeological 560 Expedition. In Current Archaeological Research in Mongolia, (eds Bemmann, J., H. 561 Parzinger, H., Pohl, E., D. Tseveendorzh, D.) 149–170 (Bonn: Vor- und Frügeschichtliche 562 Archäologie Rheinische Friedrich-Wilhelm-Universität Bonn, 2009). 563 564 25. Wilkins, S., et al. Dairy pastoralism sustained eastern Eurasian steppe populations for 5,000 years. Nat Ecol Evol 4, 346–355 (2020). 565 26. Kovalev, A. The Great Migration of the Chemurchek People from France to the Altai in the 566 Early 3rd Millennium BCE . International Journal of Eurasian Studies. 1(11) , pp. 1-58 567 (2011). 568 569 570 571 572 573 574 575 576 577 578 579 580 581 27. Ning, C., et al. Ancient Genomes Reveal Yamnaya-Related Ancestry and a Potential Source of Indo-European Speakers in Iron Age Tianshan. Curr Biol. 29, 2526-2532.e4 (2019). 28. Weslowski. Eurogenes Blog: A surprising twist to the Shirenzigou nomads story (2019).https://eurogenes.blogspot.com/2019/08/a-surprising-twist-to-shirenzigou.html. 29. Mallory, J.P. In Search of the INDO-Europeans: Language, Archaeology and Myth (Thames & Hudson, New York, 1991). 30. Anthony, D. The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World (Princeton University Press, Princeton and Oxford, 2007). 31. Raghavan, M., et al. Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature 505, 87-91 (2014). 32. Chen, F., et al. A late Middle Pleistocene Denisovan mandible from the Tibetan Plateau. Nature 569, 409-412 (2019). 33. Zhang, X. L., et al. The earliest human occupation of the high-altitude Tibetan Plateau 40 thousand to 30 thousand years ago. Science 362, 1049-1051 (2018). 582 583 34. Chen, F.H., et al. Agriculture facilitated permanent human occupation of the Tibetan Plateau 584 35. Patterson, N., et al. Ancient admixture in human history. Genetics 192, 1065-1093 (2012). after 3600 B.P. Science 347, 248-250 (2015). 17 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 585 36. Moorjani, M., Sankararaman, S., Fu, Q., Przeworski, M., Patterson, N., & Reich, D. A genetic 586 method for dating ancient genomes provides a direct estimate of human generation interval in 587 the last 45,000 years. Proc. Natl. Acad. Sci. USA 113, 5652-5657 (2016). 588 589 590 591 592 37. Barton, L., et al. Agricultural origins and the isotopic identity of domestication in northern China. Proc. Natl. Acad. Sci. USA 106, 5523-5528 (2009). 38. Yang, X., et al. Early millet use in northern China. Proc. Natl. Acad. Sci. USA 109, 3726–3730 (2012). 39. Wang, L.X., et al. Reconstruction of Y-chromosome phylogeny reveals two neolithic 593 expansions of Tibeto-Burman populations. Mol Genet Genomics. 293, 1293-1300 (2018). 594 40. Wei, L.H., et al. Phylogeography of Y-chromosome haplogroup O3a2b2-N6 reveals patrilineal 595 traces of Austronesian populations on the eastern coastal regions of Asia. PLoS One 12, 596 e0175080 (2017). 597 598 599 600 601 602 603 604 41. Ko, A.M., et al. Early Austronesians: into and out of Taiwan. Am. J. Hum. Genet. 94, 426-36 (2014). 42. Lipson, M., et al. Reconstructing Austronesian population history in island Southeast Asia. Nat Commun. 5, 4689 (2014). 43. Bellwood, P. The checkered prehistory of rice movement southwards as a domesticated cereal—from the Yangzi to the equator. Rice 4, 93-103 (2011). 44. Kanzawa-Kiriyama H., et al. Late Jomon male and female genome sequences from the Funadomari site in Hokkaido, Japan. Anthropol Sci. 127, 83-108 (2019). 605 606 45. Su, B., et al. Y chromosome haplotypes reveal prehistorical migrations to the Himalayas. 607 46. Ge, J. X., Wu, S. D., & Chao, S. J. Zhongguo yimin shi (The Migration History of China) 608 609 610 611 612 613 614 615 616 617 618 619 620 Hum. Genet. 107, 582-590 (2000). (Fujian People’s Publishing House, Fuzhou, 1997). 47. Chen, J., et al. Genetic structure of the Han Chinese population revealed by genome-wide SNP variation. Am. J. Hum. Genet. 85, 775-785 (2009). 48. Xu, S., et al. Genomic dissection of population substructure of Han Chinese and its implication in association studies. Am. J. Hum. Genet. 85, 762-774 (2009). 49. Reich, D., et al. Reconstructing Native American population history. Nature 488, 370-374 (2012). 50. Schiffels, S., & Durbin, R. Inferring human population size and separation history from multiple genome sequences. Nat Genet. 46, 919-925 (2014). 51. Matsumura, H., et al. Craniometrics Reveal "Two Layers" of Prehistoric Human Dispersal in Eastern Eurasia. Sci Rep. 9, 1451 (2019). 52. Lazaridis, I., et al. Genomic insights into the origin of farming in the ancient Near East. Nature 536, 419–424 (2016). 621 622 18 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 623 Methods 624 Ancient DNA laboratory work 625 All samples except those from Wuzhuangguoliang were prepared in dedicated clean 626 room facilities at Harvard Medical School, Boston, USA. Online Table 2 lists 627 experimental settings for each sample and library included in the dataset. Skeletal 628 samples were surface cleaned and drilled or sandblasted and milled to produce a fine 629 powder for DNA extraction53,54. We then either followed the extraction protocol by 630 Dabney et al55 replacing the extender-MinElute-column assembly with the columns 631 from the Roche High Pure Viral Nucleic Acid Large Volume Kit56 (manual 632 extraction) or, for samples prepared later, used DNA extraction protocol based on 633 silica beads instead of spin columns (and Dabney buffer) to allow for automated DNA 634 purification57 (robotic extraction). We prepared individually barcoded double- 635 stranded libraries for most samples using a protocol that included a DNA repair step 636 with Uracil-DNA-glycosylase (UDG) treatment to cut molecules at locations 637 containing ancient DNA damage that is inefficient at the terminal positions of DNA 638 molecules (Online Table 1, UDG: “half”)58, or, without UDG pre-treatment (double 639 stranded minus). For a few samples processed later, single stranded DNA libraries59 640 were prepared with USER (NEB) addition in the dephosphorylation step that results 641 in inefficient uracil removal at the 5’end of the DNA molecules, and does not affect 642 deamination rates at the terminal 3’ end60. We performed target enrichment via 643 hybridization of these libraries with previously reported protocols10. We either 644 enriched for the mitochondrial genome and 1.2M SNPs in two separate experiments 645 or together in a single experiment. If split over two experiments, the first enrichment 646 was for sequences aligning to mitochondrial DNA58,61 with some baits overlapping 647 nuclear targets spiked in to screen libraries for nuclear DNA content. The second in- 648 solution enrichment was for a targeted set of 1,237,207 SNPs that comprises a merge 649 of two previously reported sets of 394,577 SNPs (390k capture)4 and 842,630 SNPs9. 650 We sequenced the enriched libraries on an Illumina NextSeq500 instrument for 2x76 651 cycles (and both indices) or on Hiseq X10 instruments at the Broad Institute of MIT 652 and Harvard for 2x101 cycles. We also shotgun sequenced each library for a few 653 hundred thousand reads to assess the fraction of human reads. 654 655 Ancient DNA extractions of the Wuzhuangguoliang samples were performed in the 656 clean room at Xi'an Jiaotong University and Xiamen University following the 19 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 657 protocol by Rohland and Hofreiter62. Each sample extract was converted into double- 658 stranded Illumina libraries following the manufacturer’s protocol (Fast Library Prep 659 Kit, iGeneTech, Beijing, China). Sample-speciﬁc indexing barcodes were added to 660 both sides of the fragments via ampliﬁcation. Nuclear DNA capture was performed 661 with AIExome Enrichment Kit V1 (iGeneTech, Beijing, China) according to the 662 manufacturer’s protocol and sequenced on an Illumina NovaSeq instrument with 150 663 base pair paired-end reads. Sequences that did not perfectly match one of the expected 664 index combinations were discarded. 665 666 For the AH1-7 and AH1-17 DNA extracts, we prepared whole genome sequencing 667 libraries. The two DNA extracts were converted into barcoded Illumina sequencing 668 libraries using commercially available library kits (NEBNext® Ultra™ II DNA 669 Library Prep Kit) and Illumina-specific primers63. DNA libraries were not treated with 670 uracil-DNA-glycosylase (UDG) 59. We used a MinElute Gel Extraction Kit (Qiagen, 671 Hilden, Germany) for purification. Two libraries were sequenced on a HiSeqX10 672 instrument (2×150 bp, PE) at the Novogene Sequencing Centre (Beijing, China). The 673 base calling was performed using CASAVA software. 674 675 Bioinformatic processing 676 For the sequencing data produced at Harvard Medical School, we used one of two 677 pipelines (“pipeline 1” or “pipeline 2”; Online Table 2). An up-to-date description of 678 both pipelines and analyses showing that the differences between them do not cause 679 systematic bias in population genetic analysis can be found in Fernandes et al64. For 680 both pipelines we began by de-multiplexed the data and assigning sequences to 681 samples based on the barcodes and/or indices, allowing up to one mismatch per 682 barcode or index. We trimmed adapters and restricted to fragments where the two 683 reads overlapped by at least 15 nucleotides. In pipeline 1 we merged the sequences 684 (allowing up to one mismatch) using a modified version of Seqprep65 where bases in 685 the merged region are chosen based on highest quality in case of a conflict, and in 686 pipeline 2 we used custom software (https://github.com/DReichLab/ADNA-Tools). 687 For mitochondrial DNA analysis, we aligned the resulting merged sequences to the 688 RSRS reference genome66 using bwa (version 0.6.1 for pipeline 1 and version 0.7.15 689 for pipeline 2)67, and removed duplicates with the same orientation, start and stop 690 positions, and molecular barcodes. We determined mitochondrial DNA haplogroups 20 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 691 using HaploGrep268. We also analyzed the sequences to generate two assessments of 692 ancient DNA authenticity. The first assessment estimated the rate of cytosine to 693 thymine substitution in the final nucleotide, which is expected to be at least 3% at 694 cytosines in libraries prepared with a partial UDG treatment protocol and at least 10% 695 for untreated libraries (minus) and single stranded libraries; all libraries we analyzed 696 met this threshold. The second assessment used contamMix (version 1.0.9 for pipeline 697 1 and 1.0.12 for pipeline 2)10 to determine the fraction of mtDNA sequences in an 698 ancient sample that match the endogenous majority consensus more closely than a 699 comparison set of 311 worldwide present-day human mtDNAs (Online Table 1). 700 Computational processing of the sequence data from the whole genome was the same 701 as the mtDNA enrichment except that the human genome (hg19) was used as the 702 target reference. Due to the low coverage, diploid calling was not possible; instead, 703 we randomly selected a single sequence covering every SNP position of interest 704 (“pseudo-haploid” data) using custom software, only using nucleotides that were a 705 minimum distance from the ends of the sequences to avoid deamination artifacts 706 (https://github.com/DReichLab/adna-workflow). The coverages and numbers of SNPs 707 covered at least once on the autosomes (chromosomes 1-22) are in Online Table 1. 708 709 For the sequencing data from the Wuzhuangguoliang samples, we clipped adaptors 710 with leehom69 and then further processed using EAGER70, including mapping with 711 bwa (v0.6.1)67 against the human genome reference GRCh37/hg19 (or just the 712 mitochondrial reference sequence), and removing duplicate reads with the same 713 orientation and start and end positions. To avoid an excess of remaining C-to-T and 714 G-to-A transitions at the ends of the sequences, we clipped three bases of the ends of 715 each read for each sample using trimBam 716 (https://genome.sph.umich.edu/wiki/BamUtil:_trimBam). We generated pseudo- 717 haploid calls by selecting a single read randomly for each individual using 718 pileupCaller (https://github.com/stschiff/sequenceTools/tree/master/srcpileupCaller). 719 720 Accelerator Mass Spectrometry Radiocarbon Dating 721 We generated 94 direct AMS (Accelerator Mass Spectrometry) radiocarbon (14C) 722 dates as part of this study; 87 at Pennsylvania State University (PSU) and 7 at Poznan 723 Radiocarbon Laboratory. The methods used at both laboratories are published, and 724 here we summarize the methods from PSU. Bone collagen from petrous, phalanx, or 21 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 725 tooth (dentine) samples was extracted and purified using a modified Longin method 726 with ultrafiltration (>30kDa gelatin)71. If bone collagen was poorly preserved or 727 contaminated we hydrolyzed the collagen and purified the amino acids using solid 728 phase extraction columns (XAD amino acids)72. Prior to extraction we sequentially 729 sonicated all samples in ACS grade methanol, acetone, and dichloromethane (30 730 minutes each) at room temperature to remove conservants or adhesives possibly used 731 during curation. Extracted collagen or amino acid preservation was evaluated using 732 crude gelatin yields (% wt), %C, %N and C/N ratios. Stable carbon and nitrogen 733 isotopes were measured on a Thermo DeltaPlus instrument with a Costech elemental 734 analyzer at Yale University. C/N ratios between 3.14 and 3.45 indicate that all 735 radiocarbon dated samples are well preserved. All samples were combusted and 736 graphitized at PSU using methods described in Kennett et al. 201771. 14C 737 measurements were made on a modified National Electronics Corporation 1.5SDH-1 738 compact accelerator mass spectrometer at either the PSUAMS facility or the Keck- 739 Carbon Cycle AMS Facility. All dates were calibrated using the IntCal13 curve73 in 740 OxCal v 4.3.274 and are presented in calendar years BCE/CE . 741 742 Y chromosomal haplogroup analysis 743 We performed Y-haplogroup determination by examining the state of SNPs present in 744 ISOGG version 11.89 (accessed March 31, 2016) and our unpublished updated 745 phylogeny. 746 747 X-chromosome contamination estimates 748 We performed an X-chromosomal contamination test for the male individuals 749 following an approach introduced by Rasmussen et al75 and implemented in the 750 ANGSD software suite11. We used the “MoM” (Methods of Moments) estimates. The 751 estimates for some males are not informative because of the limited number of X- 752 chromosomal SNPs covered by at least two sequences, and hence we only report 753 results for individuals with at least 200 SNPs covered at least twice. The estimated 754 contamination rates for the male samples are low (Online Table 1). The contamination 755 rates for all samples are quite low except those from Wuzhuangguoliang. We detected 756 3-6% contamination in the Wuzhuangguoliang samples, and restricted population 757 genetic modeling analysis only to three males with 3-4% contamination. 758 22 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 759 Data merging 760 We merged the data with previously published datasets genotyped on Affymetrix 761 Human Origins arrays3,35, restricting to individuals with >95% genotyping 762 completeness. We manually curated the data using ADMIXTURE76 and 763 EIGENSOFT21 to identify samples that were outliers compared with other samples 764 from their own populations. We removed seven individuals from subsequent analysis; 765 the population IDs for these individuals are prefixed by the string “Ignore_” in the 766 dataset we release, so users who wish to analyze these samples are still able to do so. 767 768 Principal Components Analysis. We carried out principal components analysis in 769 the smartpca program of EIGENSOFT21, using default parameters and the lsqproject: 770 YES and numoutlieriter: 0 options. 771 772 ADMIXTURE Analysis. We carried out ADMIXTURE analysis in unsupervised 773 mode76 after pruning for linkage disequilibrium in PLINK77 with parameters --indep- 774 pairwise 200 25 0.4 which retained 256,427 SNPs for Human Origin Dataset. We ran 775 ADMIXTURE with default 5-fold cross-validation (--cv=5), varying the number of 776 ancestral populations between K=2 and K=18 in 100 bootstraps with different random 777 seeds. 778 779 f-statistics. We computed f3-statistics and f4-statistics using ADMIXTOOLS35 with 780 default parameters. We computed standard errors using a block jackknife78. 781 782 FST computation. We estimated FST using EIGENSOFT21 with default parameters, 783 inbreed: YES, and fstonly: YES. We found that the inbreeding corrected and 784 uncorrected FST were nearly identical (within ~0.001), and in this study, always 785 analyzed uncorrected FST. 786 787 Admixture graph modeling. Admixture graph modeling was carried out with the 788 qpGraph software as implemented in ADMIXTOOLS35 using Mbuti as an outgroup. 789 790 Testing for the number of streams of ancestry. We used qpWave4,35 as 791 implemented in ADMIXTOOLS to test whether a set of test populations is consistent 792 with being related via N streams of ancestry from a set of outgroup populations. 23 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 793 794 Inferring mixture proportions without an explicit phylogeny. We used qpAdm4 as 795 implemented in ADMIXTOOLS to estimate mixture proportions for a Test population 796 as a combination of N ‘reference’ populations by exploiting (but not explicitly 797 modeling) shared genetic drift with a set of ‘Outgroup’ populations. 798 799 Weighted linkage disequilibrium (LD) analysis. LD decay was calculated using 800 ALDER22 to infer admixture parameters including dates and mixture proportions. 801 802 MSMC. We used MSMC50 following the procedures in Mallick et al79 to infer cross- 803 coalescence rates and population sizes among Ami/Atayal, Tibetan, and Ulchi. 804 805 Kinship analysis. We used READ software80 as well as a custom method81 to 806 determine genetic kinship between individual pairs. 807 808 Data availability 809 The aligned sequences are available through the European Nucleotide Archive under 810 accession number [to be made available on publication]. Genotype data used in 811 analysis are available at https://reich.hms.harvard.edu/datasets. Any other relevant 812 data are available from the corresponding author upon reasonable request. 813 814 53. Pinhasi, R., Fernandes, D.M., Sirak, K., & Cheronet, O. Isolating the human cochlea to 815 generate bone powder for ancient DNA analysis. Nat Protoc. 14, 1194-1205 (2019). 816 817 818 54. Sirak, K.A., et al., A minimally-invasive method for sampling human petrous bones from the cranial base for ancient DNA analysis. Biotechniques. 62, 283-289 (2017). 55. Dabney, J., et al. Complete mitochondrial genome sequence of a Middle Pleistocene cave bear 819 reconstructed from ultrashort DNA fragments. Proc Natl Acad Sci U S A. 110, 15758-63 820 (2013). 821 822 823 824 825 56. Korlević, P. Reducing microbial and human contamination in DNA extractions from ancient bones and teeth. Biotechniques. 59, 87-93 (2015). 57. Rohland, N., Glocke, I., Aximu-Petri, A., & Meyer, M. Extraction of highly degraded DNA from ancient bones, teeth and sediments for high-throughput sequencing. Nat Protoc. 13, 2447-2461 (2018). 24 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 826 58. Rohland, N., Harney, E., Mallick, S., Nordenfelt, S. & Reich, D. Partial uracil–DNA– 827 glycosylase treatment for screening of ancient DNA. Phil. Trans. R. Soc. Lond. B 370, 828 20130624 (2015). 829 830 831 832 833 834 835 836 837 838 839 59. Gansauge, M.T., & Meyer, M. Selective enrichment of damaged DNA molecules for ancient genome sequencing. Genome Res. 24, 1543-1549 (2014). 60. Meyer, M., et al., A high-coverage genome sequence from an archaic Denisovan individual. Science. 338, 222-226 (2012). 61. Maricic, T., Whitten, M., & Pääbo, S. Multiplexed DNA sequence capture of mitochondrial genomes using PCR products. PLoS One 5, e14004 (2010). 62. Rohland, N., & Hofreiter, M. Ancient DNA extraction from bones and teeth. Nat. Protoc. 2, 1756–1762 (2007). 63. Meyer, M., & Kircher, M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb. Protoc. 6, pdb.prot5448 (2010). 64. Fernandes, D.M., et al. The spread of steppe and Iranian-related ancestry in the islands of the 840 western Mediterranean. Nat Ecol Evol. 4, 334-345 (2020). 841 65. John, J. S. SeqPrep, https://github.com/jstjohn/SeqPrep (2011). 842 66. Behar, D.M., et al. A "Copernican" reassessment of the human mitochondrial DNA tree from 843 844 845 its root. Am J Hum Genet. 90, 675-84 (2012). Erratum in: Am J Hum Genet. 90, 936 (2012). 67. Li, H., & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25, 1754-1760 (2009). 846 847 68. Weissensteiner, H., et al. HaploGrep 2: mitochondrial haplogroup classification in the era of 848 69. Renaud, G., Stenzel, U., & Kelso, J. leeHom: adaptor trimming and merging for Illumina 849 850 851 852 853 854 high-throughput sequencing. Nucleic Acids Res. 44, W58–W63 (2016). sequencing reads. Nucleic Acids Res. 42, e141 (2014). 70. Peltzer, A., et al. EAGER: efficient ancient genome reconstruction. Genome Biol. 17, 60 (2016). 71. Kennett, D. J. et al. Archaeogenomic evidence reveals prehistoric matrilineal dynasty. Nat. Commun. 8, 14115 (2017). 72. Lohse, J. C., Madsen, D. B., Culleton, B. J. & Kennett, D. J. Isotope paleoecology of episodic 855 mid-to-late Holocene bison population expansions in the southern Plains, U.S.A. Quat. Sci. 856 857 858 859 Rev. 102, 14–26 (2014). 73. Reimer, P. J. et al. IntCal13 and Marine13 radiocarbon age calibration curves 0–50,000 years cal BP. Radiocarbon 55, 1869–1887 (2013). 74. Bronk Ramsey, C. Bayesian analysis of radiocarbon dates. Radiocarbon, 51, 337-360 (2009). 860 75. Rasmussen, M., et al. An Aboriginal Australian Genome Reveals Separate Human Dispersals 861 862 863 into Asia. Science 334, 94–98 (2011). 76. Alexander, D. H., Novembre, J., & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655-1664 (2009). 25 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 864 865 866 867 868 869 77. Chang, C., et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015). 78. Busing, F. T. A., Meijer, E., & Leeden, R. Delete-m Jackknife for Unequal m. Statistics and Computing 9, 3-8 (1999). 79. Mallick, S.M., et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations, Nature 538, 201-206 (2016). 870 871 80. Monroy, K.J.M., Jakobsson, M., & Günther, T. Estimating genetic kin relationships in 872 81. Kennett, D.J., et al. Archaeogenomic evidence reveals prehistoric matrilineal dynasty. Nat. 873 prehistoric populations. PLoS One 13, e0195491 (2018). Commun. 8, 14115 (2017). 874 875 Acknowledgements 876 We thank David Anthony, Ofer Bar-Yosef, Katherine Brunson, Rowan Flad, Pavel 877 Flegontov, Qiaomei Fu, Wolfgang Haak, Iosif Lazaridis, Mark Lipson, Iain 878 Mathieson, Richard Meadow, Inigo Olalde, Nick Patterson, Pontus Skoglund, and 879 Dan Xu for valuable conversations and critical comments. We thank Naruya Saitou 880 and the Asian DNA Repository Consortium for sharing genotype data from present- 881 day Japanese groups. We thank Toyohiro Nishimoto and Takashi Fujisawa from the 882 Rebun Town Board of Education for providing the Funadomari Jomon samples, and 883 Hideyo Tanaka and Watru Nagahara from the Archeological Center of Chiba City 884 who are excavators of the Rokutsu Jomon site. The excavations at Boisman-2 site 885 (Boisman culture), the Pospelovo-1 site (Yankovsky culture), and the Roshino-4 site 886 (Heishui Mohe culture) were funded by the Far Eastern Federal University and the 887 Institute of History Far Eastern Branch of the Russian Academu of Sciences, 888 researches Pospelovo-1 funded by RFBR project number 18-09-40101. C.C.W was 889 funded by the Max Planck Society, the National Natural Science Foundation of China 890 (NSFC 31801040), the Nanqiang Outstanding Young Talents Program of Xiamen 891 University (X2123302), and Fundamental Research Funds for the Central Universities 892 (ZK1144). O.B. and Y.B. were funded by Russian Scientific Foundation grant 17-14- 893 01345. H.M. was supported by the grant JSPS 16H02527. The research of M.R. and 894 C.C.W has received funding from the European Research Council (ERC) under the 895 European Union’s Horizon 2020 research and innovation programme (grant 896 agreement No 646612) granted to M.R. The research of C.S. is supported by the 897 Calleva Foundation and the Human Origins Research Fund. H.L was funded NSFC 898 (91731303, 31671297), B&R International Joint Laboratory of Eurasian 26 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. 899 Anthropology (18490750300). J.K. was funded by DFG grant KR 4015/1-1, the 900 Baden Württemberg Foundation, and the Max Planck Institute. Accelerator Mass 901 Spectrometry radiocarbon dating work was supported by the National Science 902 Foundation (BCS- 1460369) to D.J.K. and B.J.C). D.R. was funded by NSF 903 HOMINID grant BCS-1032255, NIH (NIGMS) grant GM100233, the Paul Allen 904 Foundation, the John Templeton Foundation grant 61220, and the Howard Hughes 905 Medical Institute. 906 907 Author Contributions 908 Conceptualization, C.-C.W., H.-Y.Y., A.N.P., H.M., A.M.K., L.J., H.L., J.K., R.P., 909 and D.R.; Formal Analysis, C.-C.W., R.B., M.Ma., S.M., Z.Z., B.J.C, and D.R.; 910 Investigation, C.-C.W., K.Si., O.C., A.K., N.R., A.M.K., M.Ma., S.M., K.W., N.A., 911 N.B., K.C., B.J.C, L.E., A.M.L., M.Mi., J.O., K.S., S.W., S.Y., F.Z., J.G., Q.D., L.K., 912 Da.L, Do.L, R.L., W.C., R.S., L.-X.W., L.W., G.X., H.Y., M.Z., G.H., X.Y., R.H., 913 S.S., D.J.K., L.J., H.L., J.K., R.P., and D.R.; Resources, H.-Y.Y., A.N.P., R.B., D.T., 914 J.Z., Y.-C.L, J.-Y.L., M.Ma., S.M., Z.Z., R.C., C.-J. H., C.-C.S., Y.G.N., A.V.T., 915 A.A.T., S.L., Z.-Y.S., X.-M.W., T.-L.Y., X.H., L.C., H.D., J.B., E.Mi., D.E., T.-O.I., 916 E.My., H.K.-K., M.N., K.Sh., D.J.K., R.P., and D.R.; Data Curation, C.-C.W., K.Si., 917 O.C., A.K., N.R., R.B., M.Ma., S.M., B.J.C, L.E., A.A.T., and D.R.; Writing, C.- 918 C.W., H.-Y.Y., A.N.P., H.M., A.K., and D.R.; Supervision, C.-C.W., H.-Q.Z., N.R., 919 M.R., S.S., D.J.K., L.J., H.L., J.K., R.P., and D.R. 920 921 Competing interests 922 The authors declare no competing interests. 27 Figure Legends Figure 1: Geographical locations of newly reported ancient individuals. We use different colors for the two ancient Mongolia clusters. Detailed information are given in Table S1, Online Table 1 and Supplemental Experimental Procedures. 28 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. Figure 2: Principal Component Analysis (PCA). (A) Projection of ancient samples onto PCA dimensions 1 and 2 defined by East Asians, Europeans, Siberians and Native Americans. (B) Projection onto groups with the little West Eurasian mixture. 29 Figure 3: qpAdm modeling of ancestry change over time in Mongolia. We use Mongolia_East_N, Afanasievo, WSHG, and Sintashta_MLBA as sources, and for each combined archaeological and genetic grouping identify maximally parsimonious models (fewest numbers of sources) that fit with P>0.05 (Online Table 5). We plot results for groupings that give a unique parsimonious model, and include at least one individual with data that “PASS” at high quality and with a confident chronological assignment (Online Table 1). The bars show proportions of each ancestry source, and we also include time spans for the individuals in the cluster. Groupings that include more eastern individuals (longitude >102.7 degrees) are indicated in green and typically have very little Yamnaya-related admixture even at late dates. 30 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. Figure 4: qpAdm modeling of Han Chinese cline. We used the ancient Wuzhuangguoliang as a proxy for Yellow River Farmers and Taiwan_Hanben as a proxy for Yangtze River Farmers related ancestry. 31 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.25.004606. this version posted March 25, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license. Figure 5: qpGraph modeling of a subset of East Asians. We used all available sites in the 1240K dataset, restricting to transversions only to replicate key results (Supplementary Information). We started with a skeleton tree that fits the data with Denisova, Mbuti, Onge, Tianyuan and Loschbour and one admixture event. We then grafted on Mongolia_East_N, Jomon, Taiwan_Hanben, Chokhopani, and Boisman in turn, adding them consecutively to all possible edges in the tree and retaining only graph solutions that provided no differences of |Z|>3 between fitted and estimated statistics. We used the MSMC relative population split time to constrain models (the maximum discrepancy for this model is |Z|=2.8). Drifts along edges are multiplied by 1000. Dashed lines represent admixture. Deep population splits are not well constrained due to a lack of data from Upper Paleolithic East Asians. 32 Figure 6. Homogenization of East Asian populations through mixture. Pairwise FST distribution among populations belonging to four time slices in East Asia; the median (red) of FST is shown. 33

Log In

The Genomic Formation of Human Populations in East Asia

Related papers

Related papers