Abstract
The deep population history of East Asia remains poorly understood owing to a lack of ancient DNA data and sparse sampling of present-day people1,2. Here we report genome-wide data from 166 East Asian individuals dating to between 6000 bc and ad 1000 and 46 present-day groups. Hunter-gatherers from Japan, the Amur River Basin, and people of Neolithic and Iron Age Taiwan and the Tibetan Plateau are linked by a deeply splitting lineage that probably reflects a coastal migration during the Late Pleistocene epoch. We also follow expansions during the subsequent Holocene epoch from four regions. First, hunter-gatherers from Mongolia and the Amur River Basin have ancestry shared by individuals who speak Mongolic and Tungusic languages, but do not carry ancestry characteristic of farmers from the West Liao River region (around 3000 bc), which contradicts theories that the expansion of these farmers spread the Mongolic and Tungusic proto-languages. Second, farmers from the Yellow River Basin (around 3000 bc) probably spread Sino-Tibetan languages, as their ancestry dispersed both to Tibet—where it forms approximately 84% of the gene pool in some groups—and to the Central Plain, where it has contributed around 59–84% to modern Han Chinese groups. Third, people from Taiwan from around 1300 bc to ad 800 derived approximately 75% of their ancestry from a lineage that is widespread in modern individuals who speak Austronesian, Tai–Kadai and Austroasiatic languages, and that we hypothesize derives from farmers of the Yangtze River Valley. Ancient people from Taiwan also derived about 25% of their ancestry from a northern lineage that is related to, but different from, farmers of the Yellow River Basin, which suggests an additional north-to-south expansion. 
Fourth, ancestry from Yamnaya Steppe pastoralists arrived in western Mongolia after around 3000 bc but was displaced by previously established lineages even while it persisted in western China, as would be expected if this ancestry was associated with the spread of proto-Tocharian Indo-European languages. Two later gene flows affected western Mongolia: migrants after around 2000 bc with Yamnaya and European farmer ancestry, and episodic influences of later groups with ancestry from Turan.
Data availability
The aligned sequences are available through the European Nucleotide Archive under accession number PRJEB42781. The newly generated genotype data of 383 modern East Asian individuals have been deposited in Zenodo (https://doi.org/10.5281/zenodo.4058532). The previously published data co-analysed with our newly reported data can be obtained as described in the original publications, which are all referenced in Supplementary Table 4; a compiled dataset that includes the merged genotypes used in this paper is available as the Allen Ancient DNA Resource at https://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadable-genotypes-present-day-and-ancient-dna-data. Any other relevant data are available from the corresponding authors upon reasonable request.
References
Cavalli-Sforza, L. L. The Chinese human genome diversity project. Proc. Natl Acad. Sci. USA 95, 11501–11503 (1998).
HUGO Pan-Asian SNP Consortium. Mapping human genetic diversity in Asia. Science 326, 1541–1545 (2009).
Haak, W. et al. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522, 207–211 (2015).
Allentoft, M. E. et al. Population genomics of Bronze Age Eurasia. Nature 522, 167–172 (2015).
Damgaard, P. B. et al. 137 ancient human genomes from across the Eurasian steppes. Nature 557, 369–374 (2018).
Narasimhan, V. M. et al. The formation of human populations in South and Central Asia. Science 365, eaat7487 (2019).
Fu, Q. et al. An early modern human from Romania with a recent Neanderthal ancestor. Nature 524, 216–219 (2015).
Fu, Q. et al. DNA analysis of an early modern human from Tianyuan Cave, China. Proc. Natl Acad. Sci. USA 110, 2223–2227 (2013).
Korneliussen, T. S., Albrechtsen, A. & Nielsen, R. ANGSD: analysis of next generation sequencing data. BMC Bioinformatics 15, 356 (2014).
Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).
Loh, P. R. et al. Inferring admixture histories of human populations using linkage disequilibrium. Genetics 193, 1233–1254 (2013).
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Yang, M. A. et al. 40,000-year-old individual from Asia provides insight into early population structure in Eurasia. Curr. Biol. 27, 3202–3208 (2017).
Massilani, D. et al. Denisovan ancestry and population history of early East Asians. Science 370, 579–583 (2020).
Wang, C. C. & Li, H. Inferring human history in East Asia from Y chromosomes. Investig. Genet. 4, 11 (2013).
Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012).
Yang, M. A. et al. Ancient DNA indicates human population shifts and admixture in northern and southern China. Science 369, 282–288 (2020).
Nakashima, A., Ishida, H., Shigematsu, M., Goto, M. & Hanihara, T. Nonmetric cranial variation of Jomon Japan: implications for the evolution of eastern Asian diversity. Am. J. Hum. Biol. 22, 782–790 (2010).
Bellwood, P. & Renfrew, C. Examining the Farming/Language Dispersal Hypothesis (McDonald Institute for Archaeological Research, 2002).
Robbeets, M. & Savelyev, A. The Oxford Guide to the Transeurasian Languages (Oxford Univ. Press, 2020).
Siska, V. et al. Genome-wide data from two early Neolithic East Asian individuals dating to 7700 years ago. Sci. Adv. 3, e1601877 (2017).
Kamberov, Y. G. et al. Modeling recent human evolution in mice by expression of a selected EDAR variant. Cell 152, 691–702 (2013).
Zhang, X. L. et al. The earliest human occupation of the high-altitude Tibetan Plateau 40 thousand to 30 thousand years ago. Science 362, 1049–1051 (2018).
Chen, F. H. et al. Agriculture facilitated permanent human occupation of the Tibetan Plateau after 3600 B.P. Science 347, 248–250 (2015).
Zhang, M., Yan, S., Pan, W. & Jin, L. Phylogenetic evidence for Sino-Tibetan origin in northern China in the Late Neolithic. Nature 569, 112–115 (2019).
van Driem, G. in The Peopling of East Asia: Putting Together Archaeology, Linguistics and Genetics (eds Sagart, L. et al.) 81–106 (Routledge, 2005).
Liu, S. et al. Genomic analyses from non-invasive prenatal testing reveal genetic associations, patterns of viral infections, and Chinese population history. Cell 175, 347–359 (2018).
Chiang, C. W. K., Mangul, S., Robles, C. & Sankararaman, S. A comprehensive map of genetic variation in the world’s largest ethnic group—Han Chinese. Mol. Biol. Evol. 35, 2736–2750 (2018).
Reich, D. et al. Reconstructing Native American population history. Nature 488, 370–374 (2012).
Lipson, M. et al. Ancient genomes document multiple waves of migration in Southeast Asian prehistory. Science 361, 92–95 (2018).
McColl, H. et al. The prehistoric peopling of Southeast Asia. Science 361, 88–92 (2018).
Wang, L. X. et al. Reconstruction of Y-chromosome phylogeny reveals two neolithic expansions of Tibeto-Burman populations. Mol. Genet. Genomics 293, 1293–1300 (2018).
Ge, J. X., Wu, S. D. & Chao, S. J. Zhongguo yimin shi (The Migration History of China) (Fujian People’s Publishing House, 1997).
Ning, C. et al. Ancient genomes from northern China suggest links between subsistence changes and human migration. Nat. Commun. 11, 2700 (2020).
Wei, L. H. et al. Phylogeography of Y-chromosome haplogroup O3a2b2-N6 reveals patrilineal traces of Austronesian populations on the eastern coastal regions of Asia. PLoS ONE 12, e0175080 (2017).
Ko, A. M. et al. Early Austronesians: into and out of Taiwan. Am. J. Hum. Genet. 94, 426–436 (2014).
Skoglund, P. et al. Genomic insights into the peopling of the Southwest Pacific. Nature 538, 510–513 (2016).
Lipson, M. et al. Reconstructing Austronesian population history in island Southeast Asia. Nat. Commun. 5, 4689 (2014).
Bellwood, P. The checkered prehistory of rice movement southwards as a domesticated cereal—from the Yangzi to the equator. Rice 4, 93–103 (2011).
Yang, X. et al. Early millet use in northern China. Proc. Natl Acad. Sci. USA 109, 3726–3730 (2012).
Wilkin, S. et al. Dairy pastoralism sustained eastern Eurasian steppe populations for 5,000 years. Nat. Ecol. Evol. 4, 346–355 (2020).
Kovalev, A. The great migration of the Chemurchek people from France to the Altai in the early 3rd millennium bce. Int. J. Eurasian Stud. 1, 1–58 (2011).
Jeong, C. et al. A dynamic 6,000-year genetic history of Eurasia’s Eastern Steppe. Cell 183, 890–904 (2020).
Ning, C. et al. Ancient genomes reveal Yamnaya-related ancestry and a potential source of Indo-European speakers in Iron Age Tianshan. Curr. Biol. 29, 2526–2532 (2019).
Bellwood, P. in The Encyclopedia of Global Human Migration (Wiley-Blackwell, 2013).
Mallory, J. P. In Search of the Indo-Europeans: Language, Archaeology and Myth (Thames & Hudson, 1991).
Anthony, D. The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World (Princeton Univ. Press, 2007).
Schiffels, S. & Durbin, R. Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46, 919–925 (2014).
Wang, K., Mathieson, I., O’Connell, J. & Schiffels, S. Tracking human population structure through time from whole genome sequences. PLoS Genet. 16, e1008552 (2020).
Jeong, C. et al. Long-term genetic stability and a high-altitude East Asian origin for the peoples of the high valleys of the Himalayan arc. Proc. Natl Acad. Sci. USA 113, 7485–7490 (2016).
Pinhasi, R., Fernandes, D. M., Sirak, K. & Cheronet, O. Isolating the human cochlea to generate bone powder for ancient DNA analysis. Nat. Protocols 14, 1194–1205 (2019).
Sirak, K. A. et al. A minimally-invasive method for sampling human petrous bones from the cranial base for ancient DNA analysis. Biotechniques 62, 283–289 (2017).
Dabney, J. et al. Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments. Proc. Natl Acad. Sci. USA 110, 15758–15763 (2013).
Korlević, P. et al. Reducing microbial and human contamination in DNA extractions from ancient bones and teeth. Biotechniques 59, 87–93 (2015).
Rohland, N., Glocke, I., Aximu-Petri, A. & Meyer, M. Extraction of highly degraded DNA from ancient bones, teeth and sediments for high-throughput sequencing. Nat. Protocols 13, 2447–2461 (2018).
Rohland, N., Harney, E., Mallick, S., Nordenfelt, S. & Reich, D. Partial uracil-DNA-glycosylase treatment for screening of ancient DNA. Phil. Trans. R. Soc. Lond. B 370, 20130624 (2015).
Gansauge, M. T. & Meyer, M. Selective enrichment of damaged DNA molecules for ancient genome sequencing. Genome Res. 24, 1543–1549 (2014).
Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226 (2012).
Maricic, T., Whitten, M. & Pääbo, S. Multiplexed DNA sequence capture of mitochondrial genomes using PCR products. PLoS ONE 5, e14004 (2010).
Rohland, N. & Hofreiter, M. Ancient DNA extraction from bones and teeth. Nat. Protocols 2, 1756–1762 (2007).
John, J. S. SeqPrep. GitHub https://github.com/jstjohn/SeqPrep (2011).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Behar, D. M. et al. A “Copernican” reassessment of the human mitochondrial DNA tree from its root. Am. J. Hum. Genet. 90, 675–684 (2012).
Weissensteiner, H. et al. HaploGrep 2: mitochondrial haplogroup classification in the era of high-throughput sequencing. Nucleic Acids Res. 44, W58–W63 (2016).
Günther, T. & Nettelblad, C. The presence and impact of reference bias on population genomic studies of prehistoric human populations. PLoS Genet. 15, e1008302 (2019).
Kennett, D. J. et al. Archaeogenomic evidence reveals prehistoric matrilineal dynasty. Nat. Commun. 8, 14115 (2017).
Lohse, J. C., Madsen, D. B., Culleton, B. J. & Kennett, D. J. Isotope paleoecology of episodic mid-to-late Holocene bison population expansions in the southern Plains, U.S.A. Quat. Sci. Rev. 102, 14–26 (2014).
Reimer, P. J. et al. The IntCal20 Northern Hemisphere radiocarbon age calibration curve (0–55 cal kBP). Radiocarbon 62, 725–757 (2020).
Bronk Ramsey, C. Bayesian analysis of radiocarbon dates. Radiocarbon 51, 337–360 (2009).
Rasmussen, M. et al. An Aboriginal Australian genome reveals separate human dispersals into Asia. Science 334, 94–98 (2011).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Busing, F. T. A., Meijer, E. & van der Leeden, R. Delete-m jackknife for unequal m. Stat. Comput. 9, 3–8 (1999).
Mallick, S. et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538, 201–206 (2016).
Monroy Kuhn, J. M., Jakobsson, M. & Günther, T. Estimating genetic kin relationships in prehistoric populations. PLoS ONE 13, e0195491 (2018).
Ringbauer, H., Novembre, J. & Steinruecken, M. Human parental relatedness through time — detecting runs of homozygosity in ancient DNA. Preprint at bioRxiv https://doi.org/10.1101/2020.05.31.126912 (2020).
Acknowledgements
We thank D. Anthony, O. Bar-Yosef, K. Brunson, R. Flad, P. Flegontov, Q. Fu, W. Haak, I. Lazaridis, M. Lipson, I. Mathieson, R. Meadow, I. Olalde, N. Patterson, P. Skoglund, D. Xu, P. Bellwood and C. Chiang for comments; N. Saitou and the Asian DNA Repository Consortium for sharing genotype data from present-day Japanese groups; T. Nishimoto and T. Fujisawa from the Rebun Town Board of Education for sharing the Funadomari Jomon samples, and H. Tanaka and W. Nagahara from the Archeological Center of Chiba City, who are excavators of the Rokutsu Jomon site. The excavations at the Boisman-2 site (Boisman culture), the Pospelovo-1 site (Yankovsky culture) and the Roshino-4 site (Heishui Mohe culture) were funded by the Far Eastern Federal University and the Institute of History, Archaeology and Ethnology Far Eastern Branch of the Russian Academy of Sciences; research on Pospelovo-1 is funded by RFBR project number 18-09-40101. C.-C.W. was funded by the Max Planck Society, the National Natural Science Foundation of China (NSFC 31801040), the Nanqiang Outstanding Young Talents Program of Xiamen University (X2123302), the Major project of National Social Science Foundation of China (20&ZD248), a European Research Council (ERC) grant to D. Xu (ERC-2019-ADG-883700-TRAM) and Fundamental Research Funds for the Central Universities (ZK1144). H.M. was supported by grant JSPS 16H02527. M.R. and C.-C.W. received funding from the ERC under the European Union’s Horizon 2020 research and innovation program (grant no. 646612) to M.R. H. Li was funded by NSFC (91731303, 31671297) and the B&R International Joint Laboratory of Eurasian Anthropology (18490750300). J.K. was funded by DFG grant KR 4015/1-1, the Baden Württemberg Foundation and the Max Planck Institute. Accelerator Mass Spectrometry radiocarbon dating work was supported by the National Science Foundation (NSF) (BCS-1460369) to D.J.K. and B.J.C. D.R. was funded by NSF grant BCS-1032255, NIH (NIGMS) grant GM100233, the Paul M. Allen Frontiers Group, John Templeton Foundation grant 61220, a gift from J.-F. Clin and the Howard Hughes Medical Institute.
Author information
Contributions
C.-C.W., H.-Y.Y., A.N.P., H.M., A.M.K., L.J., H. Li, J.K., R.P. and D.R. conceptualized the study. C.-C.W., R.B., M. Mah, S.M., Z.Z., B.J.C. and D.R. carried out the formal analysis; C.-C.W., K. Sirak, O.C., A.K., N.R., A.M.K., M. Mah, S.M., K.W., N.A., N.B., K.C., F.C., K.S.D.C., B.J.C., L.E., S.F., D.K., A.M.L., K.M., M. Michel, J.O., K.T.O., K. Stewardson, S.W., S.Y., F.Z., J.G., Q.D., L.K., Dawei Li, Dongna Li, R.L., W.C., N., R.S., L.-X.W., L.W., G.X., H.Y., M.Z., G.H., X.Y., R.H., S.S., D.J.K., L.J., H. Li, J.K., R.P. and D.R. carried out the investigation. H.-Y.Y., A.N.P., R.B., D.T., J.Z., Y.-C.L., J.-Y.L., M. Mah, S.M., Z.Z., R.C., H. Looh, C.-J.H., C.-C.S., Y.G.N., A.V.T., A.A.T., S.L., Z.-Y.S., X.-M.W., T.-L.Y., X.H., L.C., H.D., J.B., E. Mijiddorj, D.E., T.-O.I., E. Myagmar, H.K.-K., M.N., K.-i.S., O.A.S., D.J.K., R.P. and D.R. provided resources. C.-C.W., K. Sirak, O.C., A.K., N.R., R.B., M. Mah, S.M., B.J.C., L.E., A.A.T. and D.R. curated the data. C.-C.W., H.-Y.Y., A.N.P., H.M., A.K. and D.R. wrote the paper. C.-C.W., H.-Q.Z., N.R., M.R., S.S., D.J.K., L.J., H. Li, J.K., R.P. and D.R. supervised the study.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature thanks Peter Bellwood, Charleston Chiang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 PCA of ancient samples.
Projection of ancient samples onto PCA dimensions 1 and 2 defined by East Asian, European, Siberian and Native American populations.
Extended Data Fig. 2 PCA of present-day samples.
a, PCA dimensions 1 and 2 defined by present-day East Asian, European, Siberian and Native American populations. b, PCA dimensions 1 and 2 defined by present-day East Asian groups with little West Eurasian mixture.
Extended Data Fig. 3 Neighbour-joining tree of present-day East Eurasian individuals using the human origin dataset.
a, Neighbour-joining tree of present-day East Eurasian individuals based on FST distances using the human origin dataset. The branch length is shown in FST distance. b, Neighbour-joining tree of present-day East Eurasian individuals in which internal branches are all shown with the same branch length for better visualization.
Extended Data Fig. 4 Admixture plot at K = 15 using the human origin dataset.
a–f, We grouped the populations roughly into six groups based on geographical and genetic affinity. a, Populations mainly from Africa (yellow), America (magenta), West Eurasia (dark green and light brown) and Oceania (light magenta). b, Populations mainly from Mongolia (blue) and Siberia (purple). c, Populations mainly from southern China and Southeast Asia (light blue). d, Populations mainly from the Tibetan Plateau (olive) and Neolithic Yellow River Basin (red). e, Mainly Han Chinese groups from China (light blue and red). f, Populations mainly from the Amur River Basin (blue and red) and northeast Asia.
Extended Data Fig. 5 Estimates of population split times.
a, Cross-coalescence rates for selected population pairs. We ran MSMC for four pairs of populations: Tibetan–Ami, Tibetan–Atayal, Tibetan–Ulchi and Tibetan–Mixe. We used one individual from each population in this analysis. The modern genomic data for those individuals are from the Simons Genome Diversity Project. The times are calculated based on the mutation rate and generation time specified on the x axis. b, Cross-coalescence rates for selected population pairs. The same analysis as shown in a but using MSMC2 instead of MSMC, and using two individuals per population except for the Tibetan–Atayal pair, for which we used only one.
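The time scaling mentioned in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes MSMC-style output in which times are scaled by the per-generation mutation rate, it uses the conventional relative cross-coalescence rate 2λ_across / (λ_within1 + λ_within2), and it takes the point where that rate reaches 0.5 as a heuristic split time. The mutation rate and generation time below are placeholder assumptions; the values used for the figure are those stated on its x axis, which are not reproduced here.

```python
# Illustrative helpers for reading MSMC-style cross-coalescence output.
# ASSUMPTIONS: MU and GEN are placeholder values, not taken from the paper.
MU = 1.25e-8   # assumed mutations per site per generation
GEN = 29.0     # assumed years per generation


def scaled_time_to_years(scaled_time, mutation_rate, generation_years):
    """Convert a mutation-scaled time into years."""
    generations = scaled_time / mutation_rate
    return generations * generation_years


def relative_ccr(lambda_within_1, lambda_across, lambda_within_2):
    """Relative cross-coalescence rate for one time segment."""
    return 2.0 * lambda_across / (lambda_within_1 + lambda_within_2)


def split_time_at_half(times_years, rel_ccr):
    """Linearly interpolate the first time at which the rate rises to 0.5.

    times_years must be ordered from recent to ancient. Returns None when
    the curve never crosses 0.5 from below.
    """
    pairs = list(zip(times_years, rel_ccr))
    for (t0, c0), (t1, c1) in zip(pairs, pairs[1:]):
        if c0 < 0.5 <= c1:
            return t0 + (0.5 - c0) * (t1 - t0) / (c1 - c0)
    return None
```

Because the conversion is linear in both parameters, choosing a different mutation rate or generation time rescales every estimated split time by the same factor and leaves the relative ordering of the population pairs unchanged.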
Extended Data Fig. 6 Admixture graph model.
This figure is the same as Fig. 2 except we show the fitted genetic drifts on each lineage. We used all available sites in the dataset, comprising 1,237,207 SNPs, and also restricted to transversions only to confirm that the same model fit (Supplementary Information section 3). We started with a skeleton tree that fits the data for Denisovan, Mbuti, Onge, Tianyuan and Luxembourg Loschbour and one admixture event. We grafted on Mongolia East Neolithic, Late Neolithic farmers from the Upper Yellow River, Liangdao 2, Japan Jomon, Nepal Chokhopani, Taiwan Hanben and Late Neolithic farmers from the West Liao River in turn, adding them consecutively to all possible edges in the tree and retaining only graph solutions that provided no differences of |Z| > 3 between fitted and estimated statistics (maximum |Z| = 2.95 here). We used the MSMC and MSMC2 relative population split time estimates to constrain models. Deep splits are not well constrained because of the minimal availability of data on East Asian populations from the Upper Paleolithic. a, Locations and dates of the East Asian individuals used in model fitting, with colours indicating whether the majority ancestry is from the hypothesized coastal expansion (green), interior expansion south (red) and interior expansion north (blue). The map is based on the ‘Google Map Layer’ from ArcGIS Online Basemaps (map data ©2020 Google). The grey circles represent sampled populations and white circles represent unsampled hypothesized nodes. b, In the model visualization, we colour lineages modelled as deriving entirely from one of these expansions, and also colour populations according to ancestry proportions. Dashed lines represent admixture (proportions are marked), and we show the amount of genetic drift on each lineage in units of FST × 1,000.
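The acceptance rule described in this caption, where a graph is kept only if its worst mismatch stays under a threshold of 3 (the reported maximum is |Z| = 2.95), can be sketched as below. This is a hedged illustration of the criterion only: it does not reproduce the admixture-graph fitting software, and the function names and the assumption that each Z-score is (fitted − observed) / standard error are introduced here for illustration.

```python
# Sketch of a Z-score acceptance check for a fitted admixture graph.
# ASSUMPTION: each Z-score is (fitted - observed) / standard error, where the
# standard error would come from a block jackknife in a real analysis.

def worst_z(observed, fitted, std_err):
    """Return the Z-score with the largest absolute value."""
    z_scores = [(f - o) / s for o, f, s in zip(observed, fitted, std_err)]
    return max(z_scores, key=abs)


def graph_is_acceptable(observed, fitted, std_err, threshold=3.0):
    """Keep a graph only if no statistic deviates by |Z| >= threshold."""
    return abs(worst_z(observed, fitted, std_err)) < threshold
```

Under this rule a candidate graph is rejected as soon as a single f-statistic is poorly predicted, even when all the others fit well.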
Extended Data Fig. 7 Shared genetic drift among Tibetan groups, measured by f3(X, Y; Mbuti).
Lighter colours indicate more shared drift. Lahu groups with the Southeast Asian cluster probably due to substantial admixture. The Tibetan_Yajiang are geographically in the Tibeto-Burman Corridor but group with Core Tibetan individuals, presumably reflecting less genetic admixture from people of the Southeast Asian cluster.
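The outgroup statistic f3(X, Y; Mbuti) named in this figure title can be illustrated with a simplified estimator. The sketch below averages (o − x)(o − y) over SNPs, where x, y and o are allele frequencies in X, Y and the outgroup. It is an assumption-laden simplification: it omits the finite-sample bias correction and the block-jackknife standard errors that a real analysis would include, and the function name is hypothetical.

```python
# Simplified outgroup f3 estimator from per-SNP allele frequencies.
# ASSUMPTION: no finite-sample correction and no standard errors.

def outgroup_f3(freq_x, freq_y, freq_outgroup):
    """Mean over SNPs of (o - x) * (o - y).

    Larger values mean X and Y share more drift after their common
    lineage separated from the outgroup.
    """
    products = [
        (o - x) * (o - y)
        for x, y, o in zip(freq_x, freq_y, freq_outgroup)
    ]
    return sum(products) / len(products)
```

Two populations that drifted together away from the outgroup give a larger value than two populations whose frequency changes are unrelated, which is why the statistic can be used to cluster groups by shared drift.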
Supplementary information
Supplementary Information
This Supplementary Information file contains an Ethics Statement and Supplementary Information sections 1–4, including 15 Supplementary Figures, 5 Supplementary Tables and Supplementary References. The supplementary figures and tables provide information on the genetic structure and population history of East Asians.
Supplementary Tables
This zipped file contains 26 Supplementary Tables and a table guide.
Supplementary Data
Genotypes of the newly reported 166 ancient individuals.
About this article
Cite this article
Wang, CC., Yeh, HY., Popov, A.N. et al. Genomic insights into the formation of human populations in East Asia. Nature 591, 413–419 (2021). https://doi.org/10.1038/s41586-021-03336-2
DOI: https://doi.org/10.1038/s41586-021-03336-2
This article is cited by
- YHSeqY3000 panel captures all founding lineages in the Chinese paternal genomic diversity database. BMC Biology (2025)
- Exploring Y-chromosomal STRs and SNPs for forensic and genetic insights in the Jiangsu Han population. BMC Genomics (2025)
- Ancient DNA analysis of elite nomadic warrior from Chinge-Tey I funerary commemorative complex in the “Valley of the Kings”, Tuva. BMC Genomics (2025)
- Ancient genomes in Southwest China revealed genetic interactions among diverse populations in the historical period. BMC Biology (2025)
- The demic expansion of Yangshao culture inferred from ancient human genomes. BMC Biology (2025)
Tao Zhang
First, thank you very much to the authors and co-authors for this study. It is good to see that more and more studies about the peopling of East Asia (and Eastern Eurasia generally) are being published.
However, I have serious concerns regarding some arguments and results in this study. These include the hypothetical clade "Andamanese/Hoabinhians with ancient Tibetans and Jomon period Japanese". As far as I can see in the preprint, this clade is largely based on the findings of McColl et al. 2018, which proposed a partial shared ancestry between the Jomon period people (represented by one single sample, IK002 "Ikawazu Jomon") and Hoabinhians/Onge, as well as the unusual link to haplogroup D (D1/D-M174), taking into account that this haplogroup has an estimated age of 60,000 years, and the Japanese lineage D1a2/D-M55 an age of 53,000 years. McColl et al. 2018 estimated that the IK002 sample has about 44% Onge/Hoabinhian-like ancestry and 56% East Asian-like ancestry (identical to the study results here). However, more recent studies such as Gakuhari et al. 2019 (published in 2020) rejected the conclusions of McColl et al. 2018 and did not find any evidence for a shared ancestry between the Ikawazu Jomon sample IK002 and Onge/Hoabinhians. They did, however, find evidence for gene flow between Jomon period people and coastal East Asians in Northeast and Southeast Asia, with up to 41% gene flow in the Ami and Atayal ancestors (Native Taiwanese). (See: https://www.nature.com/arti...
Further studies, such as Boer et al. 2020, similarly rejected the findings of McColl et al. 2018 and concluded that there is no shared ancestry between the Jomon samples (Ikawazu Jomon IK002 and two others from Hokkaido and Honshu, such as the Sanganji Jomon sample) and "basal Asians" such as Hoabinhians or Tianyuan. (See: https://www.cambridge.org/c...
Citing Boer et al. 2020:
Another study by Yang et al. 2020 expanded on the findings of Boer et al. 2020, similarly rejecting a close genetic relationship between Jomon and Onge/Hoabinhians. See: (https://science.sciencemag....
The question now is why the authors ignored these new findings and took the "early stage" data from McColl et al. 2018 for granted. Were these data tested for accuracy? From the paper, it does not seem so.
This problem and possible error affects the results of the whole study. Did the authors take into account additional migrations and admixture currently not detected? Why are the Andamanese counted as an unmixed proxy? There were several hints that the Andamanese received admixture from the mainland (several times). Citing Chaubey et al. 2015, the Andamanese (represented by the Onge) have 32% East Asian-like ancestry. Citing Mondal et al. 2017, the Onge paternal lineage D1a3 (D1a2b) split from a Tibeto-Burmese tribe of Northeast India (Riang/Tripuri) about 7,000 years ago. How did this clade come into the Andamanese/Onge? Why do the Great Andamanese have different paternal lineages such as P, K, M and S? D is common in East Asian Highlanders and Japanese (suggested to be a Jomon lineage, but still debated) and, strangely, in Andamanese. However, other Hoabinhian-like people lack haplogroup D but have K, F, P, M or S, found in other Negrito and Melanesian populations of Insular Southeast Asia and Oceania.
The conclusion made by the authors linking haplogroup D to the Andamanese/Hoabinhian clade seems rather unlikely. The diversity of D clades (to be specific, D1 clades) points to a northern route and dispersal (i.e. the Altai route and subsequent dispersal into the regions of East Asia). See this map as example:
Haplogroup D1 (D-M174) has an age of about 60,000 years, originated in Central Asia and diverged between Tibet and the Altai mountains (Hammer et al.). The Japanese subclade D1a2 or D-M55 split from other D1 clades about 53,000 years ago (Mondal et al. 2017).
Citing Mondal et al. 2017:
How do the authors explain their suggestion linking haplogroup D to a hypothetical clade uniting all populations which carry haplogroup D? Do they suggest that Aboriginal Australians and Mongolians also form one clade based on haplogroup C? Are Europeans with haplogroup R (P1) closer to Papuans with haplogroup M/S than pre-Indo-European Europeans with lineages such as I or J? I thought it was already generally accepted that genetic ancestry should not be based solely on haplogroups?
Boer et al. 2020 found evidence for gene flow from a divergent East Asian population into both the Jomon and, to a lesser extent, ancient Southeast Asians. This gene flow can be better linked to the spread of haplogroup D (D1 and its subclades). Did the authors take into account gene flow into the Andamanese? Taking them as a 100% proxy seems rather unusual and unlikely, especially after the findings of Chaubey et al. 2015 of about 32% East Asian admixture (represented by Han) in the Andamanese (represented by Onge).
It would be great if the authors could comment on this point. I have written this rather quickly and hope that a reviewer will take a deeper look at this and may help to improve this paper.
Nevertheless, thank you and good luck!