Genome diversity and signatures of natural selection in mainland Southeast Asia

He, Yaoxi; Zhang, Xiaoming; Peng, Min-Sheng; Li, Yu-Chun; Liu, Kai; Zhang, Yu; Mao, Leyan; Guo, Yongbo; Ma, Yujie; Zhou, Bin; Zheng, Wangshan; Yue, Tian; Liao, Yuwen; Liang, Shen-Ao; Chen, Lu; Zhang, Weijie; Chen, Xiaoning; Tang, Bixia; Yang, Xiaofei; Ye, Kai; Gao, Shenghan; Lu, Yurun; Wang, Yong; Wan, Shijie; Hao, Rushan; Wang, Xuankai; Mao, Yafei; Dai, Shanshan; Gao, Zongliang; Yang, Li-Qin; Guo, Jianxin; Li, Jiangguo; Liu, Chao; Wang, Jianhua; Sovannary, Tuot; Bunnath, Long; Kampuansai, Jatupol; Inta, Angkhana; Srikummool, Metawee; Kutanan, Wibhu; Ho, Huy Quang; Pham, Khoa Dang; Singthong, Sommay; Sochampa, Somphad; Kyaing, U. Win; Pongamornkul, Wittaya; Morlaeku, Chutima; Rattanakrajangsri, Kittisak; Kong, Qing-Peng; Zhang, Ya-Ping; Su, Bing

doi:10.1038/s41586-025-08998-w

Article
Published: 14 May 2025

Yaoxi He ORCID: orcid.org/0000-0003-3324-3239^1,2^na1,
Xiaoming Zhang ORCID: orcid.org/0000-0003-3139-2361^1,2^na1,
Min-Sheng Peng^1,3,4^na1,
Yu-Chun Li^1,3,5^na1,
Kai Liu^1,4^na1,
Yu Zhang^1,4^,
Leyan Mao^1,4^,
Yongbo Guo^1^,
Yujie Ma^1,4^,
Bin Zhou^1,4^,
Wangshan Zheng^1,4^,
Tian Yue^1,4^,
Yuwen Liao^1,4^,
Shen-Ao Liang^6^,
Lu Chen^6^,
Weijie Zhang^1,4^,
Xiaoning Chen^7^,
Bixia Tang^7^,
Xiaofei Yang^8,9,10^,
Kai Ye ORCID: orcid.org/0000-0002-2851-6741^9,10,11,12,13,14^,
Shenghan Gao^8,9,11^,
Yurun Lu ORCID: orcid.org/0000-0003-0402-8927^15^,
Yong Wang ORCID: orcid.org/0000-0003-0695-5273^15^,
Shijie Wan ORCID: orcid.org/0009-0005-0834-892X^8^,
Rushan Hao^16^,
Xuankai Wang^17^,
Yafei Mao ORCID: orcid.org/0000-0002-9648-4278^17,18^,
Shanshan Dai^1^,
Zongliang Gao^1,3,4,5^,
Li-Qin Yang^1,2,5^,
Jianxin Guo^1^,
Jiangguo Li^1,4^,
Chao Liu^19,20^,
Jianhua Wang^21^,
Tuot Sovannary^22^,
Long Bunnath^22^,
Jatupol Kampuansai ORCID: orcid.org/0000-0003-4687-104X^23^,
Angkhana Inta^23^,
Metawee Srikummool^24^,
Wibhu Kutanan^25^,
Huy Quang Ho^26^,
Khoa Dang Pham^26^,
Sommay Singthong^27^,
Somphad Sochampa^28^,
U. Win Kyaing^29^,
Wittaya Pongamornkul^30^,
Chutima Morlaeku^31^,
Kittisak Rattanakrajangsri^32^,
Consortium of Anthropological Research in Southeast Asia and Southwest China (CASEAC),
Qing-Peng Kong ORCID: orcid.org/0000-0002-6046-4494^1,3,5,33^,
Ya-Ping Zhang ORCID: orcid.org/0000-0002-5401-1114^1,3,4,34^ &
Bing Su ORCID: orcid.org/0000-0002-4379-9014^1,2,33^

Nature volume 643, pages 417–426 (2025)

Abstract

Mainland Southeast Asia (MSEA) has rich ethnic and cultural diversity with a population of nearly 300 million^1,2^. However, people from MSEA are underrepresented in the current human genomic databases. Here we present the SEA3K genome dataset (phase I), generated by deep short-read whole-genome sequencing of 3,023 individuals from 30 MSEA populations, and long-read whole-genome sequencing of 37 representative individuals. We identified 79.59 million small variants and 96,384 structural variants, among which 22.83 million small variants and 24,622 structural variants are unique to this dataset. We observed a high genetic heterogeneity across MSEA populations, reflected by the varied combinations of genetic components. We identified 44 genomic regions with strong signatures of Darwinian positive selection, covering 89 genes involved in varied physiological systems such as physical traits and immune response. Furthermore, we observed varied patterns of archaic Denisovan introgression in MSEA populations, supporting the proposal of at least two distinct instances of Denisovan admixture into modern humans in Asia^3^. We also detected genomic regions that suggest adaptive archaic introgressions in MSEA populations. The large number of novel genomic variants in MSEA populations highlights the necessity of studying regional populations that can help answer key questions related to prehistory, genetic adaptation and complex diseases.
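As a rough illustration of what the variant counts in the abstract imply, the short sketch below computes the share of variants reported as unique to SEA3K. The four counts are taken from the abstract; the constant names and the helper `percent_unique` are hypothetical and are not part of the authors' pipeline.

```python
# Counts quoted in the abstract: small variants in millions, structural variants as raw counts.
SMALL_TOTAL_M = 79.59
SMALL_UNIQUE_M = 22.83
SV_TOTAL = 96_384
SV_UNIQUE = 24_622


def percent_unique(unique: float, total: float) -> float:
    """Return the share of dataset-specific variants as a percentage (illustrative helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * unique / total


small_pct = percent_unique(SMALL_UNIQUE_M, SMALL_TOTAL_M)
sv_pct = percent_unique(SV_UNIQUE, SV_TOTAL)

print(f"Small variants unique to SEA3K: {small_pct:.1f}%")
print(f"Structural variants unique to SEA3K: {sv_pct:.1f}%")
```

Under these figures, roughly 28.7% of the small variants and 25.5% of the structural variants are unique to the dataset.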

**Fig. 1: Statistics of the SEA3K genomic variants.**

**Fig. 2: SV discovery based on the long-read genome data of 37 MSEA individuals.**

**Fig. 3: Genetic structure and population history of MSEA populations.**

**Fig. 4: Genomic signals of positive selection in MSEA populations.**

**Fig. 5: The landscape of archaic introgression in MSEA populations.**

Data availability

Datasets generated in this study were deposited in public repositories. WGS data are archived at the Genome Sequence Archive under the accession HRA007135. Genome assemblies are archived at Genome Warehouse (GWH) under the accession PRJCA028104. Variant data are archived at Genome Variation Map under the accession number GVM000730. To protect participant confidentiality, the raw sequencing data are available to the scientific community for general research through a controlled access process. Access can be requested by submitting an application that includes a detailed research proposal and an IRB approval from the applicant’s home institute to the Data Access Committee of Kunming Institute of Zoology, Chinese Academy of Sciences (KIZ, CAS). All other data are open access. Datasets obtained from publicly available sources include: human reference genome GRCh38 (https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa), human reference genome T2T-CHM13 (https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0.fa.gz), 1KGP phase 3 dataset (https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/), high-coverage 1KGP dataset (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_raw_GT_with_annot/), HGDP dataset (https://rosenberglab.stanford.edu/data/conradEtAl2006/data1_1.tar.gz), SGDP dataset (https://sharehost.hms.harvard.edu/genetics/reich_lab/sgdp/vcf_variants/vcfs.variants.public_samples.279samples.tar), HGSVC3 (freeze4) (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC3/working/20240415_Freeze4/), HPRC (v1.1) (https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.1-mc-grch38/hprc-v1.1-mc-grch38.vcfbub.a100k.wave.vcf.gz), genomes of the Altai Neanderthal and Denisovan (https://www.eva.mpg.de/genetics/genome-projects/), RefSeq genes (https://www.cog-genomics.org/static/bin/plink/glist-hg38), GWAS summary data (https://www.ebi.ac.uk/gwas/api/search/downloads/full), cCREs from ENCODE (https://downloads.wenglab.org/V3/GRCh38-cCREs.bed), ClinVar (https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/), OMIM database (https://www.ncbi.nlm.nih.gov/omim/) and eQTLs from GTEx (https://storage.googleapis.com/adult-gtex/bulk-qtl/v8/single-tissue-cis-qtl/GTEx_Analysis_v8_eQTL.tar).

References

Jin, L., Seielstad, M. & Xiao, C. Genetic, Linguistic and Archaeological Perspectives on Human Diversity in Southeast Asia (World Scientific, 2001).
Glover, I. & Bellwood, P. S. Southeast Asia: From Prehistory to History (Routledge Curzon, 2004).
Browning, S. R., Browning, B. L., Zhou, Y., Tucci, S. & Akey, J. M. Analysis of human sequence data reveals two pulses of archaic Denisovan admixture. Cell 173, 53–61.e9 (2018).
Su, B. et al. Y-chromosome evidence for a northward migration of modern humans into Eastern Asia during the last Ice Age. Am. J. Hum. Genet. 65, 1718–1724 (1999).
Hallast, P., Agdzhoyan, A., Balanovsky, O., Xue, Y. & Tyler-Smith, C. A Southeast Asian origin for present-day non-African human Y chromosomes. Hum. Genet. 140, 299–307 (2021).
Kutanan, W. et al. Reconstructing the human genetic history of mainland Southeast Asia: insights from genome-wide data from Thailand and Laos. Mol. Biol. Evol. 38, 3459–3477 (2021).
Duong, N. T. et al. Complete human mtDNA genome sequences from Vietnam and the phylogeography of mainland Southeast Asia. Sci. Rep. 8, 11651 (2018).
Li, Y. C. et al. Ancient inland human dispersals from Myanmar into interior East Asia since the Late Pleistocene. Sci. Rep. 5, 9473 (2015).
Zhang, X. et al. Analysis of mitochondrial genome diversity identifies new and ancient maternal lineages in Cambodian aborigines. Nat. Commun. 4, 2599 (2013).
Abdulla, M. A. et al. Mapping human genetic diversity in Asia. Science 326, 1541–1545 (2009).
Peng, M. S. et al. Tracing the Austronesian footprint in mainland Southeast Asia: a perspective from mitochondrial DNA. Mol. Biol. Evol. 27, 2417–2430 (2010).
Deng, L. et al. Genetic connections and convergent evolution of tropical Indigenous peoples in Asia. Mol. Biol. Evol. 39, msab361 (2022).
Tucci, S. et al. Evolutionary history and adaptation of a human pygmy population of Flores Island, Indonesia. Science 361, 511–516 (2018).
Zhang, X. et al. The distinct morphological phenotypes of Southeast Asian aborigines are shaped by novel mechanisms for adaptation to tropical rainforests. Natl Sci. Rev. 9, nwab072 (2022).
Dhir, R. K., Cattaneo, U., Ormaza, M. V. C., Coronado, H. & Oelz, M. Implementing the ILO Indigenous and Tribal Peoples Convention No. 169: Towards an Inclusive, Sustainable and Just Future (International Labour Organization, 2020).
Wong, L. P. et al. Deep whole-genome sequencing of 100 southeast Asian Malays. Am. J. Hum. Genet. 92, 52–66 (2013).
Byrska-Bishop, M. et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 185, 3426–3440.e19 (2022).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).
GenomeAsia100K Consortium. The GenomeAsia 100K Project enables genetic discoveries across Asia. Nature 576, 106–111 (2019).
Wu, D. et al. Large-scale whole-genome sequencing of three diverse Asian populations in Singapore. Cell 179, 736–749.e15 (2019).
McLaren, W. et al. The Ensembl variant effect predictor. Genome Biol. 17, 122 (2016).
Logsdon, G. A. et al. Complex genetic variation in nearly complete human genomes. Preprint at bioRxiv https://doi.org/10.1101/2024.09.24.614721 (2024).
Liao, W. W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).
Wang, C., Zöllner, S. & Rosenberg, N. A. A quantitative comparison of the similarity between genes and geography in worldwide human populations. PLoS Genet. 8, e1002886 (2012).
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Lipson, M. et al. Ancient genomes document multiple waves of migration in Southeast Asian prehistory. Science 361, 92–95 (2018).
McColl, H. et al. The prehistoric peopling of Southeast Asia. Science 361, 88–92 (2018).
Liu, D. et al. Extensive ethnolinguistic diversity in Vietnam reflects multiple sources of genetic diversity. Mol. Biol. Evol. 37, 2503–2519 (2020).
Lawson, D. J., van Dorp, L. & Falush, D. A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots. Nat. Commun. 9, 3258 (2018).
Mallick, S. et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538, 201–206 (2016).
Schiffels, S. & Wang, K. MSMC and MSMC2: the multiple sequentially Markovian coalescent. Methods Mol. Biol. 2090, 147–166 (2020).
Terhorst, J., Kamm, J. A. & Song, Y. S. Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat. Genet. 49, 303–309 (2017).
Grossman, S. R. et al. A composite of multiple signals distinguishes causal variants in regions of positive selection. Science 327, 883–886 (2010).
Luo, H. et al. Recent positive selection signatures reveal phenotypic evolution in the Han Chinese population. Sci. Bull. 68, 2391–2404 (2023).
Zheng, W. et al. Large-scale genome sequencing redefines the genetic footprints of high-altitude adaptation in Tibetans. Genome Biol. 24, 73 (2023).
Liu, X. et al. Decoding triancestral origins, archaic introgression, and natural selection in the Japanese population by whole-genome sequencing. Sci. Adv. 10, eadi8419 (2024).
Lo, Y. H. et al. Detecting genetic ancestry and adaptation in the Taiwanese Han people. Mol. Biol. Evol. 38, 4149–4165 (2021).
Chen, L., Wolf, A. B., Fu, W., Li, L. & Akey, J. M. Identifying and interpreting apparent Neanderthal ancestry in African individuals. Cell 180, 677–687.e16 (2020).
Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226 (2012).
Springelkamp, H. et al. ARHGEF12 influences the risk of glaucoma by increasing intraocular pressure. Hum. Mol. Genet. 24, 2689–2699 (2015).
Kichaev, G. et al. Leveraging polygenic functional enrichment to improve GWAS power. Am. J. Hum. Genet. 104, 65–75 (2019).
Choudhury, A. et al. High-depth African genomes inform human migration and health. Nature 586, 741–748 (2020).
Efremov, G. D. et al. Hb Icaria–Hb H disease: identification of the Hb Icaria mutation through analysis of amplified DNA. Br. J. Haematol. 75, 250–253 (1990).
Vlok, M. et al. Forager and farmer evolutionary adaptations to malaria evidenced by 7000 years of thalassemia in Southeast Asia. Sci. Rep. 11, 5677 (2021).
Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
Larena, M. et al. Multiple migrations to the Philippines during the last 50,000 years. Proc. Natl Acad. Sci. USA 118, e2026132118 (2021).
Karmin, M. et al. Episodes of diversification and isolation in island Southeast Asian and near Oceanian male lineages. Mol. Biol. Evol. 39, msac045 (2022).
Fan, S., Hansen, M. E., Lo, Y. & Tishkoff, S. A. Going global by adapting local: a review of recent human adaptation. Science 354, 54–59 (2016).
Sankararaman, S., Mallick, S., Patterson, N. & Reich, D. The combined landscape of Denisovan and Neanderthal ancestry in present-day humans. Curr. Biol. 26, 1241–1247 (2016).
Barnes, R. H., Gray, A. & Kingsbury, B. Indigenous Peoples of Asia (Association for Asian Studies, 1995).
Taylor, P. M. in World Bank Inspection Panel. Investigation Report (March 30, 2006): Cambodia: Forest Concession Management and Control Pilot Project 128–141 (World Bank, 2006).
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595 (2010).
Van der Auwera, G. A. & O’Connor, B. D. Genomics in the Cloud: Using Docker, GATK, and WDL in Terra (O’Reilly Media, 2020).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Cingolani, P. et al. Using Drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, SnpSift. Front. Genet. 3, 35 (2012).
Bergström, A. et al. Insights into human genetic variation and population history from 929 diverse genomes. Science 367, eaay5012 (2020).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly 6, 80–92 (2012).
Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).
Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010).
Mondal, M. et al. Genomic analysis of Andamanese provides insights into ancient human migration into Asia and adaptation. Nat. Genet. 48, 1066–1070 (2016).
Lu, D. et al. Ancestral origins and genetic history of Tibetan highlanders. Am. J. Hum. Genet. 99, 580–594 (2016).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Wang, C. et al. Comparing spatial maps of human population-genetic variation using Procrustes analysis. Stat. Appl. Genet. Mol. Biol. 9, 13 (2010).
Pickrell, J. K. & Pritchard, J. K. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8, e1002967 (2012).
Bhatia, G., Patterson, N., Sankararaman, S. & Price, A. L. Estimating and interpreting FST: the impact of rare variants. Genome Res. 23, 1514–1521 (2013).
Tamura, K., Stecher, G. & Kumar, S. MEGA11: Molecular Evolutionary Genetics Analysis version 11. Mol. Biol. Evol. 38, 3022–3027 (2021).
Yu, G. Using ggtree to visualize data on tree-like structures. Curr. Protoc. Bioinformatics 69, e96 (2020).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).
Scrucca, L., Fop, M., Murphy, T. B. & Raftery, A. E. mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. R J. 8, 289–317 (2016).
Zhang, C., Dong, S. S., Xu, J. Y., He, W. M. & Yang, T. L. PopLDdecay: a fast and effective tool for linkage disequilibrium decay analysis based on variant call format files. Bioinformatics 35, 1786–1788 (2019).
Wang, J., Raskin, L., Samuels, D. C., Shyr, Y. & Guo, Y. Genome measures used for quality control are dependent on gene function and ancestry. Bioinformatics 31, 318–323 (2015).
Weissensteiner, H. et al. HaploGrep 2: mitochondrial haplogroup classification in the era of high-throughput sequencing. Nucleic Acids Res. 44, W58–W63 (2016).
van Oven, M. & Kayser, M. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum. Mutat. 30, E386–E394 (2009).
Chen, H., Lu, Y., Lu, D. & Xu, S. Y-LineageTracker: a high-throughput analysis framework for Y-chromosomal next-generation sequencing data. BMC Bioinformatics 22, 114 (2021).
Y Chromosome Consortium. A nomenclature system for the tree of human Y-chromosomal binary haplogroups. Genome Res. 12, 339–348 (2002).
Schiffels, S. & Durbin, R. Inferring human population size and separation history from multiple genome sequences. Nat. Genet. 46, 919–925 (2014).
Sellinger, T. P. P., Abu-Awad, D. & Tellier, A. Limits and convergence properties of the sequentially Markovian coalescent. Mol. Ecol. Resour. 21, 2231–2248 (2021).
Patton, A. H. et al. Contemporary demographic reconstruction methods are robust to genome assembly quality: a case study in Tasmanian devils. Mol. Biol. Evol. 36, 2906–2921 (2019).
Hu, W. et al. Genomic inference of a severe human bottleneck during the Early to Middle Pleistocene transition. Science 381, 979–984 (2023).
Yi, X. et al. Sequencing of 50 human exomes reveals adaptation to high altitude. Science 329, 75–78 (2010).
Sabeti, P. C. et al. Genome-wide detection and characterization of positive selection in human populations. Nature 449, 913–918 (2007).
Voight, B. F., Kudaravalli, S., Wen, X. & Pritchard, J. K. A map of recent positive selection in the human genome. PLoS Biol. 4, e72 (2006).
Szpiech, Z. A. & Hernandez, R. D. selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol. Biol. Evol. 31, 2824–2827 (2014).
Zerbino, D. R., Wilder, S. P., Johnson, N., Juettemann, T. & Flicek, P. R. The Ensembl regulatory build. Genome Biol. 16, 56 (2015).
The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Romanoski, C. E., Glass, C. K., Stunnenberg, H. G., Wilson, L. & Almouzni, G. Epigenomics: roadmap for regulation. Nature 518, 314–316 (2015).
Adams, D. et al. BLUEPRINT to decode the epigenetic signature written in blood. Nat. Biotechnol. 30, 224–226 (2012).
Turchin, M. C. et al. Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nat. Genet. 44, 1015–1019 (2012).
Racimo, F., Berg, J. J. & Pickrell, J. K. Detecting polygenic adaptation in admixture graphs. Genetics 208, 1565–1584 (2018).
Berg, J. J. & Coop, G. A population genetic signal of polygenic adaptation. PLoS Genet. 10, e1004412 (2014).
Yengo, L. et al. A saturated map of common genetic variants associated with human height. Nature 610, 704–712 (2022).
Chen, M. et al. Evidence of polygenic adaptation in Sardinia at height-associated loci ascertained from the Biobank Japan. Am. J. Hum. Genet. 107, 60–71 (2020).
Barrett, J. C., Fry, B., Maller, J. & Daly, M. J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265 (2005).
Hofmeister, R. J., Ribeiro, D. M., Rubinacci, S. & Delaneau, O. Accurate rare variant phasing of whole-genome and whole-exome sequencing data in the UK Biobank. Nat. Genet. 55, 1243–1249 (2023).
Rozas, J. et al. DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol. Biol. Evol. 34, 3299–3302 (2017).
Excoffier, L. & Lischer, H. E. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol. Ecol. Resour. 10, 564–567 (2010).
Leigh, J. W. & Bryant, D. POPART: full-feature software for haplotype network construction. Methods Ecol. Evol. 6, 1110–1116 (2015).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072–1075 (2013).
Hu, J. et al. NextPolish2: A repeat-aware polishing tool for genomes assembled using HiFi long reads. Genomics Proteomics Bioinformatics 22, qzad009 (2024).
Chen, Y., Zhang, Y., Wang, A. Y., Gao, M. & Chong, Z. Accurate long-read de novo assembly evaluation with Inspector. Genome Biol. 22, 312 (2021).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Smolka, M. et al. Detection of mosaic and population-level structural variants with Sniffles2. Nat. Biotechnol. 42, 1571–1580 (2024).
Ebert, P. et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, eabf7117 (2021).
Wang, S. et al. De novo and somatic structural variant discovery with SVision-pro. Nat. Biotechnol. 43, 181–185 (2025).
Kirsche, M. et al. Jasmine and Iris: population-scale structural variant comparison and analysis. Nat. Methods 20, 408–417 (2023).
Chen, S. et al. Paragraph: a graph-based structural variant genotyper for short-read sequence data. Genome Biol. 20, 291 (2019).
Prüfer, K. et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505, 43–49 (2014).
Prüfer, K. et al. A high-coverage Neandertal genome from Vindija Cave in Croatia. Science 358, 655–658 (2017).
Delaneau, O., Zagury, J. F., Robinson, M. R., Marchini, J. L. & Dermitzakis, E. T. Accurate, scalable and integrative haplotype estimation. Nat. Commun. 10, 5436 (2019).
Loh, P. R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
Huang, J. et al. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel. Nat. Commun. 6, 8111 (2015).

Acknowledgements

The authors thank all participants in the project, and S.-F. Wu for assistance in sample collection. This study was supported by National Natural Science Foundation of China (32288101 to B.S.; 32170632 to Y.H.; T2222030 and U23A20161 to X.Z.; 32170633 to Y.-C.L.), Major Scientific Project of Yunnan Province (202305AH340007 to B.S.), Yunnan Revitalization Talent Support Program Science and Technology Champion Project (202005AB160004 to B.S.), Yunnan Revitalization Talent Support Program Innovation Team (202405AS350008), Yunnan Scientist Workshops (to B.S.), Science and Technology General Program of Yunnan Province (202301AW070010 to Y.H.), High-level Talent Promotion and Training Projects of Kunming (2022SCP001 to M.-S.P., 2022SCP001 to Y.-P.Z. and 2020SCP001 to Q.-P.K.), Animal Branch of the Germplasm Bank of Wild Species, Chinese Academy of Sciences (the Large Research Infrastructure Funding), the National Key R&D Program of China (2022YFC3302004 to Y.-C.L.), Yunling Scholar of the Yunnan Province (Q.-P.K.) and the Yunnan Ten Thousand Talents Plan Young and Elite Talents Project (Y.-C.L.).

Author information

These authors contributed equally: Yaoxi He, Xiaoming Zhang, Min-Sheng Peng, Yu-Chun Li, Kai Liu

Authors and Affiliations

State Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
Yaoxi He, Xiaoming Zhang, Min-Sheng Peng, Yu-Chun Li, Kai Liu, Yu Zhang, Leyan Mao, Yongbo Guo, Yujie Ma, Bin Zhou, Wangshan Zheng, Tian Yue, Yuwen Liao, Weijie Zhang, Shanshan Dai, Zongliang Gao, Li-Qin Yang, Jianxin Guo, Jiangguo Li, Qing-Peng Kong, Ya-Ping Zhang & Bing Su
Yunnan Key Laboratory of Integrative Anthropology, Kunming, China
Yaoxi He, Xiaoming Zhang, Li-Qin Yang & Bing Su
KIZ/CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming, China
Min-Sheng Peng, Yu-Chun Li, Zongliang Gao, Qing-Peng Kong & Ya-Ping Zhang
University of Chinese Academy of Sciences, Beijing, China
Min-Sheng Peng, Kai Liu, Yu Zhang, Leyan Mao, Yujie Ma, Bin Zhou, Wangshan Zheng, Tian Yue, Yuwen Liao, Weijie Zhang, Zongliang Gao, Jiangguo Li & Ya-Ping Zhang
Kunming Key Laboratory of Healthy Aging Study, Kunming, China
Yu-Chun Li, Zongliang Gao, Li-Qin Yang & Qing-Peng Kong
State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, School of Life Science, Fudan University, Shanghai, China
Shen-Ao Liang & Lu Chen
National Genomics Data Center, China National Center for Bioinformation, Beijing, China
Xiaoning Chen & Bixia Tang
School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
Xiaofei Yang, Shenghan Gao & Shijie Wan
MOE Key Lab for Intelligent Networks & Networks Security, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
Xiaofei Yang, Kai Ye & Shenghan Gao
Center for Mathematical Medical, the First Affiliated Hospital, Xi’an Jiaotong University, Xi’an, China
Xiaofei Yang & Kai Ye
School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
Kai Ye & Shenghan Gao
Genome Institute, the First Affiliated Hospital, Xi’an Jiaotong University, Xi’an, China
Kai Ye
School of Life Science and Technology, Xi’an Jiaotong University, Xi’an, China
Kai Ye
Faculty of Science, Leiden University, Leiden, The Netherlands
Kai Ye
CEMS, NCMIS, HCMS, MADIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
Yurun Lu & Yong Wang
School of Medicine, Yunnan University, Kunming, China
Rushan Hao
Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
Xuankai Wang & Yafei Mao
Center for Genomic Research, International Institutes of Medicine, The Fourth Affiliated Hospital, Zhejiang University, Yiwu, China
Yafei Mao
Laboratory Animal Center, Kunming Institute of Zoology, the Chinese Academy of Sciences, Kunming, China
Chao Liu
National Resource Center for Non-Human Primates, Kunming, China
Chao Liu
Department of Anthropology, School of Sociology, Yunnan Minzu University, Kunming, China
Jianhua Wang
Department of Geography and Land Management, Royal University of Phnom Penh, Phnom Penh, Cambodia
Tuot Sovannary & Long Bunnath
Department of Biology, Faculty of Science, Chiang Mai University, Chiang Mai, Thailand
Jatupol Kampuansai & Angkhana Inta
Department of Biochemistry, Faculty of Medical Science, Naresuan University, Phitsanulok, Thailand
Metawee Srikummool
Department of Biology, Faculty of Science, Naresuan University, Phitsanulok, Thailand
Wibhu Kutanan
Department of Immunology, Ha Noi Medical University, Ha Noi, Vietnam
Huy Quang Ho & Khoa Dang Pham
National University of Laos (NUOL), Vientiane, Laos
Sommay Singthong
Muang Sing District Hospital, Luang Namtha, Laos
Somphad Sochampa
Field School of Archaeology, Paukkhaung, Myanmar
U. Win Kyaing
Queen Sirikit Botanic Garden (QSBG), The Botanical Garden Organization, Chiang Mai, Thailand
Wittaya Pongamornkul
Inter Mountain Peoples Education and Culture in Thailand Association (IMPECT), Sansai, Thailand
Chutima Morlaeku
Asia Indigenous Peoples Pact (AIPP), Chiang Mai, Thailand
Kittisak Rattanakrajangsri
Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
Qing-Peng Kong & Bing Su
State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, School of Life Sciences, Yunnan University, Kunming, China
Ya-Ping Zhang

Authors

Yaoxi He
Xiaoming Zhang
Min-Sheng Peng
Yu-Chun Li
Kai Liu
Yu Zhang
Leyan Mao
Yongbo Guo
Yujie Ma
Bin Zhou
Wangshan Zheng
Tian Yue
Yuwen Liao
Shen-Ao Liang
Lu Chen
Weijie Zhang
Xiaoning Chen
Bixia Tang
Xiaofei Yang
Kai Ye
Shenghan Gao
Yurun Lu
Yong Wang
Shijie Wan
Rushan Hao
Xuankai Wang
Yafei Mao
Shanshan Dai
Zongliang Gao
Li-Qin Yang
Jianxin Guo
Jiangguo Li
Chao Liu
Jianhua Wang
Tuot Sovannary
Long Bunnath
Jatupol Kampuansai
Angkhana Inta
Metawee Srikummool
Wibhu Kutanan
Huy Quang Ho
Khoa Dang Pham
Sommay Singthong
Somphad Sochampa
U. Win Kyaing
Wittaya Pongamornkul
Chutima Morlaeku
Kittisak Rattanakrajangsri
Qing-Peng Kong
Ya-Ping Zhang
Bing Su

Consortia

Consortium of Anthropological Research in Southeast Asia and Southwest China (CASEAC)

Yaoxi He, Xiaoming Zhang, Min-Sheng Peng, Yu-Chun Li, Jianxin Guo, Lu Chen, Yafei Mao, Jianhua Wang, Tuot Sovannary, Long Bunnath, Jatupol Kampuansai, Metawee Srikummool, Wibhu Kutanan, Huy Quang Ho, Khoa Dang Pham, Sommay Singthong, Somphad Sochampa, U. Win Kyaing, Qing-Peng Kong & Bing Su

Contributions

B.S., Y.-P.Z. and Q.-P.K. conceived and designed the study. B.S. and Y.H. coordinated and supervised the project. T.S., L.B. and X.Z. collected samples from Cambodia. J.K., M.S., W.K. and J.W. collected samples from Thailand. H.Q.H., K.D.P., S.D. and M.-S.P. collected samples from Vietnam. S. Singthong, S. Sochampa, C.L., Z.G., L.-Q.Y. and Y.-C.L. collected samples from Laos. U.W.K. and M.-S.P. collected samples from Myanmar. X.Z. collected samples from China. A.I., W.P., C.M. and K.R. were responsible for the ethical approval work, personnel coordination and volunteer organization for sample collection. Y.H., X.Z., M.-S.P., Y.-C.L., Y.Z., K.L., Y. Ma and T.Y. prepared the samples and processed them for sequencing. Y.H. and K.L. contributed to data processing, QC, variant analysis and genome assembly. Y.H., Y. Ma, W. Zheng, Y. Liao, L.M., J.G., J.L., R.H., K.L., Y. Lu and Y.W. contributed to the population genetics analysis. Y.Z., T.Y. and Y.H. contributed to construction of the imputation reference panel. W. Zhang, X.C. and B.T. contributed to construction of the SEA3K Imputation Server. Y.G., Y.H., X.Y., K.Y., S.G., S.W., B.Z., Y. Mao and X.W. contributed to the structural variant analysis. L.C., S.-A.L., Y.Z. and Y.H. contributed to the archaic introgression analysis. Y.H. and X.Z. were responsible for organizing the CASEAC. Y.H., M.-S.P. and Y.-C.L. were responsible for ethical, legal and social implications. Y.H. and B.S. wrote the manuscript. Y.-P.Z., Q.-P.K., X.Z., M.-S.P., Y.-C.L., L.C. and Y. Mao edited the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Qing-Peng Kong, Ya-Ping Zhang or Bing Su.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Stephen Acabado and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Statistics of small variant calls.

The sample-level counts of small variants (SNVs and indels) (panel a) and the heterozygosity ratios of small variants (panel b) across populations, stratified by super-populations. A total of 29 MSEA populations in the SEA3K dataset (CMKL was excluded owing to its small sample size of two individuals) and 26 populations (belonging to five super-populations) from the 1KGP dataset were included in this analysis. Panel a shows a higher average number of variants per individual genome in MSEA populations than in Eurasian populations of 1KGP, and panel b shows that the heterozygosity ratio in MSEA populations is comparable to that in East Asian populations. The genome-wide heterozygosity ratio was calculated as the number of heterozygous SNVs divided by the number of non-reference homozygous SNVs. For each boxplot, we drew a box from the first quartile to the third quartile. A horizontal line across the box indicates the median. The whiskers go from each quartile to the minimum or the maximum. Abbreviations in 1KGP: EAS, East Asians; AMR, Americans; SAS, South Asians; EUR, Europeans; AFR, Africans.
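
As a minimal sketch of the heterozygosity ratio defined in this legend (heterozygous SNVs divided by non-reference homozygous SNVs), the following assumes genotypes are available as VCF-style strings such as "0/1" or "1|1". The function name and the toy input are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter

def het_hom_ratio(genotypes):
    """Return (# heterozygous SNVs) / (# non-reference homozygous SNVs).

    `genotypes` is an iterable of VCF-style diploid genotype strings
    (hypothetical input format, assumed for illustration).
    """
    counts = Counter()
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # skip missing or non-diploid calls
        a, b = alleles
        if a != b:
            counts["het"] += 1      # two different alleles
        elif a != "0":
            counts["hom_alt"] += 1  # two copies of a non-reference allele
    if counts["hom_alt"] == 0:
        raise ValueError("no non-reference homozygous calls")
    return counts["het"] / counts["hom_alt"]

# Toy example: 3 heterozygous calls, 2 non-reference homozygous calls.
example = ["0/1", "1|1", "0/0", "1/0", "./.", "1/1", "0|1"]
ratio = het_hom_ratio(example)
```

Reference-homozygous and missing calls do not enter either count, so the ratio depends only on sites where the individual carries at least one non-reference allele.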

Extended Data Fig. 2 Schematic diagram of SV merging and SV novelty evaluation.

a, The SV calls from Sniffles2 and PAV of each individual were merged using Jasmine, and then we merged the 37 individual calls to obtain the population calls (120,063 SVs in total). b, The counts of individual SVs in the Sniffles2 calls, the PAV calls and the merged calls. The mean SV counts per individual at each step are labeled and denoted using dashed red lines. c, Repeat annotation of MSEA SVs. d, The HGSVC3 and HPRC SV callsets are used as the reference panel. Details of the evaluation procedure are provided in Methods.

Extended Data Fig. 3 Principal component analysis (PCA) of the SEA3K and worldwide population samples.

The plots were constructed using 4,587,143 biallelic SNVs among 5,938 global individuals, including 2,183 samples drawn from SEA3K, 2,504 samples from 1KGP, 828 samples from HGDP, 279 samples from SGDP, and other representative populations from published data: Malays (n = 96), Andamanese (n = 10) and Tibetans (n = 38) (see Methods). The upper panel shows the global pattern, and the bottom panel shows the regional pattern including only East Asians, South Asians and SEA3K. The SEA3K dataset was categorized by language families.

Extended Data Fig. 4 Genetic relatedness of MSEA populations with other global populations.

The Neighbor-Joining (NJ) trees and Maximum-Likelihood (ML) trees were constructed using the merged SNV data from SEA3K (marked in red font) and 1KGP (or HGDP). The NJ trees show the genetic relationships determined from genetic distances, and the unit of branch length for genetic distance is shown at the bottom of each tree. In the ML trees, the standard errors and 100 bootstrap replicates were used to evaluate the confidence in the inferred tree topology.

Extended Data Fig. 5 Pairwise F_ST between MSEA and East Asian populations.

Heatmaps showing the pairwise F_ST values between East Asian populations from SGDP and between MSEA populations in SEA3K. The color scales of both heatmaps are the same. Genetic differences between MSEA populations are greater than those between East Asian populations distributed over a comparable geographic range. For example, F_ST between MYNA and CMKR (0.051) is significantly higher than the highest F_ST between East Asian groups (0.037, Yakut vs. She), even though Yakut and She are separated by three to four times the geographical distance between the two MSEA populations.
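
For readers unfamiliar with pairwise F_ST, the sketch below computes it from allele frequencies using Hudson's estimator combined as a ratio of sums across SNVs. This legend does not state which estimator the authors used, so the choice of Hudson's estimator, the function name and the toy frequencies are all assumptions for illustration.

```python
def hudson_fst(p1, p2, n1, n2):
    """Pairwise F_ST by Hudson's estimator (ratio of sums over SNVs).

    p1, p2: per-SNV allele frequencies in populations 1 and 2.
    n1, n2: number of sampled chromosomes in each population.
    The estimator choice is an assumption, not the authors' stated method.
    """
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for finite sample size.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles, one from each population, differ.
        den += a * (1 - b) + b * (1 - a)
    return num / den

# Toy example: two SNVs fixed for opposite alleles give F_ST = 1.
fst_fixed = hudson_fst([1.0, 0.0], [0.0, 1.0], 20, 20)
# Identical frequencies give a value at or slightly below zero.
fst_same = hudson_fst([0.5, 0.3], [0.5, 0.3], 20, 20)
```

Summing numerators and denominators separately before dividing avoids giving rare variants undue weight, which is why the ratio-of-sums form is used here rather than a mean of per-SNV ratios.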

Extended Data Fig. 6 Historical population dynamics of MSEA populations.

The plots show the estimated effective population size (Ne) of the ancestral populations of MSEA populations (stratified by country) using the SMC++ (dotted lines) and MSMC2 (solid lines) methods. The generation time and the mutation rate per site per generation (μ) were set to 29 years and 1.25 × 10⁻⁸, respectively, in both the MSMC2 and SMC++ analyses.
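
The two parameters in this legend are what turn scaled coalescent output into years and effective population sizes. The sketch below applies the conversion commonly used for MSMC2 output (time boundary divided by μ, multiplied by generation time; Ne equal to 1 / (2μλ), where λ is the scaled coalescence rate). Whether the authors applied exactly this conversion is an assumption, and the function name and example values are hypothetical.

```python
MU = 1.25e-8    # mutation rate per site per generation (from the legend)
GEN_YEARS = 29  # generation time in years (from the legend)

def scale_msmc2(time_boundary, coal_rate, mu=MU, gen_years=GEN_YEARS):
    """Convert one MSMC2-style row to (years ago, effective population size).

    time_boundary: scaled time (in units of per-site mutation rate).
    coal_rate: scaled coalescence rate (lambda) for that time segment.
    """
    generations = time_boundary / mu
    years = generations * gen_years
    ne = 1.0 / (2.0 * mu * coal_rate)
    return years, ne

# Hypothetical row: scaled time 1.25e-5 and lambda 4000.
years, ne = scale_msmc2(1.25e-5, 4000.0)
```

Because both quantities scale with 1/μ, a different assumed mutation rate shifts the whole curve along both axes without changing its shape.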

Extended Data Fig. 7 GWAS annotation and simulation of PS-SNVs in MSEA populations.

a, Manhattan plot of the CMS scores of the genome-wide SNVs in MSEA populations. The reported GWAS hits are highlighted by red dots and labeled with GWAS information (variants, associated traits and mapped genes). There are six positive-selection regions harboring the GWAS hits of the top PS-SNVs (with the top 0.01% CMS scores), including two regions covering genes (GDF5, UQCC1 and ZRANB3) related to body height, one region covering genes (SLC24A5 and MYEF2) related to skin pigmentation, one region covering PNPT1 related to hip circumference, one region covering WDPCP related to diet measurement, and the MHC region related to diverse phenotypes. The dot size denotes the P-value of the CMS score. Statistical significance was assessed using a one-sided chi-squared (χ²) test. b, Permutation test of height-associated SNVs. Among the 5,505 PS-SNVs in the 44 regions, we found a significant excess of variants related to body height, with 21 height-associated SNVs falling in the PS-SNV set (P-value = 0.005 based on 1,000 one-sided permutation tests) (see details in Methods).
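
The permutation test in panel b can be illustrated with a minimal sketch: repeatedly draw random SNV sets of the same size as the PS-SNV set from a background and count how often they contain at least as many height-associated SNVs as observed. The null model here (uniform sampling from a flagged background, with no matching on allele frequency or LD) is an assumption; the authors' actual procedure is described in Methods. All names and the toy data are hypothetical.

```python
import random

def permutation_p(observed, n_draw, background_flags, n_perm=1000, seed=0):
    """One-sided empirical P-value for an excess of flagged SNVs.

    observed: number of trait-associated SNVs in the real set.
    n_draw: size of the real set (number of SNVs drawn per permutation).
    background_flags: list of booleans, True if a background SNV is
        trait-associated (hypothetical input, assumed for illustration).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(background_flags, n_draw)
        if sum(sample) >= observed:
            hits += 1  # random set is at least as enriched as the real set
    return hits / n_perm

# Toy background: 10 trait-associated SNVs among 1,000.
background = [True] * 10 + [False] * 990
p_impossible = permutation_p(50, 50, background)  # cannot reach 50 hits
p_trivial = permutation_p(0, 50, background)      # every draw has >= 0 hits
```

With 1,000 permutations the smallest non-zero P-value this procedure can report is 0.001, so the resolution of the test is set by the number of permutations.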

Extended Data Fig. 8 Signature of natural selection in the FLG gene region.

a, The regional plot of CMS scores and recombination rates in the FLG region, in which the peaks indicate the selective signals. The peak SNVs are marked with colors. The bottom panel shows the LD blocks of the 262 PS-SNVs with MSEA-enriched alleles. The dot size denotes the P-value of the CMS score, and statistical significance was assessed using a chi-squared (χ²) test. The r² values indicate the estimated degree of linkage disequilibrium (LD) between the peak SNV and the other SNVs and are color-coded. b, The TCS network of the FLG region showing an MSEA-specific haplogroup. Each node represents a haplotype, and the size is proportional to its frequency. The MSEA-specific haplogroup is highlighted.

Extended Data Fig. 9 Identification of archaic introgression in MSEA populations.

a, Comparison of the archaic-introgression sequences (from Neanderthal and Denisovan) between MSEA and global populations in 1KGP. EAS, East Asians (CDX and KHV were excluded because they belong to MSEA); AMR, Americans; SAS, South Asians; EUR, Europeans. The y-axis indicates the mean detected Neanderthal sequences (or Denisovan sequences) per individual from different populations, stratified by super-populations. We compared MSEA with EAS and SAS, and evaluated significance using a two-sided unpaired t-test. For each boxplot, we drew a box from the first quartile to the third quartile. A horizontal line across the box indicates the median. The whiskers go from each quartile to the minimum or the maximum. b, Mean amounts of the detected introgressed sequences per individual in the 1KGP populations, categorized by affinity to the Altai Neanderthal and Altai Denisovan genomes. c, Intersection of the Neanderthal introgression callsets between Sprime and IBDmix. The callsets were merged for all identified introgression segments in MSEA individuals. d, Violin plots of the Neanderthal sequences per individual in MSEA populations identified by Sprime and IBDmix. For each boxplot, we drew a box from the first quartile to the third quartile. A horizontal line across the box indicates the median. The whiskers go from each quartile to the minimum or the maximum.

Extended Data Fig. 10 Hierarchical clustering of the haplotypes spanning the adaptive introgression region on Chr1.

The rows illustrate the individual haplotypes. A total of 2,116 individuals from 23 populations in SEA3K (MSEA populations) were included, and 2,504 individuals from 26 geographically diverse populations in 1KGP (including SAS, EAS, EUR and AMR populations) were used as the control groups. The Denisovan-derived PS-SNVs are denoted by red rhombuses. Gray and black represent the ancestral and the derived alleles, respectively. DEN, Denisovan. The introgressed Denisovan-like haplotypes are marked in the plot.

Extended Data Fig. 11 Detection of disease-associated variants and protein-truncating variants in MSEA populations.

a, Frequency distribution of the ClinVar pathogenic variants in SEA3K. The classifications of autosomal dominant (AD), autosomal recessive (AR) and unknown were based on the OMIM database. b, Number of pathogenic variants carried by each MSEA individual. c, Ten pathogenic variants specifically enriched in MSEA populations. The mapped genes, variants and risk alleles, and the frequencies of risk alleles in SEA3K and other datasets are indicated. The clinical significance is indicated by exclamation marks (pathogenic level) and stars (number of times classified by previous submitters). d, Frequency distribution of an alpha-thalassemia variant in HBA2 (chromosome 16: 173598; c.427T>C) in world populations and MSEA populations. NA, not available. e, The proportion of genes with at least one high-confidence PTV (pie on the left), and the proportions of novel, known, heterozygous and homozygous PTVs (pie on the right) in the SEA3K dataset. f, The counts of the identified novel homozygous PTVs per individual across MSEA populations.

Supplementary information

Supplementary Information

Supplementary Figs 1–18 and a list of members of the Consortium of Anthropological Research in Southeast Asia and Southwest China (CASEAC).

Reporting Summary

Supplementary Tables

Supplementary Tables 1–20.

Peer Review File

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

He, Y., Zhang, X., Peng, MS. et al. Genome diversity and signatures of natural selection in mainland Southeast Asia. Nature 643, 417–426 (2025). https://doi.org/10.1038/s41586-025-08998-w

Received: 14 February 2024
Accepted: 09 April 2025
Published: 14 May 2025
Version of record: 14 May 2025
Issue date: 10 July 2025
DOI: https://doi.org/10.1038/s41586-025-08998-w

This article is cited by

Integrated genetic and geographic ancestry prediction via large-scale genomic data and machine learning
- Jing Chen
- Yuguo Huang
- Jiangwei Yan
Human Genomics (2025)