East Asian Gene flow bridged by northern coastal populations over past 6000 years

Liu, Juncen; Liu, Yichen; Zhao, Yongsheng; Zhu, Chao; Wang, Tianyi; Zeng, Wen; Sun, Bo; Wang, Fen; Han, Hui; Li, Zhenguang; Feng, Xiaotian; Cao, Peng; Luan, Fengshi; Liu, Feng; Dai, Qingyan; Guo, Junfeng; Wang, Zimeng; Wei, Chengmin; Wei, Qiaowei; Yang, Ruowei; Hou, Weihong; Ping, Wanjing; Bai, Fan; Miao, Bo; Wang, Wenjun; Yang, Melinda A.; Fu, Qiaomei

doi:10.1038/s41467-025-56555-w

Article
Open access
Published: 03 February 2025

Juncen Liu^1,2^na1,
Yichen Liu ORCID: orcid.org/0000-0002-7187-6232¹^na1,
Yongsheng Zhao³^na1,
Chao Zhu⁴^na1,
Tianyi Wang ORCID: orcid.org/0000-0002-4790-7668^1,2^na1,
Wen Zeng³,
Bo Sun⁴,
Fen Wang⁵,
Hui Han⁴,
Zhenguang Li⁴,
Xiaotian Feng¹,
Peng Cao¹,
Fengshi Luan⁵,
Feng Liu¹,
Qingyan Dai¹,
Junfeng Guo⁶,
Zimeng Wang⁴,
Chengmin Wei⁴,
Qiaowei Wei⁷,
Ruowei Yang¹,
Weihong Hou¹,
Wanjing Ping¹,
Fan Bai^1,2,
Bo Miao^1,8,
Wenjun Wang^1,9,
Melinda A. Yang ORCID: orcid.org/0000-0001-9004-7563¹⁰ &
…
Qiaomei Fu ORCID: orcid.org/0000-0002-7141-0002^1,2

Nature Communications volume 16, Article number: 1322 (2025)

Abstract

Coastal areas of northern East Asia in the ShanDong region, which show complex cultural transitions in the last 10,000 years, have helped to facilitate population interactions between more inland regions of mainland East Asia and islands such as those in the Japanese archipelago. To examine how ShanDong populations changed over time and interacted with island and inland East Asian populations, we sequenced 85 individuals from 11 ancient sites in the ShanDong region dating to ~6000-1500 BP. We found that ancestry related to ShanDong populations likely explains the mainland East Asian ancestry observed in post-Yayoi populations from the Japanese archipelago, particularly recent populations who lived in the Ryukyu Islands after ~2800 BP. In the ShanDong region, we observed gene flow from populations to the north and south of this region by at least ~7700 BP, and two waves of gene flow associated with the inland Yellow River populations into the ShanDong region during the DaWenKou cultural period (6000-4600 BP) and in the early dynastic period (3500-1500 BP). Reconstructing the genetic history of the Neolithic, Bronze, and Iron Age populations of coastal northern East Asia shows gene flow on both a north-south and an east-west (inland-coastal-island) scale.

Introduction

Maritime travel has greatly impacted human migration history, enabling long-distance movement that not only introduced humans to many islands and remote continents, but also increased biological and cultural interactions between different human groups. In southern East Asia, multiple waves of population admixture and turnover as well as the spread of cultures and languages have been observed across oceans^1,2, and the migration of proto-Austronesian humans from coastal southern East Asia to islands of Southeast Asia and the Southwest Pacific has been well characterized^3,4,5. In northern East Asia, human movement and interaction in northern coastal East Asia and Pacific islands such as those found in the Japanese archipelago greatly impacted the region, as the eastern coastline of East Asia was an important route for the spread of crops and trade goods (e.g. rice) from mainland East Asia to the Japanese archipelago^6,7. Previous ancient genomic studies have shown gene flow from ancient hunter-gatherers from Japan (e.g. Jōmon) into prehistoric populations from Far East Siberia (e.g. Boisman_MN⁸) and the West Liao River basin⁶. Efforts to investigate the genetic connections and history between mainland East Asians and populations from the Japanese archipelago have shown that populations from the Kofun period (~1750–1400 BP) of Japan and historical Nagabaka populations from the Ryukyu islands show partial ancestry related to northern East Asians^6,7,9 and can be described by a three-ancestry model, where they possess ancestry related to Jōmon hunter-gatherers, and two mainland East Asian sources. One mainland East Asian source, which appeared during the Yayoi period (~2300–1750 BP) of Japan, has been associated with ancient northern East Asians. The other, which arrived in the Japanese archipelago after the Yayoi period, has not been clearly identified, and only present-day Han populations have been used as a proxy^7,9.
Even with more than 3000 deeply sequenced genomes from present-day Japanese populations, a suitable East Asian ancestral source population has yet to be found¹⁰. In previously published studies, differences in genetic structure were observed between populations in the Japanese archipelago, with different genetic patterns between populations from the main islands (Hondo) and the Ryukyu islands, complicating the population history of prehistoric Japan further^10,11,12,13. Thus, increased sampling of ancient humans from coastal regions of mainland East Asia, where sampling has been limited to a few localities and time periods, is vital for determining the East Asian ancestry that made substantial contributions to the genetic composition of humans from the Japanese archipelago.

Previous sampling of ancient ShanDong populations from the Early Neolithic period⁴ (~9500–7700 BP) shows a shared ancestry that falls within the diversity of ancient northern East Asian ancestries spanning from the Upper Yellow River Basin to the Amur River Basin. However, younger populations from the ShanDong region have yet to be sampled, despite a rich archaeological context. The ShanDong region was home to one of the longest-lasting and most influential Neolithic cultures in East Asian prehistory, the DaWenKou culture (6000–4600 BP)^14,15,16,17,18,19. The DaWenKou culture spanned the Middle and Late Neolithic and was primarily located in ShanDong province, and it co-existed with the YangShao culture that was distributed along the Yellow River^14,15,17,18. Interactions between these two cultures were highly dynamic, where influences from the YangShao culture can be observed in sites associated with DaWenKou culture²⁰. Ultimately, by 4600 BP, populations in the Yellow River Basin and the ShanDong region both showed cultural remains associated with the LongShan culture (4600–4000 BP)^20,21. However, the population movement and interaction associated with the transition to the LongShan culture in both the Yellow River and ShanDong regions are still unknown.

For the early dynastic period spanning the Xia to the Jin Dynasties (~4000–1500 BP), the historical and archaeological record emphasizes the prominent role of trading in the coastal regions of East Asia, leading to increased communication and conflict as early as the Shang Dynasty²². In ShanDong, the predominant culture was the Dongyi culture, which was culturally influenced by populations from the Yellow River region during the Shang Dynasty through salt trading²³. Archaeological and historical studies point to frequent conflicts in coastal regions since the Shang Dynasty that ultimately led to the incorporation of the ShanDong region under the rule of the Western Zhou Dynasty²⁴. The effect of increased interaction through trade and conflict across the Yellow River and ShanDong regions²⁵ on the population makeup of coastal northern East Asians is unclear, due to a lack of aDNA evidence in the coastal region in East Asia from this time period.

To study changes in coastal populations from the ShanDong region during the dynamic period spanning from the early Neolithic period to the Jin Dynasty and examine the impact of ShanDong populations on nearby populations in the Yellow River region and the Japanese archipelago, we collected 85 individuals from 11 sites dating from 6000 to 1500 BP from ShanDong, which spans a sixth of the coastline of China. By retrieving aDNA evidence from these individuals, we reconstructed the history of ShanDong populations and resolved the population dynamics of mainland, coastal, and archipelago East Asians from the DaWenKou cultural period to the early dynastic period.

Results

We generated genome-wide data from 85 ancient individuals sampled from 11 sites from the ShanDong region. Radiocarbon dating indicates that these individuals span from ~6000 to 1500 BP, covering the DaWenKou cultural period to the Jin Dynasty (Fig. 1A, B). Coverage across the genome-wide data for these individuals ranged from 0.028 to 2.696×. Specifically, 78 individuals with at least 40,000 SNPs were retained for downstream population genetic analyses, and the seven remaining low-coverage individuals (labeled with the suffix “_low”) were only included in limited downstream analyses (Supplementary Table S1). We estimated the contamination level using X chromosomes (males) and mitochondrial genomes (males and females)^26,27. For 77 sampled individuals with at least 40,000 SNPs and two low-coverage individuals, we estimated contamination levels lower than 5.0%. For the three individuals (TL4773_d_k, BQ4625_d_low_k, YX4790_d_low) with high contamination (>5.0%), we restricted our analyses to fragments showing characteristic aDNA deamination when performing genotype calling for downstream analyses^28,29 (Supplementary Table S1).
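As a rough illustration of the filtering rules just described, the following Python sketch sorts samples using the two thresholds given in the text (40,000 SNPs and 5.0% contamination). The `Sample` record, its field names, the class labels, and the toy identifiers are hypothetical; this is a sketch of the decision logic only, not the pipeline used in the study.

```python
from dataclasses import dataclass
from typing import Tuple

MIN_SNPS = 40_000          # SNP threshold stated in the text
MAX_CONTAMINATION = 0.05   # 5.0% contamination threshold stated in the text


@dataclass(frozen=True)
class Sample:
    sample_id: str         # hypothetical identifier
    n_snps: int            # number of SNPs covered
    contamination: float   # estimated contamination as a fraction (0.05 = 5%)


def classify(sample: Sample) -> Tuple[str, str]:
    """Return (coverage_class, genotyping_mode) for one sample.

    coverage_class: "main" if the sample has at least MIN_SNPS SNPs, else "low".
    genotyping_mode: "damage_restricted" if contamination exceeds the
    threshold (only deaminated fragments would be used), else "all_fragments".
    """
    coverage = "main" if sample.n_snps >= MIN_SNPS else "low"
    if sample.contamination > MAX_CONTAMINATION:
        mode = "damage_restricted"
    else:
        mode = "all_fragments"
    return coverage, mode
```

Under these assumptions, a sample can be both low-coverage and damage-restricted, which matches the text's description of individuals carrying both the "_low" and "_d" labels.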

**Fig. 1: Spatial, temporal, and genetic structure associated with ancient individuals from the ShanDong region.**

We explored the genetic relationship these coastal populations from ShanDong shared with nearby mainland East Asians, as well as island populations of the Japanese archipelago. We then used the observed genetic connections to examine how these coastal populations influenced and were influenced by inland and island neighbors. Newly sampled ShanDong populations were associated with one of three periods: the DaWenKou cultural period spanning the Neolithic period dating to 6000–4600 BP (ShanDong_DWK), the LongShan cultural period spanning the Neolithic period dating to 4600–4000 BP (ShanDong_LS), and the early Chinese dynastic period spanning 3500–1500 BP (ShanDong_CD, Fig. 1A, B).

North-south interactions influenced ShanDong populations since at least 7700 BP

We first examined the genetic relationship between the newly sampled individuals from the ShanDong region and previously sampled ancient and present-day East Asians. Using cluster analyses (PCA³⁰, UMAP³¹, t-SNE³²), we found that the ShanDong individuals are located close to ancient and present-day northern East Asians, including previously sampled Early Neolithic ShanDong populations (9500–7700 BP⁴) (Fig. 1C, D, Supplementary Fig. S2A, B). Outgroup f3-statistics³³ also demonstrated that the ShanDong populations fall within northern East Asian genetic diversity (Fig. 2A). They formed clusters distinct from those observed for ancient humans from the Japanese archipelago, West Liao River, and Amur River regions in the PCA (Fig. 1C, D). Outside of ShanDong, populations from the Yellow River region are closest to the newly sampled individuals in the PCA. Among ancient ShanDong populations, there are three main clusters: one composed of Early Neolithic populations (9500–7700 BP), one composed of early ShanDong individuals dating to the DaWenKou cultural period (6000–4600 BP, ShanDong_DWK), and another composed of later ShanDong populations from the LongShan and early dynastic cultural periods (4600–1500 BP, ShanDong_LS and ShanDong_CD, Fig. 1C). Notably, compared with the DaWenKou and Early Neolithic ShanDong populations, younger ShanDong populations are shifted towards ancient Yellow River populations, suggesting influence from inland East Asian populations outside of the ShanDong region. In a Treemix analysis³⁴, we observed that all ShanDong populations group with ancient northern East Asian populations, with those from the ShanDong region sharing the closest relationship to each other (Fig. 2B). Outside of ShanDong, Yellow River populations show the closest genetic relationship to ShanDong populations (supported by 59.2% of 1000 bootstrap trees, Fig. 2B, Supplementary Table S3).
These patterns support that ShanDong populations fall within the genetic diversity of northern East Asians. In an ADMIXTURE³⁵ analysis, we observed the same components across ShanDong populations, but with varying proportional distributions of these components over time (Fig. 2C).
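To make the outgroup f3-statistic mentioned above more concrete, the sketch below computes a naive version from per-SNP allele frequencies: the average of (o − a)(o − b), where o is the outgroup frequency. Larger values indicate more drift shared by the two test populations since their split from the outgroup. The function name and the toy frequencies are hypothetical, and the finite-sample bias correction and standard errors that dedicated software applies are deliberately left out.

```python
def outgroup_f3(outgroup, pop_a, pop_b):
    """Naive outgroup f3 from per-SNP allele frequencies.

    Each argument is a list of allele frequencies (one value per SNP, same
    SNP order in all three lists). Returns the mean of (o - a) * (o - b).
    No sample-size correction is applied, so this is illustrative only.
    """
    if not outgroup or not (len(outgroup) == len(pop_a) == len(pop_b)):
        raise ValueError("frequency lists must be non-empty and equally long")
    terms = [(o - a) * (o - b) for o, a, b in zip(outgroup, pop_a, pop_b)]
    return sum(terms) / len(terms)


# Toy example: two populations that drifted the same way away from the
# outgroup share more drift than two that drifted in opposite directions.
shared = outgroup_f3([0.5, 0.5], [1.0, 0.0], [1.0, 0.0])
opposed = outgroup_f3([0.5, 0.5], [1.0, 0.0], [0.0, 1.0])
```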

**Fig. 2: Genetic relationships of ShanDong populations to other ancient East Asians.**

The Holocene in mainland East Asia was a time of marked change in human societies, with the rapid rise of farming and complex societies^36,37,38,39. We first examined the effect of these societal changes in ShanDong populations from the Early Neolithic (9500–7500 BP) and the DaWenKou cultural period (6000–4600 BP). We found different trends in the Early Neolithic, where individuals from the Xiaojingshan site (7700 BP) show more genetic connections with populations outside of the ShanDong area, such as northern East Asians from the Amur River region and Far East Siberia (AR/FE^8,40,41), as well as southern East Asians from Fujian, the Taiwan Strait, and Guangxi regions (aSC^3,4). In an f4-analysis assessing whether any Early Neolithic ShanDong individuals share excess ancestry with other ancient East Asians, we observed that the 7700 BP Xiaojingshan individuals share additional alleles with these northern East Asians (aAR/FE), i.e. most f4 (Bianbian/Boshan/Xiaogao/SD9K/aYR/aLR/aSC, Xiaojingshan; aAR/FE, Mbuti) < 0 (−16.1 < Z < 0.4, Supplementary Table S4a), and southern East Asians (aSC), i.e. most f4 (Bianbian/Boshan/Xiaogao/SD9K/aYR/aLR, Xiaojingshan; aSC, Mbuti) < 0 (−7.9 < Z < 1.6, Supplementary Table S4b) relative to older ShanDong individuals dating to ~9000 BP and other northern East Asians (except for AR9.2K_o, who shares some genetic affinity with ancient ShanDong populations⁴²).
Using a rotational qpAdm strategy to further parse Xiaojingshan’s connection to these northern East Asians, we found that Xiaojingshan can only be modeled by a 3-way model with 74.2% Early Neolithic ShanDong ancestry related to Bianbian, Xiaogao, and Boshan (SD9K); 9.8% ancestry related to Early Neolithic populations from Fujian (Fujian_EN); and 16.0% ancestry related to Amur River populations younger than 14,000 years ago (ARpost14K, Supplementary Table S5), confirming that Xiaojingshan shows additional genetic influences from northern and southern East Asian ancestries outside of the ShanDong region. This suggests that as early as the Early Neolithic, there was already some interaction with northern and southern East Asian populations from other regions of mainland East Asia.
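The f4 comparisons above can be read through a small numerical sketch. Given per-SNP allele frequencies, f4(A, B; C, D) is the average of (a − b)(c − d); a negative value indicates that B shares more alleles with C than A does, which is the pattern reported for Xiaojingshan. The Z-score below comes from a delete-one-block jackknife over equal-sized contiguous blocks. All function names are hypothetical, the block scheme is a simplification of the genomic-distance blocks used by dedicated software, and treating |Z| > 3 as significant is a common convention assumed here rather than something this sketch establishes.

```python
import math


def f4_terms(a, b, c, d):
    """Per-SNP products (a_i - b_i) * (c_i - d_i) from allele frequencies."""
    if not a or not (len(a) == len(b) == len(c) == len(d)):
        raise ValueError("frequency lists must be non-empty and equally long")
    return [(ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d)]


def f4(a, b, c, d):
    """f4(A, B; C, D) as the mean of the per-SNP products."""
    terms = f4_terms(a, b, c, d)
    return sum(terms) / len(terms)


def block_jackknife(terms, n_blocks):
    """Return (estimate, standard_error, z) using equal-sized blocks.

    SNPs beyond n_blocks * block_size are dropped so that all blocks have
    the same size, which keeps the jackknife variance formula simple.
    """
    if n_blocks < 2 or len(terms) < n_blocks:
        raise ValueError("need at least 2 blocks and one SNP per block")
    size = len(terms) // n_blocks
    used = terms[: size * n_blocks]
    total = sum(used)
    estimate = total / len(used)
    # Leave-one-block-out estimates
    loo = []
    for j in range(n_blocks):
        block_sum = sum(used[j * size:(j + 1) * size])
        loo.append((total - block_sum) / (len(used) - size))
    mean_loo = sum(loo) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((t - mean_loo) ** 2 for t in loo)
    se = math.sqrt(variance)
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z
```

In use, `block_jackknife(f4_terms(a, b, c, d), n_blocks)` would be applied to genome-wide frequencies with many blocks; the toy inputs in a test are far too small to be meaningful.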

Two pulses of gene flow from inland to coastal populations in northern East Asia

We next investigated genetic relationships during the Middle and Late Neolithic between coastal ShanDong populations (6000–4600 BP) associated with the DaWenKou culture and inland Yellow River populations (7000–5000 BP) associated with the YangShao culture. In the PCA (Fig. 1C, D), we observed that the three DaWenKou ShanDong populations, from the more inland GangShang site (GSGroup) to the more coastal BeiQian (BQGroup) and FuJia (FJGroup) sites, are shifted away from the Early Neolithic ShanDong populations (ShanDong_EN), and toward YangShao-related Yellow River (YR) populations. It can be observed that the three populations from the DaWenKou period distributed along this axis (from ShanDong_EN to YR) show different affinities to YR populations. Specifically, the relatively inland GSGroup clusters with YR populations, whereas the coastal BQGroup and FJGroup fall between the YR and Early Neolithic ShanDong populations (Fig. 1C, D).

To further determine whether DaWenKou populations show additional YR ancestry relative to Early Neolithic ShanDong populations, we employed a rotational qpAdm strategy to estimate ancestral components found in the DaWenKou populations. A total of 11 representative East Asian populations (e.g. ARpost14K, aFujian_EN) were rotated as potential source ancestries for the three DaWenKou groups (Supplementary Table S5). Using this strategy, only those populations that fit the mixture model are considered as ancestral source populations (i.e. Tail_prob >0.05, each ancestral mixture proportion >standard error, pnest <0.05, Supplementary Table S5). With the rotational qpAdm strategy, we found that the DaWenKou populations (BQGroup and GSGroup) can be modeled as a mixture of ancestry related to Early Neolithic ShanDong populations (ShanDong_EN, ~29–87%) and YR populations (YR, ~13–71%), while FJGroup are best described by a single source ancestry related to ShanDong_EN (“1-way” model, Tail_prob = 0.08) or the BQGroup (Tail_prob = 0.47, Fig. 3A, Supplementary Table S5).
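The acceptance criteria listed above can be written as a simple filter. The sketch below encodes them as stated in the text (Tail_prob > 0.05, every mixture proportion larger than its standard error, and pnest < 0.05). The function name and argument layout are hypothetical, and treating pnest as optional for models without a nested simpler model (such as a 1-way model) is an assumption of this sketch, not something the text specifies.

```python
from typing import Optional, Sequence


def model_passes(
    tail_prob: float,
    proportions: Sequence[float],
    std_errors: Sequence[float],
    p_nest: Optional[float] = None,
) -> bool:
    """Apply the three acceptance criteria quoted in the text.

    tail_prob   : model fit p-value; must exceed 0.05.
    proportions : estimated mixture proportions, one per source.
    std_errors  : standard errors matching `proportions`.
    p_nest      : p-value of the nested-model comparison; must be below
                  0.05 when provided. None means "not applicable"
                  (assumption made for this sketch).
    """
    if len(proportions) != len(std_errors) or not proportions:
        raise ValueError("need one standard error per mixture proportion")
    if tail_prob <= 0.05:
        return False
    if any(p <= se for p, se in zip(proportions, std_errors)):
        return False
    if p_nest is not None and p_nest >= 0.05:
        return False
    return True
```

In a rotational setup, a filter like this would be applied to every combination of candidate sources, and only the models that pass would be reported as plausible.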

**Fig. 3: Gene flow related to the ShanDong populations.**

Consistent with the earlier observation that the three sampled DaWenKou populations show different affinities to the YR populations, the mixture proportion calculated by qpAdm for YR ancestry in 6000–4600 BP ShanDong populations varies, with the GSGroup showing the highest levels of YR ancestry (~50–71%, Fig. 3A, Supplementary Table S5), followed by the BQGroup (~13%, Fig. 3A, Supplementary Table S5) and the FJGroup (~0–13%, Fig. 3A, Supplementary Table S5). In an ADMIXTURE analysis, the GSGroup shows a higher proportion of a YR-related component (orange) compared to other DaWenKou populations (Fig. 2C). Based on our finding of greater YR-related ancestry in the inland GSGroup relative to the coastal BQGroup and FJGroup, we suspect that YR-related ancestry had decreasing impact with proximity to the coast. With only three DWK sites represented, however, finer sampling of DWK sites in ShanDong is needed to confirm this hypothesis. Collectively, these patterns suggest that ancestry related to YR populations impacted populations in the ShanDong region during the DaWenKou cultural period (Fig. 3B).

Between 4600–4000 BP, the archaeological record shows high cultural assimilation in YR and ShanDong populations, resulting in a shared culture across these inland and coastal regions denoted the LongShan culture²¹. To explore population dynamics during this cultural transition, we investigated shifts in genetic ancestry of the coastal ShanDong population during this time period, particularly in populations from the YinJiaCheng (YJCGroup) and ChengZiYa (CZYGroup) sites. We first found that the genetic influence of YR populations persisted in ShanDong populations of the Late Neolithic who are associated with the LongShan cultural period (YJCGroup and CZYGroup). In a PCA, similar to the DaWenKou GSGroup, both YinJiaCheng and ChengZiYa individuals clustered with Yellow River individuals (Fig. 1C, D). In an ADMIXTURE analysis, individuals from the YJCGroup and CZYGroup show a component related to inland YR populations, and the proportion of a YR-related component in these two groups is within a range that overlaps with the proportion observed in the three DaWenKou populations (GSGroup, BQGroup, and FJGroup, Fig. 2C). We then explored whether this YR-related component was introduced from additional admixture from YR populations using an f4-analysis. We found that f4 (ShanDong_DWK, ShanDong_LS; YR, Mbuti)~0 (−2.3 < Z < 3.3, with only one “Z-value” >3 when the ShanDong_DWK was GSGroup, Supplementary Table S6), which suggests that LongShan populations did not share more genetic connections with YR populations than DaWenKou populations.

Using a rotational qpAdm analysis (Fig. 3A, Supplementary Table S5), we found that both LongShan populations can only be modeled as a mixture of ancestry related to the GSGroup (50–71% YR) – the DaWenKou population with elevated YR ancestry – and another DaWenKou population (BQGroup) or an Early Neolithic ShanDong population (21–32%, Fig. 3A, Supplementary Table S5). Interestingly, our result suggests different patterns of admixture in LongShan and DaWenKou populations. The LongShan populations cannot be modeled using a mixture of “ShanDong-related ancestors” and a “YR population”, but only as a mixture of two ShanDong populations. This suggests that rather than continued admixture from YR-related populations, there was genetic continuity between the LongShan and older ShanDong populations. Overall, our results support LongShan populations as a mixture of ancestry related to the older DaWenKou populations from both the more coastal and inland regions of ShanDong, with no additional influence from YR-related populations.

We next examined genetic changes in the ShanDong region during the early dynastic period of China, starting around 3500 BP with the establishment of the Shang Dynasty. Using a rotational qpAdm strategy (Fig. 3A, Supplementary Table S5) for early dynastic ShanDong populations, we found that they can be modeled by three different admixture patterns: (1) The HouLi (HLGroup), LiuJiaZhuang (LJZGroup), and XiChen (XCGroup) populations can be modeled as a single ancestry related to the LongShan “CZYGroup”, with no additional connections to YR-related ancestry beyond that observed in LongShan populations. (2) The TongLin (TLGroup) and XinZhi (XZGroup) populations can only be modeled by a single source ancestry related to the “GSGroup”, the DaWenKou population who showed an elevated YR-related ancestry (50–71%). This suggests that the two populations may share more YR-related ancestry, more similar to DaWenKou populations than LongShan populations. However, it is important to note that, apart from the level of YR-related ancestry, the ancient ShanDong populations are overall highly similar to each other. Two possible scenarios could have given rise to the observed qpAdm model for the TLGroup and XZGroup: (a) genetic continuity between the GSGroup and these two early dynastic groups, or (b) additional YR-related admixture leading to genetic similarity between the GSGroup and these two groups. The major difference between these two scenarios is the timing of when the YR-related ancestry was introduced into the population. (3) Lastly, we found that the YiXi population (YXGroup) is best modeled as a mixture of YR-related ancestry (~75–92%) and ancestry related to another ShanDong population (e.g., ~25% CZYGroup, or ~9% SD9K, Supplementary Table S5).

We further note that the mixture model patterns of the TLGroup, XZGroup, and YXGroup (all younger than 3000 BP) all differ from the LongShan mixture model pattern. To explore what happened to these three early dynastic ShanDong populations (XZGroup, YXGroup and TLGroup), we estimated the timing of admixture using DATES, and found that these three populations could be modeled as a mixture of ancestry related to ShanDong populations older than 3500 BP and ancestry related to YR populations (YR_MN/YR_LN), with an estimated date of admixture around 4.6–18.9 generations prior to the dynastic period, or ~2880–2030 BP (Supplementary Table S7). This finding suggests a second wave of admixture, where these three populations may have been genetically influenced by a second YR-related population after the LongShan cultural period, distinct from the admixture event associated with the DaWenKou populations (at least 6000–4600 BP). In an ADMIXTURE analysis, we observed that individuals from the TLGroup, XZGroup, and YXGroup show a higher proportion of a component found in Yellow River populations (orange) compared with other ShanDong populations (Fig. 2C). Unlike the previous wave during the DaWenKou cultural period that affected all sampled ShanDong populations, the second wave may have only influenced a subset of the early dynastic ShanDong populations.

As population interactions can be bidirectional, we also tested whether Yellow River populations were influenced by ShanDong populations. Using an f4-analysis, we found that ancient Yellow River populations are similarly related to ShanDong populations, i.e. f4 (aYR, aYR; 6000–1500 BP ShanDong, Mbuti) ~ 0 (−3.0 < Z < 3.0, Supplementary Table S8), suggesting that Yellow River populations were not differentially influenced by ancestry related to ShanDong populations (Fig. 3B).

Introduction of ancestry related to ShanDong coastal East Asians in Ryukyu Islanders at least after 2800 BP

To examine the relationship of ShanDong populations to those who lived in the Japanese archipelago, we next compared our newly sampled individuals to previously published ancient Japanese populations. In the PCA, ancient Japanese populations form two main clusters, where younger populations dating to the post-Yayoi period are intermediate between the BQGroup, GSGroup, and FJGroup ShanDong populations from the DaWenKou cultural period, and older Jōmon hunter-gatherer populations (Fig. 1C, Supplementary Fig. S2A, B). In an ADMIXTURE analysis, younger Japanese populations, especially the Yayoi, Kofun, and historical Nagabaka populations, possess genetic components that are widely distributed among ShanDong populations younger than 6000 BP, confirming a strong connection between ShanDong populations associated with the DaWenKou cultural period and more recent populations from Japan (Fig. 2C).

This connection was further confirmed by f4 statistics⁴³, where most f4(>3000 BP aJapanese, <3000 BP aJapanese; 6000–1500 BP ShanDong populations, Mbuti) < 0 (−21.7 < Z < 3.1, Supplementary Table S9), showing more shared alleles between the <3000 BP post-Yayoi Japanese populations and 6000–1500 BP ShanDong populations compared with the older Jōmon hunter-gatherers (Supplementary Table S9).

To determine whether this connection is specific to 6000–1500 BP ShanDong populations instead of other northern East Asians, we next tested whether the ShanDong populations could be described as a necessary ancestral source population for different recent Japanese populations (Nagabaka_2800BP, Kofun, Nagabaka_historic) in a rotational qpAdm analysis. We found three major patterns: (1) We found that the Nagabaka population dating to 2800 BP can be modeled as solely ancestry related to the Jōmon (Supplementary Table S5). (2) Then, we found that the historical Nagabaka population could only be modeled as a mixture of two ancestries, related to the ~4600 to 3500 BP CZYGroup and YJCGroup (75.0–75.2%) from the LongShan cultural period in ShanDong and ~3900 to 3700 BP individuals from the Late Jōmon (24.8–25.0%, Supplementary Table S5). We also found that the YR_LN, the LongShan ShanDong (YJCGroup/CZYGroup), and the Jōmon populations were included in the best three-ancestry model (but when comparing with the two-ancestry model, pnest = 0.04, Supplementary Table S5). The three-ancestry and two-ancestry models do not contradict each other, because these ShanDong populations (CZYGroup and YJCGroup) already carry northern inland East Asian (YR-related) and coastal East Asian components, where the coastal East Asian component is specific to ShanDong populations and was not identified in previously sampled ancient East Asian populations. Previous analysis of the historical Nagabaka population showed that they were best described by a three-ancestry model composed of Jōmon ancestry, northern East Asian ancestry, and an ambiguous ancestry related to present-day Han populations. 
Here we found that a coastal East Asian ancestry related to the ShanDong population better represents the ancestry represented previously by the Han, where the historical Nagabaka population is best described by a three-ancestry model as a mixture of Jōmon ancestry, northern inland East Asian ancestry related to YR populations, and additional northern coastal East Asian ancestry related to ShanDong populations. (3) The genetic influence of ancient ShanDong populations is limited to the Ryukyu islands and is not observed in the Kofun population in Hondo. That is, in a qpAdm analysis, the Kofun could not be modeled as carrying any ShanDong-related ancestry, and a working three-ancestry model shows Jōmon ancestry, northern inland East Asian related to YR populations, and an ambiguous northern East Asian-related ancestry (Supplementary Table S5).

We also observed differences in genetic linkage between the Nagabaka and Kofun populations and ShanDong populations in an f4-test. We observed that most ShanDong populations, particularly the 6000 BP BQGroup, tend to share more alleles with the historical Nagabaka group than other northern East Asians, i.e. many f4(WLR/YR/AR, 6000–1500 BP ShanDong populations; Nagabaka_historic, Mbuti) < 0 (Z < −2.5, Fig. 4, Supplementary Table S10), while most f4 (WLR/YR, 6000–1500 BP ShanDong populations; Japan_Kofun, Mbuti)~0 (|Z|<3, Fig. 4, Supplementary Table S10).

**Fig. 4: F4-statistics depicting the relationship of the BQGroup from ShanDong to ancient mainland East Asians and post-Yayoi populations from the Japanese archipelago.**

To further analyze the northern coastal ancestry in the historical Nagabaka population and connect it to ShanDong populations, we designed a simulation test to use with the f4-analysis^44,45. We found two major patterns: (1) First, the simulation analysis further confirms that there is an additional genetic connection between the historical Nagabaka and ShanDong populations. We tested f4 (X, Nagabaka_historic; SDEN/Xiaojingshan/BQGroup, Mbuti), where X was a simulated population ((1-x)% Jōmon+x% LongShan ShanDong populations (YJCGroup, CZYGroup), Supplementary Fig. S8). In both sets of simulation tests (ShanDong populations = SDEN/Xiaojingshan), when x% is within the range of the proportion of the LongShan ShanDong population (~75%) calculated using the rotational qpAdm strategy, the f4 values approximate zero (between the blue lines, Supplementary Fig. S8). This supports the rotational qpAdm mixture model for the historical Nagabaka population containing a northern coastal population component related to ShanDong populations. In the set of simulation tests (ShanDong population = BQGroup), when x% is within the range of ~87% of the proportion of the LongShan ShanDong population calculated by qpAdm (~0.87 * 75%, between the yellow lines), the value of f4 is approximately zero. This is because the BQGroup population is a mixture of ~87% ShanDong-related ancestry and ~13% YR-related ancestry. These patterns support that there were additional genetic connections between ShanDong populations and the historical Nagabaka population beyond general northern East Asian connections. (2) In order to test for the additional contribution of a northern coastal component related to ShanDong populations in the historical Nagabaka population compared to the Kofun population, we tested f4 (X, Nagabaka_historic; Kofun, Mbuti), where X was a simulated population ((1-x)% Kofun+x% ShanDong populations, sample size = 30, Supplementary Fig. S9).
In all four sets of simulation tests, f4 values gradually decreased as the different ShanDong components in population X increased, suggesting that the historical Nagabaka population does share additional genetic connections with the ShanDong population compared to the Kofun population. In addition, because it was not possible to model the Nagabaka population using the Kofun population as the ancestral source in the rotational qpAdm analysis, f4 values approximately equal to 0 were not observed.
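The logic of the simulation test can be sketched at the level of allele frequencies. If X is built as (1 − x) of one source plus x of another, then f4(X, target; test, outgroup) is linear in x, and the value of x at which it crosses zero indicates the mixture proportion at which X matches the target with respect to the test population. The code below is a frequency-level illustration with toy numbers; the study simulated individuals (for example with a sample size of 30), which this sketch does not reproduce, and all names and frequencies here are hypothetical.

```python
def mix(freqs_a, freqs_b, x):
    """Allele frequencies of a population with (1 - x) of A and x of B."""
    return [(1 - x) * a + x * b for a, b in zip(freqs_a, freqs_b)]


def f4(a, b, c, d):
    """f4(A, B; C, D) as the mean of (a_i - b_i) * (c_i - d_i)."""
    terms = [(ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d)]
    return sum(terms) / len(terms)


def zero_crossing(f4_at_0, f4_at_1):
    """x at which a linear f4(x) crosses zero, given its values at x=0 and x=1."""
    if f4_at_0 == f4_at_1:
        raise ValueError("f4 does not change with x; no unique crossing")
    return f4_at_0 / (f4_at_0 - f4_at_1)


# Toy frequencies (hypothetical): the target is exactly 25% source_a + 75% source_b.
source_a = [0.0, 1.0, 0.0, 1.0]   # stands in for the Jōmon-like source
source_b = [1.0, 0.0, 0.0, 1.0]   # stands in for the LongShan ShanDong-like source
target = mix(source_a, source_b, 0.75)
test_pop = source_b               # population sharing drift with source_b
outgroup = [0.5, 0.5, 0.5, 0.5]

x_star = zero_crossing(
    f4(mix(source_a, source_b, 0.0), target, test_pop, outgroup),
    f4(mix(source_a, source_b, 1.0), target, test_pop, outgroup),
)

# Arithmetic quoted in the text for the BQGroup test: ~87% of the ~75% proportion.
bq_expected_crossing = 0.87 * 0.75
```

In this toy setup the crossing falls at x = 0.75, the proportion used to construct the target, and the BQGroup arithmetic from the text evaluates to roughly 65%.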

Here, since there is no component related to ShanDong populations in the Nagabaka population of 2800 BP (in both ADMIXTURE and f4 results, Fig. 2C and Fig. 4), we have inferred that admixture related to northern coastal ancestry into the Nagabaka population from the Ryukyu Islands happened at least after 2800 BP. We next estimated the time of admixture integrating ShanDong ancestry into the historical Nagabaka population using DATES⁴⁶. A consistent admixture time of ~43–102 generations ago is obtained using the ShanDong populations (CZYGroup, LJZGroup, XCGroup, HLGroup) to represent northern coastal populations and the West Liao River populations (WLR_LN, WLR_BA) to represent northern inland populations as mainland East Asian sources, and the Jōmon as another source (Supplementary Table S7). The most likely timing of the admixture is estimated to be 1600–1400 BP assuming one generation is 28 years (LJZGroup is the best fit for the ancestral source, with the smallest nrmsd = 0.180, mean = 43.3, Supplementary Table S7). This can potentially be linked to population interactions between the Sui Dynasty (around 1400 BP) and the Ryukyu Islands populations that are known to have occurred according to historical documents (e.g. “Sui Shu”, “Chuzan-sefu” and “Chuzan-sekan”)⁴⁷; the specific historical events involved need further support from archaeological study.
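The conversion from generations to calendar time used above is simple arithmetic, sketched here in Python. With the 28-year generation time stated in the text, the best-fit estimate of 43.3 generations corresponds to about 1212 years before the sampled individuals lived, and the upper bound of 102 generations to 2856 years. Turning this into a date in years BP also requires the age of the sampled individuals, which is not given in this section, so `sample_age_bp` below is a placeholder argument and the function names are hypothetical.

```python
YEARS_PER_GENERATION = 28  # generation time assumed in the text


def years_before_sampling(generations, years_per_generation=YEARS_PER_GENERATION):
    """Years between the admixture event and the sampled individuals."""
    return generations * years_per_generation


def admixture_date_bp(generations, sample_age_bp,
                      years_per_generation=YEARS_PER_GENERATION):
    """Admixture date in years BP, given the age of the samples in years BP.

    sample_age_bp is a placeholder here; it must come from the dating of
    the individuals analysed, which this sketch does not supply.
    """
    return sample_age_bp + years_before_sampling(generations, years_per_generation)


best_fit_years = years_before_sampling(43.3)   # about 1212 years
upper_bound_years = years_before_sampling(102) # 2856 years
```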

Discussion

Through sampling of ancient individuals from the northern coastal region of ShanDong in East Asia, we reconstructed fine-scale population dynamics from the ShanDong region over the past 9000 years, allowing us to answer several long-standing questions on not only population interaction and change during formative cultural periods in northern East Asia, but also the source of mainland East Asian ancestry into the Japanese archipelago.

First, we reconstructed the population history of mainland East Asians, focusing particularly on interactions across major cultural periods associated with the Neolithic. We found that before the emergence of the coastal DaWenKou culture, by at least 7700 BP, some ShanDong populations were influenced by populations from further north and south, about 3000 years earlier than that estimated in previous studies⁴⁸. Later, with the establishment of two major Neolithic cultures in East Asia, the YangShao and DaWenKou cultures, we observed gene flow related to inland YangShao populations from the Yellow River region into the coastal DaWenKou populations from the ShanDong region, a pattern consistent with cultural interactions observed in the archaeological record^{16,17,18,20,49,50}. We observed different interaction patterns during three major cultural periods since 6000 BP. First, we observed admixture from Yellow River-related populations to ShanDong populations during the DaWenKou cultural period, likely associated with the expansion of the YangShao culture during 6000–4600 BP^17,20,21. Second, we observed little to no gene flow from external regions into the ShanDong region from 4600–4000 BP, when both the Yellow River and ShanDong regions experienced similar cultural changes that led to the LongShan cultural period^20,21,51. This pattern suggests that during this time period, within-region population continuity was predominant in the ShanDong region. Finally, in the early dynastic period after 3500 BP, we observed a second wave of gene flow from Yellow River-related populations to some ShanDong populations, potentially associated with increased trade and conflict between the Shang Dynasty and Dongyi populations, which was shown in the historical record to have been driven by demand for sea salt^22,23,25,52. 
During the dynastic period, the establishment of socioeconomic structure may have contributed to a second wave of Yellow River-related ancestry into the ShanDong region^53,54. The different patterns of gene flow that occurred during the DaWenKou, LongShan, and early dynastic cultural periods show the history of how the genetic structure of the ShanDong populations was formed between 6000 and 1500 BP.

Previous studies have shown that post-Yayoi populations from the Japanese archipelago (e.g. Nagabaka_2800 BP, Kofun, and Nagabaka_historic) derive ancestry from at least three sources: Jōmon hunter-gatherers, a northern East Asian ancestry likely associated with Yayoi migrants, and a mainland East Asian ancestry that entered Japan after the Yayoi period associated with the present-day Han^6,7,9,10. However, the provenance of the mainland East Asian ancestry and the timing of the related admixture were not known. Here, we identified the previously unknown East Asian ancestry associated with the Han as a coastal East Asian ancestry that was also found in ancient ShanDong populations (e.g. CZYGroup, YJCGroup, and LJZGroup), and we estimated that this ancestry was introduced through admixture ~1600 to 1400 BP using a DATES analysis. Interestingly, this model can only explain the unknown genetic component in Ryukyu islanders, and the source of the mainland East Asian ancestry found in Hondo Japanese populations remains unclear. Therefore, while the genetics of recent populations from Japan fits a model of three ancestries related to the Jōmon, northern inland East Asians (analogous to the northern East Asian ancestry associated with the Yayoi previously proposed), and northern coastal East Asian ancestry (analogous to mainland East Asian ancestry associated with the post-Yayoi previously proposed), northern coastal East Asian ancestry can be further differentiated within different populations of the Japanese archipelago. This observation also fits the population structure previously observed in present-day Japanese¹⁰, and highlights the complex population history within different regions of Japan.

Methods

Ethics and inclusion statement

Permission to test for ancient DNA in the human specimens from this study was obtained through discussions with local archaeologists who excavated them, with final approval granted by the institutes in Shandong where they are managed and cared for, the Shandong Provincial Institute of Cultural Relics and Archaeology and Shandong University. Additional oversight and approval were obtained from the Institutional Review Board at the Institute of Vertebrate Paleontology and Paleoanthropology of the Chinese Academy of Sciences to sample the genomes of the ancient humans included in this study (202310250014). Protocols used to sample the genomes follow the highest standards used in archaeogenomic research. The work was done in collaboration with several local archaeologists, who were included as co-authors for their contributions to collation of archaeological material, dating of specimens, and/or discussions that contributed to the connections made to archaeological research cited in this study.

Ancient DNA extraction, sequencing, and data processing

For ancient DNA extraction, we primarily selected temporal bone fragments and teeth from human skeletal remains from ancient sites in the ShanDong region and drilled for bone powder. For each specimen, about 100 mg of bone or tooth powder was collected. To avoid cross-sample contamination, a new disposable drill bit was used for each specimen during sampling. For temporal bone samples, two drilling methods were employed. When we could isolate the temporal bone, we drilled a small hole on its inner side to obtain bone powder⁵⁵. When we could not isolate the temporal bone from the intact cranium, we drilled from the base of the cranium⁵⁶ in order to protect the recognizable morphological features on its surface.

For DNA library construction, single-stranded DNA libraries (SS) were constructed^57,58 for samples from the GangShang and XiChen sites, and these libraries were not subjected to uracil-DNA glycosylase treatment (non-UDG) (Supplementary Data 1). For samples from the other nine sites, double-stranded DNA libraries (DS) were constructed^58,59, and partial uracil-DNA glycosylase treatment was used (half-UDG⁶⁰). DNA libraries were amplified by polymerase chain reaction (PCR) using AccuPrime Pfx DNA polymerase for 35 cycles to ensure that enough ancient DNA was available for capture, followed by the addition of P5 and P7 primers to specific libraries. A NanoDrop 2000 spectrophotometer was used to measure the amount of DNA extracted from each sample⁴.

Sequencing and reads alignment

Oligonucleotide probes designed for ancient nuclear genome-wide SNP capture were used, targeting ~1,240,000 SNPs (1240K SNP array^61,62,63) (Supplementary Data 1). The enriched DNA fragments were sequenced on Illumina HiSeq 2500 and HiSeq X platforms, generating paired-end reads of 2 × 100 bp and 2 × 150 bp. Primer fragments were removed from the raw sequences using leeHom⁶⁴, and forward and reverse reads with at least 11 base pairs of overlap were merged into a single sequence. Merged sequences were aligned to the hg19 human reference genome using BWA⁶⁵ with the parameters “-n 0.01 -l 16500”. Fragments with a mapping quality below 30 were filtered out. Duplicated fragments, i.e. fragments with the same orientation and the same start and end positions, were collapsed, and only the fragment with the highest quality was retained for further processing.
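The mapping-quality filter and duplicate removal described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `Fragment` record, its `quality` field (standing in for whatever quality score ranks duplicates), and the example reads are hypothetical.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    """Hypothetical record for one aligned, merged fragment."""
    name: str
    strand: str   # "+" or "-"
    start: int    # leftmost aligned position
    end: int      # rightmost aligned position
    mapq: int     # mapping quality
    quality: int  # assumed score used to rank duplicates


def filter_and_deduplicate(fragments, min_mapq=30):
    """Drop fragments with mapq < min_mapq, then keep the best of each duplicate set."""
    best = {}
    for frag in fragments:
        if frag.mapq < min_mapq:
            continue
        # Duplicates share orientation, start position, and end position.
        key = (frag.strand, frag.start, frag.end)
        if key not in best or frag.quality > best[key].quality:
            best[key] = frag
    return sorted(best.values(), key=lambda f: (f.start, f.end, f.strand))


example = [
    Fragment("r1", "+", 100, 160, 37, 2100),
    Fragment("r2", "+", 100, 160, 37, 2300),  # duplicate of r1 with higher quality
    Fragment("r3", "-", 100, 160, 37, 1900),  # same coordinates, other orientation
    Fragment("r4", "+", 500, 545, 12, 1500),  # fails the mapping-quality filter
]
kept = filter_and_deduplicate(example)
```

In this toy input, `r2` replaces `r1`, `r3` is kept because its orientation differs, and `r4` is removed by the quality filter.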

Test for contamination and genotyping

The C-to-T substitution rate at the terminal nucleotides was calculated for each individual. A relatively high C-to-T substitution rate at the terminal nucleotides is characteristic of ancient DNA⁶⁶, suggesting that the sequence reads represent genetic material from the ancient human sampled. The mitochondrial contamination rate of each individual was assessed by comparing the sequenced fragments with the mitochondrial genomes of 311 present-day humans from around the world using ContamMix²⁶. For one male individual where mtDNA was not captured, we used an X chromosome contamination test²⁷. Libraries with estimated mitochondrial contamination levels greater than 5.0% were reprocessed to retain only damaged fragments, i.e. fragments that exhibit damage patterns typical of ancient DNA and not found in modern DNA²⁸. Damaged fragments were obtained by retaining only fragments containing at least one C-to-T substitution in the first three positions of the 5’ end and in the last three positions of the 3’ end using pmdtools v0.60 with the “--customterminus” parameter²⁹, and the individuals corresponding to these damage-restricted libraries were labeled with “_d” for subsequent analyses (Supplementary Table S1). For each SNP locus covered by at least one read in an individual, a random read was selected to determine the allele for that individual⁶¹, yielding pseudohaploid genome-wide data for downstream population genetic analyses.
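The pseudohaploid calling step can be sketched as follows. This is an illustration only: the `pileup` dictionary (SNP identifier mapped to the list of alleles observed in overlapping reads) and the fixed random seed are assumptions made for the example, not details of the software used in this study.

```python
import random


def pseudohaploid_call(alleles_at_site, rng):
    """Return one randomly sampled allele, or None if no read covers the site."""
    if not alleles_at_site:
        return None
    return rng.choice(alleles_at_site)


def call_genotypes(pileup, seed=0):
    """Make one pseudohaploid call per SNP from a {snp_id: [alleles]} pileup."""
    rng = random.Random(seed)  # seeded so the toy example is reproducible
    return {snp: pseudohaploid_call(alleles, rng) for snp, alleles in pileup.items()}


# Hypothetical pileup for three SNPs in one individual.
pileup = {
    "snp1": ["A", "A", "G"],  # three reads, two alleles observed
    "snp2": ["C"],            # a single read
    "snp3": [],               # no coverage: stays missing
}
calls = call_genotypes(pileup)
```

Because only one read is sampled per site, heterozygous sites are represented by a single allele, which is why the resulting data are treated as haploid in downstream analyses.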

Principal component analysis, UMAP, and t-SNE dimensionality reduction

Principal Component Analysis (PCA) was performed using the smartpca program from the EIGENSOFT package³⁰, in which we used published present-day humans (34 present-day populations from the HO project⁴³, and 17 Tibetan and Han populations differentiated according to region in their published studies⁶⁷) to determine the principal components (PC1 = 5.5%, PC2 = 3.6%, Supplementary Fig. S1). We then projected ancient ShanDong individuals sampled in this study, as well as previously published ancient individuals^{3,4,6,7,8,41,42} (Fig. 1C).

We then assessed PC1 through PC10, projecting them onto a two-dimensional plane using UMAP³¹ and t-SNE³². Unlike a PCA plot, UMAP and t-SNE can summarize all 10 PCs on a two-dimensional plane, where UMAP (Supplementary Fig. S2) focuses more on global structure, and t-SNE (Supplementary Fig. S3) focuses more on local structure.

F3- and f4-analyses

To determine genetic relationships among East Asian populations, we used the outgroup-f3 and f4 analyses implemented in the AdmixTools software package⁶⁸. Raghavan et al. first proposed the outgroup f3-analysis³³, which uses an f-statistic of the form f3(Outgroup; X, Y), where the Outgroup is an outgroup population to X and Y. We used the modern Central African population Mbuti as the Outgroup, and ancient East Asians from this and previously published studies as X and Y populations. In practice, we used the qp3Pop software from the AdmixTools⁴³ package and plotted heatmaps using the matplotlib package for Python 3.7 (Fig. 2A). A high f3 value indicates high genetic similarity between populations X and Y. Similarity due to shared ancestry versus admixture can be further differentiated using the f4 analysis.
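To make the outgroup-f3 statistic concrete, the sketch below computes a naive estimate of f3(Outgroup; X, Y) as the average over SNPs of (p_O − p_X)(p_O − p_Y). It is not the qp3Pop implementation: it omits the standard errors that AdmixTools estimates by block jackknife, and the allele frequencies are invented for illustration.

```python
def outgroup_f3(p_out, p_x, p_y):
    """Naive f3(Outgroup; X, Y): mean of (p_O - p_X) * (p_O - p_Y) across SNPs."""
    if not (len(p_out) == len(p_x) == len(p_y)) or not p_out:
        raise ValueError("frequency lists must be non-empty and of equal length")
    terms = [(o - x) * (o - y) for o, x, y in zip(p_out, p_x, p_y)]
    return sum(terms) / len(terms)


# Invented allele frequencies at four SNPs.
p_outgroup = [0.10, 0.90, 0.50, 0.20]
p_x = [0.60, 0.30, 0.90, 0.70]
p_close = [0.55, 0.35, 0.85, 0.65]  # drifts together with X away from the outgroup
p_far = [0.20, 0.80, 0.50, 0.30]    # stays close to the outgroup frequencies

f3_close = outgroup_f3(p_outgroup, p_x, p_close)
f3_far = outgroup_f3(p_outgroup, p_x, p_far)
```

The pair that shares more drift away from the outgroup (`p_x`, `p_close`) yields the larger value, which is the sense in which a high outgroup-f3 indicates genetic similarity.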

We used the qpDstat software in the AdmixTools⁴³ package to perform f4 analyses and evaluate the relative degree of allele sharing between ancient individuals in East Asia. The f4 statistic takes the form f4(P1, P2; P3, P4), where P4 is generally fixed as an outgroup to P1, P2, and P3. We used the Central African population Mbuti as P4, which is an outgroup to East Asian populations. In an f4 analysis, f4 > 0 (Z > 2.5 or more strictly Z > 3) indicates that P3 shares more alleles with P1 than with P2. f4 < 0 (Z < −2.5 or more strictly Z < −3) indicates that P3 shares fewer alleles with P1 than with P2, and f4 ~ 0 (|Z| < 2.5 or more strictly |Z| < 3) indicates that P3 shares approximately equal numbers of alleles with P1 and P2.
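The interpretation rules above can be illustrated with a small sketch that estimates f4(P1, P2; P3, P4) from allele frequencies and attaches a Z-score. Two simplifications are assumed here and should not be read as the qpDstat procedure: the standard error comes from an unweighted delete-one-block jackknife over equal-sized blocks of consecutive SNPs, and the allele frequencies are randomly generated toy data in which P1 is constructed to resemble P3.

```python
import math
import random


def f4_with_z(p1, p2, p3, p4, n_blocks=10):
    """Return (estimate, standard error, Z) for a naive f4(P1, P2; P3, P4)."""
    terms = [(a - b) * (c - d) for a, b, c, d in zip(p1, p2, p3, p4)]
    n = len(terms)
    estimate = sum(terms) / n
    block_size = n // n_blocks
    leave_one_out = []
    for k in range(n_blocks):
        lo = k * block_size
        hi = n if k == n_blocks - 1 else lo + block_size
        rest = terms[:lo] + terms[hi:]  # drop one block of SNPs
        leave_one_out.append(sum(rest) / len(rest))
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((v - mean_loo) ** 2 for v in leave_one_out)
    se = math.sqrt(variance)
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z


def interpret_f4(z, threshold=3.0):
    """Apply the sign and Z-score rules described in the text."""
    if z > threshold:
        return "P3 shares more alleles with P1 than with P2"
    if z < -threshold:
        return "P3 shares more alleles with P2 than with P1"
    return "P3 shares alleles about equally with P1 and P2"


# Toy data: P1 tracks P3 closely, P2 and P4 are unrelated to P3.
rng = random.Random(1)
n_snps = 1000
p3 = [rng.random() for _ in range(n_snps)]
p4 = [rng.random() for _ in range(n_snps)]
p2 = [rng.random() for _ in range(n_snps)]
p1 = [min(1.0, max(0.0, f + rng.uniform(-0.05, 0.05))) for f in p3]

estimate, se, z = f4_with_z(p1, p2, p3, p4)
verdict = interpret_f4(z)
```

Passing `threshold=2.5` to `interpret_f4` applies the less strict cutoff mentioned in the text.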

Kinship analyses

In order to exclude the influence of kinship on population genetic analyses, we used READ (Relationship Estimation from Ancient DNA) to analyze the kinship between individuals from ancient sites in ShanDong⁶⁹. READ was developed specifically for ancient DNA, as the low endogenous DNA content, fragmentation, terminal damage, and other characteristics of aDNA can make estimating kinship difficult. The procedure is to (1) divide the genome into non-overlapping windows of 1 Mbp; (2) calculate the proportion of mismatched alleles (P0) in each window for each pair of individuals; (3) normalize P0 using the value expected for a pair of unrelated individuals from the same population; and finally, (4) classify the kinship between samples according to threshold values. After the READ analysis, each pair of individuals is assigned to one of the following four categories: (1) identical individuals/identical twins; (2) first-degree kinship: parents and children, siblings; (3) second-degree kinship: grandparents and grandchildren, aunts/uncles and nieces/nephews, half-siblings; and (4) unrelated individuals: the kinship distance is greater than the second degree.
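The four steps can be sketched in Python as below. This is a simplified illustration rather than the READ program itself. Two assumptions are labeled in the code: the unrelated baseline is taken as the median P0 across all pairs, and the classification cutoffs (0.625, 0.8125, 0.90625) are illustrative values that are not stated in this paper.

```python
from statistics import mean, median


def window_p0(geno_a, geno_b):
    """Proportion of mismatching alleles among sites called in both individuals."""
    pairs = [(a, b) for a, b in zip(geno_a, geno_b) if a is not None and b is not None]
    if not pairs:
        return None  # no overlapping calls in this window
    return sum(a != b for a, b in pairs) / len(pairs)


def mean_p0(windows_a, windows_b):
    """Average P0 over windows; each window is a list of pseudohaploid calls."""
    values = [window_p0(a, b) for a, b in zip(windows_a, windows_b)]
    return mean(v for v in values if v is not None)


def normalize(p0_by_pair):
    """Divide each pair's P0 by the median over pairs (assumed unrelated baseline)."""
    baseline = median(p0_by_pair.values())
    return {pair: p0 / baseline for pair, p0 in p0_by_pair.items()}


def classify(normalized_p0, cutoffs=(0.625, 0.8125, 0.90625)):
    """Assign a kinship category; the cutoffs are assumptions for illustration."""
    identical_max, first_max, second_max = cutoffs
    if normalized_p0 < identical_max:
        return "identical individuals/identical twins"
    if normalized_p0 < first_max:
        return "first-degree"
    if normalized_p0 < second_max:
        return "second-degree"
    return "unrelated"


# Hypothetical mean P0 values for three pairs of individuals.
p0_by_pair = {("ind1", "ind2"): 0.20, ("ind1", "ind3"): 0.20, ("ind2", "ind3"): 0.15}
labels = {pair: classify(value) for pair, value in normalize(p0_by_pair).items()}
```

With these invented values, the pair (`ind2`, `ind3`) has a normalized P0 of about 0.75 and is labeled first-degree, while the other two pairs are labeled unrelated.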

Because individuals within two degrees of kinship share genetic characteristics that can bias population genetic analyses that assume independence of data, we excluded related individuals among the newly sampled individuals from population genetic analyses. That is, for each kinship group, we retained the individual with the highest data quality. We ultimately excluded 15 individuals (BeiQian = 13, TongLin = 1, GangShang = 1) from downstream population genetics analyses and marked the excluded individuals with the suffix “_k” (Supplementary Table S2).

Grouping analysis based on Pairwise D method

To group individuals within sites, we focused on comparing differences between individuals within the same site using the Pairwise D method. Pairwise D entails using the functionality of the f4 analysis in the AdmixTools software package⁴³, with the formula D(ind1, ind2; Pop, Mbuti), where ind1 and ind2 are two different individuals from the same site, and Pop is a published ancient individual or present-day population. A higher number of D-statistics where |Z| > 3 indicates genetic differences between ind1 and ind2 that suggest ind1 and ind2 may not share enough genetic similarities to be grouped together. In this study, 107 representative ancient^{40,42,70,71,72,73} and present-day^43,67 populations were rotated into the Pop position, and the number of D-statistics for each (ind1, ind2) pairing where |Z| > 3 was determined. If there were greater than five D-statistics where |Z| > 3, then the ancient individuals were divided into subgroups or outliers as appropriate (Supplementary Fig. S3).
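The counting rule used for grouping can be written as a short sketch. The Z-scores below are invented, and the individual labels are placeholders; only the thresholds (|Z| > 3, more than five significant statistics) come from the text.

```python
def count_significant(z_scores, z_cutoff=3.0):
    """Number of D-statistics whose |Z| exceeds the cutoff."""
    return sum(1 for z in z_scores if abs(z) > z_cutoff)


def split_needed(z_scores, max_significant=5, z_cutoff=3.0):
    """True when more than max_significant statistics are significant for a pair."""
    return count_significant(z_scores, z_cutoff) > max_significant


# Invented Z-scores of D(ind1, ind2; Pop, Mbuti) for eight Pop populations per pair.
pairwise_z = {
    ("ind_A", "ind_B"): [0.4, -1.2, 3.1, 0.9, -0.3, 2.2, 1.7, -2.8],
    ("ind_A", "ind_C"): [3.4, -4.1, 5.0, 3.2, -3.6, 3.9, 0.2, 4.4],
    ("ind_B", "ind_C"): [3.3, -3.8, 4.6, 3.5, -3.1, 4.0, 1.1, 0.6],
}
flags = {pair: split_needed(z) for pair, z in pairwise_z.items()}
```

In this toy case `ind_A` and `ind_B` can be grouped, whereas `ind_C` differs from both and would be treated as a subgroup or outlier.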

From the grouping analysis using Pairwise D, BQ4628 and BQ4610 were classified as outliers at the BeiQian site, YJC4658 was classified as an outlier at the YinJiaCheng site, XZ3470 was classified as an outlier at the XinZhi site, and HL4788 was classified as an outlier at the HouLi site. Outliers were labeled with the suffix “_o”, and the remaining individuals from that site were grouped together for downstream population genetic analysis (BQGroup of BeiQian site, GSGroup of GangShang site, FJGroup of FuJia site, YJCGroup of YinJiaCheng site, CZYGroup of ChengZiYa site, HLGroup of HouLi site, LJZGroup of LiuJiaZhuang site, XCGroup of XiChen site, XZGroup of XinZhi site, YXGroup of YiXi site, TLGroup of TongLin site).

Phylogeny modeling with Treemix

Treemix v1.13³⁴ was used to determine the phylogenetic relationships of various ancient East Asians^{3,4,6,7,41,42,63}, allowing for admixture events. We rooted the tree using the Central African Mbuti (with the option “-root Mbuti”) and used blocks of 500 SNPs at a time (with the option “-k 500”). We ran 1000 bootstrap replicates for each tree, adding the options “-bootstrap -q”. The 1000 bootstrap trees were assessed in Phylip v3.695 using the “consense” program, which allowed us to assess the robustness of each clade in the tree (Supplementary Table S3). Results for m = 0 to m = 6 and a heatmap of the residuals were determined (Supplementary Figs. S4 and S5), and the tree for m = 3 is visualized in Fig. 2B.

Admixture analysis

We applied the program ADMIXTURE³⁵ to compute stratified components in different East Asian populations based on its likelihood model with a block relaxation algorithm to estimate individual ancestry and cross-validate the estimated population structure. We used PLINK v1.90b3.40⁷⁴ to prune the dataset to minimize linkage disequilibrium, with the parameter “--indep-pairwise 200 25 0.4”. We included present-day and ancient populations used in the PCA analysis. Twenty replicates for each of K = 2 to K = 9 were performed, using different random seeds. The lowest CV was for K = 2 (Supplementary Fig. S6), with similarly low CVs for K = 3 to K = 5 (0.4452–0.4471). By comparing the results from K = 2 to K = 5 (Supplementary Fig. S7), we found that the first separation of components is between North and South East Asians (K = 2), the second separation distinguishes continental and island populations amongst northern East Asians (K = 3), the third separation distinguishes the Amur River populations (and populations further north) from Yellow River populations (K = 4), and the final separation distinguishes Yellow River (inland) and ShanDong (coastal) populations (K = 5). These separations across K = 2 to K = 5 mirror the population relationships observed in the Treemix analysis (Fig. 2B). We visualized the results for K = 5 in Fig. 2C.

Admixture modeling with qpAdm

To model ancestry proportions for any target population, we used qpAdm⁶² in AdmixTools with the parameter “allsnps: default”. We used Python scripts to implement a rotational strategy to examine the potential ancestral origins of the target population in one-, two-, and three-way mixing scenarios. In the rotational strategy, a standard outgroup “Yamnaya_Samara”⁷⁵ was added to the “right population”. The possible ancestral source populations are categorized into two groups, “rotating” and “non-rotating” (Supplementary Table S4): (1) Populations in the “rotating” group that are not used as a source in the “left population” are incorporated into the “right population”. The identification of a set of best-fitting mixture models is a major advantage of a rotational qpAdm analysis. Compared to qpAdm without rotation, in a rotational qpAdm analysis, each potential source population is sequentially included in the “left population”, and all unused potential sources in that qpAdm analysis are included in the “right population”. Finding successful mixture models for one or a few potential sources highlights the optimal sources relative to the other potential sources (where Tail_prob >0.05, each ancestral mixture proportion >standard error, pnest <0.05). This exhaustive strategy, combined with a smaller number of source populations, tends to reveal optimal combinations of sources, because source populations can only be identified when they outperform the rest of the tested potential source populations. This allows us to find the most suitable combination among all combinations of source populations. (2) Populations in the “non-rotating” group that are not used as a source for the “left population” are not included in the “right population” to avoid situations where the “right population” contains groups that are younger in age than the target population⁷⁶. See Supplementary Table S4 for details.
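The rotation over the “rotating” group can be sketched as follows. This is not the authors' script: the function names are hypothetical, the rotating pool shown is an arbitrary selection of labels that appear in this paper rather than the pool in Supplementary Table S4, and the sketch only enumerates left/right population lists and applies the stated acceptance criteria without running qpAdm.

```python
from itertools import combinations


def rotating_models(target, rotating, fixed_right, max_sources=3):
    """Enumerate left/right lists for one-, two-, and three-way rotating models."""
    models = []
    for k in range(1, max_sources + 1):
        for sources in combinations(rotating, k):
            left = [target, *sources]
            # Rotating populations not used as sources move to the right set.
            right = list(fixed_right) + [p for p in rotating if p not in sources]
            models.append({"left": left, "right": right})
    return models


def is_accepted(tail_prob, proportions, std_errors, p_nested):
    """Criteria from the text: Tail_prob > 0.05, proportions > SE, pnest < 0.05."""
    feasible = all(p > se for p, se in zip(proportions, std_errors))
    return tail_prob > 0.05 and feasible and p_nested < 0.05


# Illustrative pool only; see Supplementary Table S4 for the sets actually used.
models = rotating_models(
    target="Nagabaka_historic",
    rotating=["Jomon", "CZYGroup", "WLR_LN", "WLR_BA"],
    fixed_right=["Mbuti", "Yamnaya_Samara"],
)
```

With four rotating populations and up to three sources, this yields 4 + 6 + 4 = 14 candidate models, each of which would then be passed to qpAdm and screened with `is_accepted`.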

The admixture proportions calculated using qpAdm were stratified using the age of the sources, i.e. ShanDong_EN was used as an ancestral source for ShanDong_DWK, ShanDong_DWK was used as an ancestral source for ShanDong_LS, and ShanDong_LS was used as an ancestral source for ShanDong_CD. Thus, to calculate the proportion of the YR component in ShanDong_LS populations, we weighted the ShanDong_EN and YR components based on the ShanDong_DWK proportion observed in each ShanDong_LS population. We used the same method for the ShanDong_CD populations, using the proportions for the ShanDong_LS populations. The results of this re-estimation of proportions were used to visualize the changes in the proportion of YR components in ShanDong populations over the past 6000 years (Fig. 3B).
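One way to read this weighting is as a chained calculation, sketched below under stated assumptions. The function name is hypothetical. The DWK composition borrows the ~87%/~13% values quoted for BQGroup in the Results, while the shares used for the LS and CD steps are invented solely to show the arithmetic and are not results of this study.

```python
def propagate_yr(previous, share_from_previous, share_direct_yr):
    """Express a younger population in ShanDong_EN and YR terms.

    previous: composition of the older ShanDong source, e.g. {"ShanDong_EN": ..., "YR": ...}
    share_from_previous: qpAdm proportion assigned to that older ShanDong source
    share_direct_yr: qpAdm proportion assigned directly to the YR source
    """
    if abs(share_from_previous + share_direct_yr - 1.0) > 1e-9:
        raise ValueError("the two qpAdm proportions must sum to 1")
    return {
        "ShanDong_EN": share_from_previous * previous["ShanDong_EN"],
        "YR": share_direct_yr + share_from_previous * previous["YR"],
    }


# DWK step: values quoted for BQGroup in the Results (~87% ShanDong, ~13% YR).
dwk = {"ShanDong_EN": 0.87, "YR": 0.13}
# LS step: invented shares (all ancestry from DWK, no new YR gene flow).
ls = propagate_yr(dwk, share_from_previous=1.0, share_direct_yr=0.0)
# CD step: invented shares (80% from LS, 20% new YR-related ancestry).
cd = propagate_yr(ls, share_from_previous=0.8, share_direct_yr=0.2)
```

In this toy chain the cumulative YR proportion is 0.13 after the LS step and 0.2 + 0.8 × 0.13 = 0.304 after the CD step.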

Estimating admixture time of ancestral source components with DATES

The timing of admixture events among populations of interest in East Asia was estimated using DATES v4010⁴⁶ (https://github.com/MoorjaniLab/DATES_v4010). The minimum genetic distance was set to 0.45 cM using “lovalfit: 0.45”, and the maximum genetic distance was set to 1 Morgan using “maxdis: 1” to ensure that it was larger than the confounding LD block. The recommended bin size of 0.001 Morgans was used (“binsize: 0.001”). Standard errors were estimated by a weighted block jackknife method with the parameter “jackknife: Yes”. We considered all results with NRMSD < 0.7, Z > 2, and generations < 200, and assumed that each generation corresponds to 28 years⁷⁷ in order to convert generations to years (Supplementary Table S5).
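The filtering and unit conversion applied to DATES output can be sketched as follows. The function names are hypothetical. The example uses the LJZGroup estimate reported in the Results (mean = 43.3 generations, nrmsd = 0.180); the Z-score passed to the filter is a placeholder, since it is not quoted in the text, and `sample_age_bp` is a parameter the reader must supply.

```python
YEARS_PER_GENERATION = 28  # generation time assumed in the text


def passes_filters(nrmsd, z, generations):
    """Keep estimates with NRMSD < 0.7, Z > 2, and fewer than 200 generations."""
    return nrmsd < 0.7 and z > 2 and generations < 200


def generations_to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert an admixture time in generations to years before the sampled individuals."""
    return generations * years_per_generation


def admixture_date_bp(generations, sample_age_bp):
    """Admixture date in years BP, given the age of the sampled individuals."""
    return sample_age_bp + generations_to_years(generations)


# LJZGroup estimate from the Results; the Z value here is a placeholder.
keep = passes_filters(nrmsd=0.180, z=5.0, generations=43.3)
years_before_sample = generations_to_years(43.3)
```

Here 43.3 generations correspond to about 1212 years before the sampled individuals lived; adding the age of those individuals with `admixture_date_bp` expresses the date in years BP.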

F4-test based on the simulation method

We included a simulation to test the possible connections between the tested population (Population A) and the possible ancestral components in related populations (Population C), which leverages the linear relation between the proportion of arbitrary components in the simulated population (Population B) and the value of f4. The line where a series of f4 values are located will pass through the zero point when the ratio of the two components is consistent with that of the population being tested^44,45.

Specifically, in f4 (A, B; C, O), (1) Population A contains two population components, i% x and j% y, where i + j = 1 and i and j are fixed constants; (2) Population B is a series of populations generated by the simulation method, each consisting of a% x + b% y with a + b = 1; (3) Population C is a population with 100% x component; and (4) O is an outgroup.

We know f4 (A, B; C, O) = (p_A − p_B) × (p_C − p_O), where p with a subscript denotes the frequency of a given allele in the subscripted population or component. So, p_A = i × p_x + j × p_y; p_B = a × p_x + b × p_y; p_C = p_x; p_O = 0.

Then, f4(A, B; C, O) = (i − a) × p_x² + (a − i) × p_xp_y is a linear equation with respect to the variable a.

Finally, f4(A, B; C, O) = (p_x × (i − a) + p_y × (j − b)) × p_x = 0, only if all 3 of the following conditions are met at the same time:

$${{{\rm{i}}}}+{{{\rm{j}}}}=1$$

(1)

$${{{\rm{a}}}}+{{{\rm{b}}}}=1$$

(2)

$${{{\rm{a}}}}={{{\rm{i}}}}\quad \left({{{\rm{j}}}}-{{{\rm{b}}}}=\left(1-{{{\rm{i}}}}\right)-\left(1-{{{\rm{a}}}}\right)={{{\rm{a}}}}-{{{\rm{i}}}}\right)$$

(3)

The conditions that need to be satisfied simultaneously in more complex populations (A = i%x + j%y + k%z + …) can be further derived from the following equations:

$${{{\rm{i}}}}+{{{\rm{j}}}}+{{{\rm{k}}}}+\ldots=1$$

(4)

$${{{\rm{a}}}}+{{{\rm{b}}}}+{{{\rm{c}}}}+\ldots=1$$

(5)

$${{{\rm{a}}}}={{{\rm{i}}}},{{{\rm{b}}}}={{{\rm{j}}}},{{{\rm{c}}}}={{{\rm{k}}}},\ldots$$

(6)

The simulation can be used to demonstrate whether the population components in the tested population A contain only the population components calculated by qpAdm, and to verify if the corresponding component proportions are the same as the proportions of each component calculated by qpAdm when the value of the linear distribution for f4 crosses the zero point.
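The linear relation derived above can be checked numerically with a toy simulation. This sketch is not the simulation code used in this study: allele frequencies for components x and y are drawn independently and uniformly at random, the outgroup frequency is set to 0 as in the derivation, and i = 0.75 is chosen to echo the ~75% proportion discussed in the Results.

```python
import random


def simulate_f4_curve(i, grid, n_snps=2000, seed=42):
    """Return [(a, f4(A, B; C, O))] for B = a*x + (1 - a)*y over a grid of a."""
    rng = random.Random(seed)
    p_x = [rng.random() for _ in range(n_snps)]  # component x, also population C
    p_y = [rng.random() for _ in range(n_snps)]  # component y
    j = 1.0 - i
    p_a = [i * x + j * y for x, y in zip(p_x, p_y)]  # tested population A
    curve = []
    for a in grid:
        b = 1.0 - a
        p_b = [a * x + b * y for x, y in zip(p_x, p_y)]  # simulated population B
        # f4 = mean of (p_A - p_B) * (p_C - p_O) with p_C = p_x and p_O = 0.
        value = sum((pa - pb) * x for pa, pb, x in zip(p_a, p_b, p_x)) / n_snps
        curve.append((a, value))
    return curve


grid = [k / 20 for k in range(21)]  # a = 0.00, 0.05, ..., 1.00
curve = simulate_f4_curve(i=0.75, grid=grid)
zero_crossing = min(curve, key=lambda point: abs(point[1]))[0]
```

The f4 values decrease linearly as a increases and pass through zero where a equals i, which is the property used to compare the simulated mixtures with the proportions estimated by qpAdm.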

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The raw sequence reads and aligned BAM files generated in this study have been deposited in the Genome Sequence Archive (Genomics, Proteomics & Bioinformatics 2021) in the National Genomics Data Center, China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences^78,79 under accession code HRA008755 (BioProject accession number: PRJCA026441) (https://ngdc.cncb.ac.cn/gsa-human/browse/HRA008755). The pseudo-diploid genotype calls (Eigenstrat format) generated in this study have been deposited in OMIX, China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences^78,79 under accession code OMIX007544 (BioProject accession number: PRJCA026441) (https://ngdc.cncb.ac.cn/omix/release/OMIX007544). All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials.

References

Hung, H.-C. Early Maritime Navigation and Cultures in Coastal Southern China, Taiwan, and Island Southeast Asia, 6000–500 BCE. In: The Cambridge History of the Pacific Ocean: Volume 1: The Pacific Ocean to 1800 (eds Jones, R. T. & Matsuda, M. K.) (Cambridge University Press, 2023).
Hugh, M. et al. The prehistoric peopling of Southeast Asia. Science 361, 88–92 (2018).
Wang, T. et al. Human population history at the crossroads of East and Southeast Asia since 11,000 years ago. Cell 184, 3829–3841.e3821 (2021).
Yang, M. A. et al. Ancient DNA indicates human population shifts and admixture in northern and southern China. Science 369, 282–288 (2020).
Liu, Y. et al. Maternal genetic history of southern East Asians over the past 12,000 years. J. Genet Genomics 48, 899–907 (2021).
Robbeets, M. et al. Triangulation supports agricultural spread of the Transeurasian languages. Nature 599, 616–621 (2021).
Cooke, N. P. et al. Ancient genomics reveals tripartite origins of Japanese populations. Sci. Adv. 7, eabh2419 (2021).
Wang, C. C. et al. Genomic insights into the formation of human populations in East Asia. Nature 591, 413–419 (2021).
Cooke, N. P. et al. Genomic insights into a tripartite ancestry in the Southern Ryukyu Islands. Evol. Hum. Sci. 5, e23 (2023).
Liu, X. et al. Decoding triancestral origins, archaic introgression, and natural selection in the Japanese population by whole-genome sequencing. Sci. Adv. 10, eadi8419 (2024).
Okada, Y. et al. Deep whole-genome sequencing reveals recent selection signatures linked to evolution and disease risk of Japanese. Nat. Commun. 9, 1631 (2018).
Sakaue, S. et al. Dimensionality reduction reveals fine-scale structure in the Japanese population with consequences for polygenic risk prediction. Nat. Commun. 11, 1569 (2020).
Yamaguchi-Kabata, Y. et al. Japanese population structure, based on SNP genotypes from 7003 individuals compared to other ethnic groups: effects on population-based association studies. Am. J. Hum. Genet 83, 445–456 (2008).
Du, J. An experimental study of the Yingshui type of the Dawenkou culture (in Chinese). Archaeology 157–169, 181 (1992).
Jin, S. The patterns, pathways and historical background of prehistoric cultural exchanges between Hailuo and Haidai Areas (in Chinese). Acad. J. Zhongzhou 170–175 (2010).
Luan, F. Dongyi Archaeology (in Chinese). (Shandong University Press, 1996).
Luan, F. An experimental study of the relationship between the East and the Central Plains in the YangShao Era (in Chinese). Archaeology 45–58 (1996).
Wu, J. A brief discussion on the Dawenkou culture discovered in Henan Province (in Chinese). Archaeology 261–265 (1981).
Xu, Y. A preliminary study of the phenomenon of cultural migration around 5,000 years ago (in Chinese). Archaeological J. 38, (2010).
Zhang, C. LongShanization, the LongShan Period, and the LongShan Era - Re-reading LongShan Culture and the LongShan Era. Cultural Relics in Southern China. (in Chinese) 62–69 (2021).
Yan, W. & Long, S. Culture and LongShan Period. Chin. Cultural Relics, 41–48 (in Chinese) (1981).
Wang, X. An Experimental Study of the Archaeological Culture of the Oriental Region during the Xia and Shang Dynasties. J. Peking Univ. 57–70 (in Chinese) (1989).
He, Y. Research on “Foreign Cultural Factors” in Yin Ruins. Cultural Relics of Central China, 33-49+128 (in Chinese) (2020).
Li, X. in Xia, Shang and Zhou and ShanDong S. J. (in Chinese) 332–337 (Yantai Univ., 2002).
Li, L. Cultural integration between merchants and Eastern barbarians in the late early Shang to Yinxu periods (in Chinese). J. Zhengzhou Inst. Aeronautical Ind. Manag. 28, 46–49 (2009).
Fu, Q. et al. DNA analysis of an early modern human from Tianyuan Cave, China. Proc. Natl Acad. Sci. USA 110, 2223–2227 (2013).
Korneliussen, T. S., Albrechtsen, A. & Nielsen, R. ANGSD: Analysis of Next Generation Sequencing Data. BMC Bioinforma. 15, 356 (2014).
Briggs, A. W. et al. Patterns of damage in genomic DNA sequences from a Neandertal. Proc. Natl Acad. Sci. USA 104, 14616–14621 (2007).
Skoglund, P. et al. Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal. Proc. Natl Acad. Sci. USA 111, 2229–2234 (2014).
Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet 2, e190 (2006).
Mcinnes, L. & Healy, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. J. Open Source Softw. 3, 861 (2018).
Maaten, L. J. P. V. D. & Hinton, G. E. Visualizing High-Dimensional Data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Raghavan, M. et al. Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature 505, 87–91 (2014).
Pickrell, J. K. & Pritchard, J. K. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet 8, e1002967 (2012).
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res 19, 1655–1664 (2009).
Wang, F. et al. Briefing on the 2007 excavations at Beiqian Site, Jimo City, ShanDong, China (in Chinese). Archaeology 3–23, 197–100 (2011). 113.
Fan, Y. A Trial Analysis of the Health Condition of the Dawenkou Culture Period Residents at the Beiqian Site (in Chinese). (Shandong Univ., 2013).
Wu, R. Research on the Biotechnology Economy of the Dawenkou Culture (in Chinese). (Shandong Univ., 2018).
Zhu, C. et al. Dawenkou culture burial ground, Gangshang Site South, Tengzhou City, ShanDong, China (in Chinese). Archaeology 62–64, 165–181 (2023). 128122.
Fu, Q. et al. Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514, 445–449 (2014).
Ning, C. et al. Ancient genomes from northern China suggest links between subsistence changes and human migration. Nat. Commun. 11, 2700 (2020).
Mao, X. et al. The deep population history of northern East Asia from the Late Pleistocene to the Holocene. Cell 184, 3256–3266.e3213 (2021).
Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012).
Bai, F. et al. Ancient genomes revealed the complex human interactions of the ancient western Tibetans. Curr. Biol. 34, 2594–2605.e2597 (2024).
van de Loosdrecht, M. et al. Pleistocene North African genomes link Near Eastern and sub-Saharan African human populations. Science 360, 548–552 (2018).
Chintalapati, M., Patterson, N. & Moorjani, P. The spatiotemporal patterns of major human admixture events during the European Holocene. Elife 11, e77625 (2022).
Li, X. Ryukyu: Ryukyu Kingdom, China, and Japan. Pac. J. 18, 56–64 (2010).
Liu, J. et al. Maternal genetic structure in ancient Shandong between 9500 and 1800 years ago. Sci. Bull. 66, 1129–1135 (2021).
Zhang, X. Research on the Dawenkou Culture (in Chinese). (Jilin Univ., 2015).
Zhao, Y. et al. Westward migration of the inhabitants of the Dawenkou culture from human bone materials (in Chinese). Southeast Culture 56–65 (2019).
Liang, S. LongShan Culture - One of the Prehistoric Periods of Chinese Civilization (in Chinese). Archaeol. J. 5–14 (1954).
Zhang, K. Archaeological Study of the Eastern Barbarian Culture (in Chinese). (Chinese Academy of Social Sciences (CASS), 2010).
Yan, W. On the Copper and Stone Age in China (in Chinese). Prehistory 36–44+, 35 (1984).
Yan, W. A brief discussion of the origins of Chinese Civilization (in Chinese). Chin. Cultural Relics 40–49+, 25 (1992).
Pinhasi, R. et al. Optimal Ancient DNA Yields from the Inner Ear Part of the Human Petrous Bone. PLoS One 10, e0129102 (2015).
Pinhasi, R., Fernandes, D. M., Sirak, K. & Cheronet, O. Isolating the human cochlea to generate bone powder for ancient DNA analysis. Nat. Protoc. 14, 1194–1205 (2019).
Kircher, M., Sawyer, S. & Meyer, M. Double indexing overcomes inaccuracies in multiplex sequencing on the Illumina platform. Nucleic Acids Res. 40, e3 (2012).
Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226 (2012).
Gansauge, M. T. & Meyer, M. Single-stranded DNA library preparation for the sequencing of ancient or damaged DNA. Nat. Protoc. 8, 737–748 (2013).
Rohland, N., Harney, E., Mallick, S., Nordenfelt, S. & Reich, D. Partial uracil-DNA-glycosylase treatment for screening of ancient DNA. Philos. Trans. R. Soc. Lond. B Biol. Sci. 370, 20130624 (2015).
Fu, Q. et al. An early modern human from Romania with a recent Neanderthal ancestor. Nature 524, 216–219 (2015).
Haak, W. et al. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522, 207–211 (2015).
Yang, M. A. et al. 40,000-Year-Old Individual from Asia Provides Insight into Early Population Structure in Eurasia. Curr. Biol. 27, 3202–3208.e3209 (2017).
Renaud, G., Stenzel, U. & Kelso, J. leeHom: adaptor trimming and merging for Illumina sequencing reads. Nucleic Acids Res. 42, e141 (2014).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Sawyer, S., Krause, J., Guschanski, K., Savolainen, V. & Pääbo, S. Temporal patterns of nucleotide misincorporations and DNA fragmentation in ancient DNA. PLoS One 7, e34131 (2012).
Lu, D. et al. Ancestral Origins and Genetic History of Tibetan Highlanders. Am. J. Hum. Genet. 99, 580–594 (2016).
Durand, E. Y., Patterson, N., Reich, D. & Slatkin, M. Testing for ancient admixture between closely related populations. Mol. Biol. Evol. 28, 2239–2252 (2011).
Monroy Kuhn, J. M., Jakobsson, M. & Günther, T. Estimating genetic kin relationships in prehistoric populations. PLoS One 13, e0195491 (2018).
Allentoft, M. E. et al. Population genomics of Bronze Age Eurasia. Nature 522, 167–172 (2015).
de Barros Damgaard, P. et al. The first horse herders and the impact of early Bronze Age steppe expansions into Asia. Science 360, eaar7711 (2018).
Mondal, M. et al. Genomic analysis of Andamanese provides insights into ancient human migration into Asia and adaptation. Nat. Genet. 48, 1066–1070 (2016).
Yu, H. et al. Paleolithic to Bronze Age Siberians Reveal Connections with First Americans and across Eurasia. Cell 181, 1232–1245.e1220 (2020).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Patterson, N. et al. Large-scale migration into Britain during the Middle to Late Bronze Age. Nature 601, 588–594 (2022).
Yüncü, E. et al. False discovery rates of qpAdm-based screens for genetic admixture. bioRxiv, https://www.biorxiv.org/content/10.1101/2023.04.25.538339v1 (2023).
Moorjani, P. et al. A genetic method for dating ancient genomes provides a direct estimate of human generation interval in the last 45,000 years. Proc. Natl Acad. Sci. USA 113, 5652–5657 (2016).
Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2024. Nucleic Acids Res. 52, D18–D32 (2024).
Chen, T. et al. The Genome Sequence Archive Family: Toward Explosive Data Growth and Diverse Data Types. Genomics Proteomics Bioinformatics 19, 578–583 (2021).

Acknowledgements

We acknowledge archaeological teams from ShanDong for valuable support. This work was supported by the National Natural Science Foundation of China (41925009), Chinese Academy of Sciences (CAS) (YSBR-019), and Archaeological Talent Promotion Program of China (2024-278). Y.L. is supported by Chinese Academy of Sciences (CAS) (2023000065).

Author information

These authors contributed equally: Juncen Liu, Yichen Liu, Yongsheng Zhao, Chao Zhu, Tianyi Wang.

Authors and Affiliations

Key Laboratory of Vertebrate Evolution and Human Origins, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, Beijing, China
Juncen Liu, Yichen Liu, Tianyi Wang, Xiaotian Feng, Peng Cao, Feng Liu, Qingyan Dai, Ruowei Yang, Weihong Hou, Wanjing Ping, Fan Bai, Bo Miao, Wenjun Wang & Qiaomei Fu
College of Earth and Planetary Sciences, University of Chinese Academy of Sciences, Beijing, China
Juncen Liu, Tianyi Wang, Fan Bai & Qiaomei Fu
Institute of Cultural Heritage, Shandong University, Qingdao, China
Yongsheng Zhao & Wen Zeng
Shandong Provincial Institute of Cultural Relics and Archaeology, Jinan, China
Chao Zhu, Bo Sun, Hui Han, Zhenguang Li, Zimeng Wang & Chengmin Wei
School of Archaeology, Shandong University, Jinan, China
Fen Wang & Fengshi Luan
Jinan Municipal Institute of Archaeology, Jinan, China
Junfeng Guo
Department of History, Shanghai University, Shanghai, China
Qiaowei Wei
College of Life Sciences, Northwest University, Xi’an, China
Bo Miao
Science and Technology Archaeology, National Centre for Archaeology, Beijing, China
Wenjun Wang
Department of Biology, University of Richmond, Richmond, VA, USA
Melinda A. Yang

Contributions

Conceptualization, Q.F.; formal analysis, J.L., Q.F., and T.W.; resources, W.Z., B.S., Y.Z., F.W., H.H., Z.L., C.Z., X.F., P.C., F.L., F.L., Q.D., J.G., Z.W., C.W., Q.W., R.Y., W.H., F.B., B.M., and W.W.; writing – original draft, J.L., Q.F., Y.L., T.W., and M.A.Y.; writing – review & editing, J.L., Q.F., M.A.Y., Y.L., and W.P.; supervision, Q.F.

Corresponding authors

Correspondence to Melinda A. Yang or Qiaomei Fu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Ke Wang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information (PDF)

Peer Review File (PDF)

Description of Additional Supplementary Files (PDF)

Supplementary Data 1 (XLSX)

Reporting Summary (PDF)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Liu, J., Liu, Y., Zhao, Y. et al. East Asian Gene flow bridged by northern coastal populations over past 6000 years. Nat Commun 16, 1322 (2025). https://doi.org/10.1038/s41467-025-56555-w

Received: 30 May 2024
Accepted: 17 January 2025
Published: 03 February 2025
Version of record: 03 February 2025
DOI: https://doi.org/10.1038/s41467-025-56555-w