Main

Shimao, bordering the northern Loess Plateau and Ordos desert, is among the largest prehistoric settlements discovered in China. The stone-walled site encompasses roughly 4 km² and can be divided into outer and inner enclosures, showing features typical of state-level societies: craft production, large fortifications and high social stratification with abundant forms of human sacrifice2,3,4,5,6. With a strict hierarchical polity and a distinctive culture of human sacrifice (more than 80 human skulls were found buried under its East Gate2), Shimao offers an excellent case for examining the roles of family and ancestry in structuring political and social relationships in early state-level human societies. Two cemeteries were found at Shimao: one attributed to the ruling class at the city centre, Huangchengtai, and another to the elite class to the south at Hanjiagedan, within the inner enclosure. The East Gate (Dongmen) in the outer enclosure contains mass burial pits of sacrificed victims. The burials at Shimao culture cemeteries are organized into four to five categories, corresponding to classes from high-status to low-status residents7, which together reflect a strict hierarchy within Shimao society8,9. The layout shows signs of urban planning and clear social stratification. Previous efforts to explain the emergence of hierarchically organized societies in East Asia analysed a range of dispersed archaeological sites10,11,12, or focused on early dynasties such as the Xia and Shang, or on other large Late Neolithic settlements such as Taosi10 or Liangzhu13, which have been considered early forms of regional states14,15. The extensive archaeological record of this large urban settlement has further expanded our knowledge of early state-level communities. Two hypotheses have been proposed to explain the origins of Shimao city and its diverse cultural elements.
One proposes that Shimao was the cosmopolitan centre of a vast trade network, citing bronze knives reminiscent of those of eastern steppe cultures, jade blades and alligator skins possibly originating from coastal northern East Asian or Yangtze River cultures2,3,6, and pottery types similar to those of the Central Plains Longshan culture. The other interprets Shimao as a separate regional cultural centre, possibly of local origin, boasting some of the earliest mural paintings and mouth harps in China. This view is based in part on differences in construction techniques and cultural assemblages and argues that later similarities in the region may have been due to Shimao’s growing influence10,16.

Extensive sampling and the large-scale recovery of DNA from many individuals and burial sites make it possible to reconstruct large family trees, providing an unprecedented opportunity to describe the mating and burial practices of ancient cultures17,18,19. Studies of megalithic elite tombs or massive family burials frequently illustrate patrilineal and patrilocal kinship systems19,20,21, although this is not always the case, as occasional instances have shown matrilineal or combined patterns22,23. Trans-regional studies shed further light not only on social coherence but also on individual and familial mobility within or between large ancient communities24,25. So far, these studies have mostly focused on Mesoamerica or West and Central Eurasia. More recently, genetic studies have begun to explore kinship in a Neolithic settlement of East Asia26. However, no genetic analysis of comparable scope has yet been carried out for the large, organized and socially hierarchical settlements of East Asian prehistoric cultures. Flourishing in a corridor between farming and nomadic communities, Shimao culture, represented by Shimao city and its contemporary satellite sites (for example, Muzhuzhuliang, Shengedaliang, Xinhua and Zhaishan), played a crucial role in establishing the model of large settlements at the very beginning of Chinese civilization, and it offers a unique genetic window into the population history and early social structures of its inhabitants.

Previous surveys of uniparental markers from Shimao and its surrounding sites revealed a diversity of mitochondrial haplotypes, contrasting with relatively few Y-chromosome haplotypes27,28. Deeper sampling of nuclear genomes from these large prehistoric settlements would allow a more comprehensive investigation into the genetic and societal history of their inhabitants. To illuminate the population origin and kinship practices of Shimao society, we densely sampled genomes from nine sites dating from the Middle Neolithic to the Bronze Age, spanning several prehistoric cultures (Yangshao, Shimao and Taosi) and covering the Loess Plateau from the Ordos Desert to the lower Yellow River in Shaanxi and Shanxi provinces. We generated genome-wide data for 169 ancient individuals out of 207 human remains tested from seven archaeological sites in Shaanxi Province and two in Shanxi Province, China (142 of the 169 samples overlapped with the previous mitogenomic study27, and two were genetically identical to three previously reported Shengedaliang samples29; Fig. 1, Supplementary Table 1 and Supplementary Figs. 1–5). Radiocarbon dates were obtained for 32 individuals, representing each genetic cluster of retained individuals from the nine sites (Fig. 1 and Supplementary Table 1). After excluding 24 genetically identical individuals (Supplementary Tables 1 and 2), 13 individuals with low numbers of single-nucleotide polymorphisms (SNPs) and one individual with a high mapping mismatch rate to the reference genome (Methods and Supplementary Tables 1 and 3), we carried out population analyses on 144 unrelated individuals and kinship analyses on a further 25 individuals with first-degree or second-degree relatives in the dataset. DNA libraries were enriched for 1.2 million SNPs30, yielding 29,604 to 976,271 SNPs per individual with an average coverage of 2.74× on the captured SNPs (Supplementary Table 1).
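This section does not state which kinship estimator was used. As a hedged illustration of how pairs can be binned into first-degree and second-degree relatives before being set aside from population analyses, the sketch below applies the cut-offs commonly used with the KING kinship coefficient φ (expected φ is 1/4 for first-degree, 1/8 for second-degree and 1/16 for third-degree relatives, with boundaries at 2^-1.5, 2^-2.5, 2^-3.5 and 2^-4.5). Ancient-DNA studies usually rely on tools adapted to low coverage, so the estimator, thresholds and function names here are assumptions, not the study's pipeline.

```python
# Hypothetical helper: KING-style cut-offs on the kinship coefficient phi.
# Expected phi halves with each degree (1/2 identical, 1/4 first-degree,
# 1/8 second-degree, 1/16 third-degree); each boundary sits midway between
# neighbouring expectations on a log2 scale.
BOUNDARIES = [
    (2 ** -1.5, "identical"),
    (2 ** -2.5, "first-degree"),
    (2 ** -3.5, "second-degree"),
    (2 ** -4.5, "third-degree"),
]


def classify_kinship(phi: float) -> str:
    """Return the relationship degree implied by kinship coefficient phi."""
    for lower_bound, label in BOUNDARIES:
        if phi > lower_bound:
            return label
    return "unrelated"


def close_kin_pairs(pairs: dict) -> list:
    """Sorted pairs classified as first-degree or second-degree relatives."""
    return sorted(
        pair for pair, phi in pairs.items()
        if classify_kinship(phi) in {"first-degree", "second-degree"}
    )


if __name__ == "__main__":
    # Illustrative values only, not estimates from the study.
    example = {("A", "B"): 0.25, ("A", "C"): 0.12, ("B", "C"): 0.02}
    for pair, phi in example.items():
        print(pair, classify_kinship(phi))
    print(close_kin_pairs(example))
```

Under this convention, pairs above the top boundary would correspond to the kind of genetically identical samples that were excluded before analysis.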

Fig. 1: Overview of Neolithic Shaanxi and Shanxi samples.

a, Geographic locations of newly sampled archaeological sites (filled, coloured symbols) from the Loess Plateau of Northern Shaanxi Province to Southern Shanxi Province, China. Purple symbols represent the locations of previously published ancient samples from northern, western and central East Asia. b, Temporal distribution of new samples with direct radiocarbon dates from the Middle Neolithic (MN) to Late Neolithic (LN) and the Bronze Age (BA). Colours and symbols correspond to those in the map. c, PCA projecting ancient samples (coloured) onto the genomic variation observed in present-day Eurasians (grey circles). Here cEA refers to the cline of central East Asians and Deep/wEA refers to the cline of Deep Asians/West East Asians. Yumin-related outliers are encircled by a dashed line and marked as the Yumin cline. d, PCA projecting ancient samples (coloured) onto the genomic variation observed in present-day East Asians. The sEA and Yumin clines are shown by dashed circles. Symbols are as in c. Image in a reproduced with permission from ref. 46, Wiley-Blackwell, created with WorldClim (https://www.worldclim.org/) and Natural Earth (https://www.naturalearthdata.com) data.

Genetic make-up of Shimao populations

Efforts to explain the origins of the Shimao population have centred on cultural commonalities with populations from the neighbouring Central Plain and nearby northeastern populations in the Ordos region2,6, and on more distant cultural features from northeastern China, for example, the Amur River basin or coastal regions31,32. We investigated the genetic formation of the various populations associated with Shimao culture on the Loess Plateau by examining their genetic connections with a large panel of published Eurasian populations. First, we found that 4,200- to 3,800-year-old populations (upper range given for all carbon dates) attributed to Shimao culture (roughly 2300–1800 bce1) from Shimao city and its surrounding sites (Muzhuzhuliang, Shengedaliang, Xinhua and Zhaishan, together referred to hereafter as Shimao_4k) were closely related to earlier, roughly 4,800-year-old Yangshao culture-associated populations in Northern Shaanxi Province (Miaoliang and Wuzhuangguoliang, together referred to as preShimao_5k). Both Shimao_4k and preShimao_5k clustered with northern East Asian (nEA) ancestries from outside Shaanxi Province (for example, YR_MN, YR_LN, WLR_LN and Miaozigou_MN), as supported by principal component analysis (PCA) and by admixture analysis at K = 3 (where K is the number of ancestral source components; Fig. 1, Extended Data Fig. 2 and Supplementary Fig. 11). In outgroup-f3 analyses and D statistics comparing Shimao culture-related populations with various ancient Eurasian populations, including nEA ancestries from the Yellow River basin (represented by YR_MN and YR_LN) and more distant ones such as Early Neolithic Shandong (Xiaogao and Bianbian), the Amur River basin (AR19K and DevilsCave_N) and the West Liao River basin (WLR_MN and WLR_LN), Shimao_4k populations showed the highest overall affinity with preShimao_5k (Fig. 1c, Extended Data Fig. 1, Supplementary Tables 4, 9 and 10 and Supplementary Fig. 10).
In addition, a maximum likelihood phylogeny with admixture (m = 2; Fig. 2) confirmed clear genetic continuity between Shimao_4k populations and preShimao_5k (Fig. 2 and Supplementary Fig. 6). Genetic continuity of the Shaanxi populations was also validated using qpGraph, in which Shimao_4k could be modelled as deriving 100% of its ancestry from a single source, preShimao_5k (here represented by the better-covered Wuzhuangguoliang; Fig. 2 and Supplementary Figs. 7–9).
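Outgroup-f3 statistics of the kind used above measure the genetic drift two populations share since separating from an outgroup such as Mbuti: f3(O; A, B) is the average over SNPs of (o − a)(o − b), where o, a and b are allele frequencies. The minimal sketch below ranks candidate reference populations by this quantity for one target. It is a simplification under stated assumptions: it omits the finite-sample bias correction and block-jackknife standard errors applied by ADMIXTOOLS, and the function names and toy frequencies are hypothetical.

```python
def outgroup_f3(o, a, b):
    """Naive f3(O; A, B): mean over SNPs of (o - a) * (o - b).

    o, a and b are allele frequencies at the same SNPs in the outgroup and the
    two test populations. No finite-sample correction or jackknife is applied.
    """
    if not (len(o) == len(a) == len(b)) or not o:
        raise ValueError("frequency vectors must be non-empty and of equal length")
    return sum((oi - ai) * (oi - bi) for oi, ai, bi in zip(o, a, b)) / len(o)


def rank_by_shared_drift(outgroup, target, references):
    """Rank reference populations by outgroup-f3 with the target (highest first)."""
    scores = {
        name: outgroup_f3(outgroup, target, freqs)
        for name, freqs in references.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy frequencies for illustration only; not data from the study.
    mbuti = [0.5, 0.5, 0.5, 0.5]
    target = [0.9, 0.1, 0.8, 0.2]
    refs = {
        "close_ref": [0.85, 0.15, 0.75, 0.25],
        "distant_ref": [0.5, 0.6, 0.4, 0.5],
    }
    for name, score in rank_by_shared_drift(mbuti, target, refs):
        print(f"{name}: {score:.4f}")
```

A higher value means more shared drift, which is the sense in which one population is said to have the highest affinity with another.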

Fig. 2: Dual prevalence of the Middle Neolithic progenitor and Yumin-related ancestry across the Loess Plateau.

a, qpAdm analysis showing the proportions of Yumin and Wuzhuangguoliang ancestry for Middle to Late Neolithic Shaanxi populations. Colours represent the different ancestral sources, Yumin (green) and Wuzhuangguoliang (yellow). Error bars are centred on the mean Yumin ancestry proportion and show ±1 standard error of the qpAdm admixture estimates, obtained by block-jackknife analysis. b, Admixture graphs built with the qpGraph module in AdmixTools; the selected admixture graph is built on a base graph containing the central African Mbuti as an outgroup, the early western Eurasian Ust’-Ishim and early Asian Tianyuan, with sEastAsia_EN, YR_MN, preShimao_5k (Wuzhuangguoliang), Shimao_4k (Shimao_HJGD1), Yumin, preShimao_5k_o (Wuzhuangguoliang_o1) and Shimao_4k_o (Xinhua_o) added interactively. c, Constrained graph with two admixture events in 100 algorithm iterations. The log-likelihood (LL) score is 31.19. Mbuti, Ust’-Ishim, sEastAsian_EN (represented by Qihe2 and Liangdao2) and Tianyuan are constrained as non-admixed populations. For more detailed fitting graphs, see Supplementary Figs. 8 and 9. d, Treemix analysis with two migration edges (m = 2). The range of bootstrap values (n = 1,000) is marked on the tree nodes in different colours and shapes, and the individuals included in the groups preShimao_5k, preShimao_5k_o, Shimao_4k, Taosi_4k and Shimao_4k_o are described in Supplementary Table 1.

To test whether ancestries other than the preceding Yangshao ancestry contributed to the Shimao population, we performed a broad set of f4 analyses (Supplementary Table 4 and Supplementary Fig. 10). Notably, several individuals within the Shimao culture-related populations of both Shimao city and its satellite sites (denoted as the Shimao southern East Asian (sEA) cline in Fig. 1) differed from the Late Neolithic Longshan population represented by YR_LN29 in showing varying affinities to southern East Asian ancestry (represented by the indigenous Ami population of Taiwan and the Xitoucun population of Fujian), as evidenced by D statistics (Supplementary Table 5). qpAdm modelling indicates that these Shimao sEA outliers harbour mostly (70–90%) Yangshao culture-related ancestry (represented by Wuzhuangguoliang), with a further 10–30% southern ancestry, which can be represented by 22–31% southern mainland ancestry (Xitoucun) or 7–20% southeast coastal ancestry, represented by the Iron Age indigenous Hanben33 or the Ami population of Taiwan (Supplementary Table 6). This suggests that southern ancestry, associated with the Late Neolithic expansion of rice farming, extended further north than the Central Plain, in line with a recent finding34 (see Supplementary Note 2 for further discussion; Supplementary Tables 5 and 6).
Admixture modelling using qpAdm for Shimao and its contemporaneous related Late Neolithic populations (that is, Muzhuzhuliang, Shengedaliang, Xinhua and Zhaishan) shows a very high contribution from the 4,800–4,600-year-old ancestry represented by Wuzhuangguoliang (9 of 18 populations, listed and highlighted in grey in Supplementary Table 4, have a single ancestry source, and 5 have more than 80% Wuzhuangguoliang ancestry; Supplementary Table 6), further supporting the hypothesis that the Shimao people mostly originated from a Yangshao culture-related population established in the region at least 1,000 years earlier. To further understand the ancestry sources of the earlier Wuzhuangguoliang population, we applied a simulation method (Methods). Our results indicated a mixed ancestry source for the Wuzhuangguoliang population, distinct from Yellow River farming ancestries (Methods and Supplementary Figs. 12–15). We further explored the genetic relationship between Shimao_4k populations and the contemporaneous populations attributed to the Taosi culture (Taosi and Zhoujiazhuang, together referred to as Taosi_4k) located further south in Shanxi Province, which indicated a close connection between Shimao and Taosi culture-related populations (see Supplementary Note 3 for further discussion).
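qpAdm itself is run through ADMIXTOOLS; the short sketch below only illustrates a common way of screening its outputs when counting how many populations fit a single source. It assumes, as many studies do, that a model is accepted when its p-value is at least 0.05 and its weights are valid proportions (between 0 and 1 and summing to 1), and that the model with the fewest sources is preferred; some studies use 0.01 instead. The `AdmixModel` container, helper names and example fits are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdmixModel:
    """One fitted qpAdm-style model (hypothetical container, not an ADMIXTOOLS type)."""
    sources: tuple
    weights: tuple
    p_value: float


def is_feasible(model: AdmixModel, alpha: float = 0.05) -> bool:
    """Not rejected at level alpha, and the weights form valid proportions."""
    weights_ok = all(0.0 <= w <= 1.0 for w in model.weights)
    sums_to_one = abs(sum(model.weights) - 1.0) < 1e-6
    return model.p_value >= alpha and weights_ok and sums_to_one


def simplest_feasible(models, alpha: float = 0.05) -> Optional[AdmixModel]:
    """Prefer the fewest sources; among equally simple models, the highest p-value."""
    feasible = [m for m in models if is_feasible(m, alpha)]
    if not feasible:
        return None
    return min(feasible, key=lambda m: (len(m.sources), -m.p_value))


if __name__ == "__main__":
    # Invented example fits for one target population.
    candidates = [
        AdmixModel(("Wuzhuangguoliang",), (1.0,), 0.20),
        AdmixModel(("Wuzhuangguoliang", "Xitoucun"), (0.8, 0.2), 0.60),
        AdmixModel(("Wuzhuangguoliang", "Yumin"), (1.1, -0.1), 0.50),
    ]
    print(simplest_feasible(candidates))
```

Preferring the simplest accepted model is what allows a population to be described as having a single ancestry source even when richer models also fit.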

Persistent Yumin-related presence

Agro-pastoralist societies in the Ordos region frequently transitioned between herding and farming lifestyles from the Middle to Late Neolithic35. Located in this transitional corridor, Shimao shows steppe-related features such as the introduction of herding animals36 and the presence of chiselled stone faces. To explore whether neighbouring steppe-culture populations genetically influenced Shimao populations and, if so, when and to what extent, we examined the genetic connections of preShimao_5k and Shimao_4k with ancient western and eastern steppe populations37,38,39,40 (Afanasievo37, Yamnaya_EMBA38 and Shamanka39), West and Central Eurasians41,42 and other East Asians, including the nearby northern East Asian ancestry Yumin32. Yumin, represented by an 8,000-year-old individual from the Yumin site in Inner Mongolia, is an ancestry that inhabited the Inner Mongolian steppe and was absent from northern East Asia throughout the Neolithic and Bronze Age periods. We found little to no evidence that the predominant preShimao_5k populations had admixed with ancestries from outside East Asia (Supplementary Table 8). When comparing them with the various nEA ancestries, however, we observed some outlier individuals at the Middle Neolithic Wuzhuangguoliang site with ancestries different from those predominant in the remaining populations. Two genetic outliers among the Wuzhuangguoliang population (Wuzhuangguoliang_o1; 4,831–4,585 calibrated years before present (Cal. BP); Supplementary Table 1) clustered with the Yumin population in the PCA (Fig. 1). The Treemix analysis also placed these Wuzhuangguoliang outliers (preShimao_5k_o) with the Yumin branch (Fig. 2 and Supplementary Fig. 6).
To further investigate the genetic make-up of these two Yumin-related outliers, we applied distal admixture modelling (Methods), which supported a two-source admixture of 50.2 ± 11.5% Yumin-related ancestry and 49.8 ± 11.5% predominant Yangshao ancestry, represented by the 4,832–4,820-year-old Wuzhuangguoliang population (Supplementary Table 6).
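qpAdm estimates proportions jointly from many f4 statistics; the older f4-ratio estimator sketched below is a simpler, swapped-in technique that shows the same two-source logic. It assumes a tree in which A is a sister of source B, C is the other source, O is an outgroup, and the target X is a mixture of B-like (α) and C-like (1 − α) ancestry, so that α = f4(A, O; X, C) / f4(A, O; B, C). Because the second proportion is simply 1 − α, both components of a two-source model share one standard error, as with the 50.2 ± 11.5% and 49.8 ± 11.5% reported above. Function names and toy frequencies are illustrative assumptions.

```python
def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d), from allele frequencies."""
    if not (len(a) == len(b) == len(c) == len(d)) or not a:
        raise ValueError("frequency vectors must be non-empty and of equal length")
    return sum((ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d)) / len(a)


def f4_ratio(a, o, x, b, c):
    """Estimate alpha, the share of X's ancestry from the B side, and 1 - alpha."""
    denom = f4(a, o, b, c)
    if denom == 0:
        raise ZeroDivisionError("f4(A, O; B, C) is zero, so the sources are not separable")
    alpha = f4(a, o, x, c) / denom
    return alpha, 1.0 - alpha


if __name__ == "__main__":
    # Toy allele frequencies; X is built as an exact 50:50 mix of B and C.
    a = [0.7, 0.2, 0.6, 0.3]
    o = [0.5, 0.5, 0.5, 0.5]
    b = [0.8, 0.1, 0.7, 0.2]
    c = [0.3, 0.6, 0.4, 0.5]
    x = [0.5 * bi + 0.5 * ci for bi, ci in zip(b, c)]
    print(f4_ratio(a, o, x, b, c))
```

Because f4 is linear in the target's frequencies and f4(A, O; C, C) is zero, a perfect mixture returns its mixing weight exactly; real data add drift and sampling noise, hence the jackknife error bars.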

Examining whether Yumin-related ancestry continued to influence Shimao culture-related populations 1,000 years later, we observed Yumin-related ancestry persisting into the Late Neolithic with increasing incidence, but without obscuring the local genetic continuity of the preceding 1,000 years. PCA and f3 analysis detected six genetic outliers (4,148–3,390 Cal. BP; Xinhua_o, Shimao_HCT_o, Shimao_DM_o1, Shimao_DM_o2 and two belonging to Muzhuzhuliang_o) among the Shimao culture-related populations, clustering with or close to Yumin (Fig. 1 and Extended Data Fig. 1). These later outliers (Xinhua_o, Shimao_DM_o1 and Muzhuzhuliang_o) shared more alleles with Yumin than with Shimao or other nEA ancestries, as shown by the following D statistics: D(Yumin, Shimao/nEA; Shaanxi outliers, Mbuti) > 0 (−0.3 < Z < 9.2) and D(Shaanxi outliers, Yumin; Shimao/nEA, Mbuti) ≈ 0 (−2.9 < Z < 2.9) (Supplementary Tables 9 and 11). Treemix analysis also showed that these Late Neolithic outliers (Shaanxi_4k_o) form a sister clade to the Yumin branch (Fig. 2 and Supplementary Fig. 6). Only one Late Neolithic outlier in Shimao (Shimao_HCT_o) was admixed, with roughly 28–31% Yumin-related and roughly 69–72% Yangshao ancestry (ancestry proportion ranges are based on qpAdm models presented in Fig. 2 and Supplementary Table 6). Despite a time span of more than 4,500 years between Yumin and the most recently dated outlier, from Dongmen in Shimao city (3,390–3,253 Cal. BP; Shimao_DM_o2), we found no evidence of admixture in the other five Late Neolithic outliers (Xinhua_o, Shimao_DM_o1, Shimao_DM_o2 and two from Muzhuzhuliang_o). This is evident from qpAdm analysis with distal or proximal modelling, in which these five Late Neolithic outliers in Shaanxi Province are best modelled as having a single source of ancestry related to Yumin (100%; Fig. 2 and Supplementary Table 6), and is further supported by D statistics (Fig. 2 and Supplementary Tables 9 and 11).
Together, these results indicate long-term interaction, through coexistence and occasional admixture, between ancient Shaanxi inhabitants and Yumin-related populations, and even an increase in Yumin-related influence from the Middle to Late Neolithic, in line with the increasing incidence of herd-animal exploitation36. It is unclear whether these interactions were related to trade, to the maintenance of an agro-pastoralist lifestyle (perhaps in response to seasonal climate fluctuations) or to other causes, but they were not substantial enough to interrupt the genetic continuity of the local ancestry36.
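The D statistics above are reported with Z-scores, which come from a block-jackknife standard error. The sketch below shows the idea for D(W, X; Y, Z) computed from allele frequencies, as in D(Yumin, Shimao/nEA; Shaanxi outliers, Mbuti): per-SNP numerator and denominator terms are summed, the statistic is recomputed with each block of adjacent SNPs left out, and Z = D / SE. It is a simplification under stated assumptions: ADMIXTOOLS uses a weighted jackknife over blocks defined in genetic distance (about 5 cM by default), whereas this version uses equal-sized SNP blocks, and the function names and toy frequencies are hypothetical. |Z| of about 3 is a commonly used significance threshold.

```python
import math


def d_components(w, x, y, z):
    """Per-SNP numerator and denominator of D(W, X; Y, Z) from allele frequencies."""
    num = [(wi - xi) * (yi - zi) for wi, xi, yi, zi in zip(w, x, y, z)]
    den = [
        (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
        for wi, xi, yi, zi in zip(w, x, y, z)
    ]
    return num, den


def d_statistic_with_z(w, x, y, z, block_size):
    """D statistic with an unweighted delete-one-block jackknife standard error.

    SNPs are assumed to be in genomic order so that consecutive SNPs form a block.
    Returns (D, standard error, Z-score).
    """
    num, den = d_components(w, x, y, z)
    n_snps = len(num)
    blocks = [(s, min(s + block_size, n_snps)) for s in range(0, n_snps, block_size)]
    if len(blocks) < 2:
        raise ValueError("need at least two blocks for a jackknife")
    total_num, total_den = sum(num), sum(den)
    d = total_num / total_den
    # Recompute D with each block left out in turn.
    loo = [
        (total_num - sum(num[s:e])) / (total_den - sum(den[s:e]))
        for s, e in blocks
    ]
    g = len(loo)
    mean_loo = sum(loo) / g
    se = math.sqrt((g - 1) / g * sum((v - mean_loo) ** 2 for v in loo))
    z_score = d / se if se > 0 else math.nan
    return d, se, z_score


if __name__ == "__main__":
    # Toy frequencies (8 SNPs, 4 blocks of 2); not data from the study.
    w = [0.9, 0.8, 0.7, 0.9, 0.6, 0.8, 0.7, 0.9]
    x = [0.2, 0.3, 0.4, 0.1, 0.3, 0.2, 0.4, 0.2]
    y = [0.8, 0.9, 0.6, 0.8, 0.7, 0.9, 0.6, 0.8]
    z = [0.1, 0.2, 0.3, 0.2, 0.1, 0.3, 0.2, 0.1]
    print(d_statistic_with_z(w, x, y, z, block_size=2))
```

Swapping W and X flips the sign of D and Z but leaves the standard error unchanged, which is why the two D-statistic forms quoted above are read as complementary tests.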

Sex-specific sacrifice at Shimao

The diversity of the sacrificial traditions of Shimao culture indicates a high degree of social stratification and strict hierarchy2. Sacrificial traditions at Shimao and its surrounding sites took two forms: mass burials that may have served public ritual purposes, as found at Shimao Dongmen or on a raised area potentially containing a palace at Huangchengtai (Supplementary Figs. 1 and 2), and sacrifices accompanying high-status burials, in which the victims were entombed with the tomb owners, as found at the cemeteries of the Shimao and Zhaishan sites (Figs. 3 and 4 and Supplementary Figs. 3–5). To explore whether we could detect a demographic bias in the victims selected for sacrifice, we investigated the Dongmen (East Gate) site at Shimao. Previous archaeological reports had identified these sacrifices as female-biased on morphological criteria; in contrast, our results showed no evidence of female bias among the Dongmen victims, 9 of the 10 being male (female/male here denotes assigned female/male at birth). Three of these male individuals had been identified in those reports as female on morphological grounds. The archaeological context, with the victims buried beneath the foundation of the gate2,5, suggests that these sacrifices were probably connected to a construction ritual for the walls or gate, a custom observed at later sites in China2. To further understand these findings, we explored the genetic composition and kinship relationships of the sacrificed individuals in comparison with the dominant populations of Shimao. Our analysis identified two genetic outliers at Dongmen with Yumin-related ancestry32, namely a sacrificed victim from the pit and an individual from a later tomb (Shimao_DM_o1 and Shimao_DM_o2; Figs. 1 and 2 and Supplementary Table 1), who were buried alongside inhabitants with predominantly Wuzhuangguoliang ancestry (Supplementary Table 6).
No pairwise kinships or shared identity-by-descent (IBD) segments were detected between these outliers and other individuals within or across the sites (Fig. 4, Extended Data Figs. 3 and 4 and Supplementary Figs. 16–21). Apart from these two Yumin-related individuals, no differences in ancestry were detected between those selected for sacrifice at Dongmen and the elite class of tomb owners at interior Shimao sites.
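The text reports genetic sex without describing the procedure here. As one hedged illustration of how genetic sex can overturn a morphological assignment, the sketch below implements the Ry method of Skoglund et al. (2013), which compares reads mapping to the Y chromosome with those mapping to X and Y combined and assigns male or female only when the 95% confidence interval clears fixed thresholds. That method was developed for shotgun reads; capture data such as the 1.2 million SNP set used here are often sexed instead from X and Y coverage relative to the autosomes, so this is a swapped-in technique, and the thresholds, function names and read counts are assumptions.

```python
import math

# Ry thresholds as described by Skoglund et al. (2013) for shotgun data; assumed here.
MALE_LOWER = 0.077
FEMALE_UPPER = 0.016


def ry_statistic(n_x: int, n_y: int):
    """Fraction of sex-chromosome reads on Y, with its binomial standard error."""
    total = n_x + n_y
    if total == 0:
        raise ValueError("no reads on the sex chromosomes")
    ry = n_y / total
    se = math.sqrt(ry * (1 - ry) / total)
    return ry, se


def assign_sex(n_x: int, n_y: int, z: float = 1.96) -> str:
    """'male' or 'female' only when the confidence interval clears a threshold."""
    ry, se = ry_statistic(n_x, n_y)
    lower, upper = ry - z * se, ry + z * se
    if lower > MALE_LOWER:
        return "male"
    if upper < FEMALE_UPPER:
        return "female"
    return "unassigned"


if __name__ == "__main__":
    # Hypothetical read counts (X reads, Y reads) for three samples.
    for n_x, n_y in [(1000, 120), (2000, 5), (50, 3)]:
        print(n_x, n_y, assign_sex(n_x, n_y))
```

Low-coverage samples can stay unassigned, which is one reason genetic sex is reported only for individuals with enough data.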

Fig. 3: Kinship and social organization at Zhaishan site.

Grave locations and reconstructed pedigree at the Zhaishan site. Connections inferred from trustworthy IBD sharing are marked in pink, representing sample pairs in which both samples have coverage above 1×. Bars on the IBD edges indicate the maximum IBD segment length for pairs with IBD ≥ 12 cM.

Fig. 4: Kinship and social organization across Shimao societies.

a, Map of sites within Shimao city, including Dongmen (DM), Huangchengtai (HCT), Hanjiagedan (HJGD), Houyangwan (HYW) and Mahuangliang (MHL); the symbols represent the inferred social organizations at each Shimao site. Sacrificed victims from the high-level graves at the Dongmen, Huangchengtai and Hanjiagedan sites are marked as bold golden rectangles (men) or circles (women), and individuals from the same grave are marked with the same colour. b, Grave locations in the cemetery of Huangchengtai (the grave level at this site is higher than at Hanjiagedan) and the kinships between residents. The high-level graves at Huangchengtai typically feature one to three sacrificed individuals alongside a niche containing burial goods. Tomb owners, and individuals of unknown identity who have kin connections with the sacrificed individuals, are also plotted. c, Grave locations in the cemetery of Hanjiagedan within Shimao city and the reconstructed pedigree spanning at least three generations of tomb owners. Haplotypes of the mitochondrial and Y chromosomes are marked as circles and rectangles in different colours. The light blue dotted line represents one possible matrilineal pedigree among several possibilities. d, Sampled individuals in the graves from Houyangwan, the East Gate (Dongmen) of Shimao city (Shimao_DM_o2 of M2: a later resident grave, distinct from the other sacrificed people) and the Mahuangliang site. The mitochondrial haplotypes in b and d are simplified into lineages A to Z. All second-degree kinships are marked with a dashed golden-yellow line. Connections inferred from trustworthy or low-confidence IBD sharing are marked in pink or green, representing sample pairs with both samples above 1× coverage or with either sample below 1× coverage, respectively. Bars on the IBD edges indicate the maximum IBD segment length for pairs with IBD ≥ 12 cM. Map in a reproduced with permission from Shaanxi Academy of Archaeology7.
‘Rob hole’ denotes an illegal looting tunnel; a specimen recovered from such a feature is thereby deprived of its original burial context.

Although the limited sample size reduces our statistical power to detect significant sex biases, we observe a marked contrast between the predominantly male sacrifices at Dongmen in Shimao city and the predominantly female sacrifices associated with elite burials at several Shimao culture sites, including Hanjiagedan and Huangchengtai within Shimao city (Fig. 4) and secondary settlements such as Zhaishan (Fig. 3). Among these, Hanjiagedan, located south of the inner enclosure, served as a noble cemetery. Nearly all of its sampled sacrificed individuals were female (six out of seven) and unrelated to the tomb owners. Another high-level cemetery, Huangchengtai in the city centre of Shimao, where the ruling class potentially resided, also showed predominantly female sacrifices (14 out of 19). However, unlike at the other sites, second-degree kinships were observed among the sacrificed victims (Fig. 4, Supplementary Table 3 and Supplementary Fig. 19), indicating that families or communities may have been selected for burial sacrifice by the ruling elite. A small burial ground, similar to the mass burial at Dongmen, was unearthed near the palace area of Huangchengtai. All individuals buried there were women (three out of three), and none showed detectable familial ties to individuals from nearby communities (Fig. 4, Extended Data Fig. 3 and Supplementary Table 3). The handcrafted products excavated alongside these women may hint at their identity, suggesting that craftspeople who mastered core production technologies were concentrated in the upper elite residential quarters. Consistent with practices at Shimao, its secondary settlement Zhaishan also featured only female sacrifices (two out of two), with no close kinships (first-degree or second-degree) observed between the sacrificed individuals and their tomb owners (Fig. 3 and Supplementary Fig. 18).
These patterns of mostly female sacrifices contrast starkly with Dongmen, where decapitation and mass burial involved mostly men among the sampled individuals. The sacrificial practices observed in the cemeteries of Shimao city and Zhaishan may represent ancestor veneration, in which women were sacrificed to honour elite nobles or rulers. The divergent sacrificial customs suggest a complex, hierarchical social system in Shimao culture that has not been observed in previous ancient genomic studies. However, our analysis is based on a limited number of well-preserved remains, which may not fully represent all sacrificed individuals, and this small sample lacks the statistical power needed to interpret the observed sex ratios robustly.
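To make the sample-size caveat concrete, an exact two-sided binomial test against a 50:50 sex ratio can be applied to the counts reported above. The sketch is illustrative only: it treats individuals as independent (an assumption already violated by the second-degree relatives among the Huangchengtai victims), applies no correction for multiple comparisons, and the function names are hypothetical. It sums the probabilities of all outcomes no more likely than the observed one, the minimum-likelihood rule used by R's binom.test.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test (minimum-likelihood rule)."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    observed = binom_pmf(k, n, p)
    # A small relative tolerance guards against floating-point ties.
    total = sum(
        q for q in (binom_pmf(i, n, p) for i in range(n + 1))
        if q <= observed * (1 + 1e-7)
    )
    return min(1.0, total)


# Counts as reported in the text: (individuals of the majority sex, total sampled).
COUNTS = {
    "Dongmen (male)": (9, 10),
    "Hanjiagedan (female)": (6, 7),
    "Huangchengtai cemetery (female)": (14, 19),
    "Huangchengtai palace area (female)": (3, 3),
    "Zhaishan (female)": (2, 2),
}

if __name__ == "__main__":
    for site, (k, n) in COUNTS.items():
        print(f"{site}: {k}/{n}, two-sided p = {binom_two_sided_p(k, n):.3f}")
```

For example, 9 of 10 at Dongmen gives p = 22/1024 ≈ 0.021, whereas a sample of two, as at Zhaishan, cannot yield a p-value below 0.5 whatever the sexes are.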

We then looked for signs of consanguinity in Middle to Late Neolithic Shaanxi communities and found three individuals whose parents were probably first-degree or second-degree relatives (Extended Data Fig. 5 and Supplementary Table 14). At Zhaishan, a sacrificed woman (C6213) showed extensive long runs of homozygosity (ROH) (roughly 400 centimorgans (cM); Extended Data Fig. 5a), consistent with her being the offspring of a union between second-degree relatives (Extended Data Fig. 5). However, unlike the high-status consanguineous offspring reported in Neolithic Ireland, whose parentage was considered to have had high social sanction43, this sacrificed woman (Fig. 3) shared only distant kinship (third-degree to fifth-degree) with two lineage tomb owners (individuals on the pedigrees in Fig. 3). Close-kin mating was not observed among other elites or commoners with available pedigree or ROH information, suggesting that such unions may have been avoided or less common in higher-status lineages, although larger samples are needed to confirm this pattern.
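The link between roughly 400 cM of ROH and a second-degree parental relationship can be checked with simple arithmetic. A child's inbreeding coefficient F equals the kinship coefficient of its parents (1/4 for first-degree, 1/8 for second-degree and 1/16 for third-degree relatives), and F is the expected fraction of the autosomes that is autozygous. The sketch below multiplies F by an assumed autosomal map length of 3,500 cM (a round figure, not a value from the study) and reports which degree's expectation lies closest to an observed total; the function names are hypothetical.

```python
from fractions import Fraction

# Assumed round figure for the sex-averaged human autosomal genetic map (cM).
GENOME_LENGTH_CM = 3500.0


def parental_kinship(degree: int) -> Fraction:
    """Kinship coefficient of relatives of a given degree: 1/4, 1/8, 1/16, ..."""
    if degree < 1:
        raise ValueError("degree must be at least 1")
    return Fraction(1, 2 ** (degree + 1))


def expected_roh_cm(degree: int, genome_length_cm: float = GENOME_LENGTH_CM) -> float:
    """Expected autozygous length for offspring of relatives of this degree.

    The offspring's inbreeding coefficient F equals the parents' kinship
    coefficient, and F is the expected fraction of the genome that is autozygous.
    """
    return float(parental_kinship(degree)) * genome_length_cm


def closest_degree(observed_cm: float, max_degree: int = 4,
                   genome_length_cm: float = GENOME_LENGTH_CM) -> int:
    """Degree whose expected ROH total lies closest to the observed total."""
    return min(
        range(1, max_degree + 1),
        key=lambda d: abs(expected_roh_cm(d, genome_length_cm) - observed_cm),
    )


if __name__ == "__main__":
    for degree in (1, 2, 3):
        print(degree, expected_roh_cm(degree))
    print("closest degree for 400 cM:", closest_degree(400.0))
```

For the roughly 400 cM reported for C6213, the closest expectation is the second-degree value (437.5 cM under the assumed map length). Observed totals scatter widely around these expectations, and ROH callers typically count only segments above a minimum length, so this is a plausibility check rather than the study's inference.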

Dominant patrilineal descent structure

Researchers have traditionally regarded family relationships as a means of maintaining elite status and perpetuating power in the absence of more advanced political systems. To investigate the family ties among the tomb owners of Shimao culture, we sampled individuals from low-level to high-level graves of the large Middle to Late Neolithic communities and uncovered a web of relatedness spanning two to four generations. Across all sites, we identified 25 kinship pairs of first-degree or second-degree relatives with high confidence and 31 pairs whose IBD sharing indicated possible third-degree to fifth-degree kinship (Figs. 3 and 4, Extended Data Fig. 3, Supplementary Tables 3 and 12 and Supplementary Figs. 18–21). We then extended pedigrees up to four generations among tomb owners in low-level to high-level graves at Shimao city and Zhaishan, finding that the largest pedigrees at both sites were established by a high-status man. Their male offspring also appear to have had high social status, with the right to inherit wealth (for example, burial goods and offerings of sacrifice). This indicates a predominantly patrilineal descent structure at both Zhaishan and Hanjiagedan, although we cannot exclude one possible matrilineal case at Hanjiagedan (Fig. 4). We determined the uniparental haplotypes of all higher-status individuals, both lineage and non-lineage members (shown on or off the pedigrees in Fig. 4), and found that nearly all higher-status men carried the same paternal haplogroup (O3a2c), including all lineage male tomb owners of Zhaishan (Fig. 3) and Hanjiagedan (Fig. 4); the only exception was a non-lineage man at Huangchengtai, who carried a different Y haplogroup (C2e2). This contrasts with the diverse maternal haplogroups of the ten female tomb owners observed in these three cemeteries (Figs. 3 and 4).
Likewise, human remains from the contemporaneous settlements attributed to Shimao culture (Muzhuzhuliang, Shengedaliang, Xinhua and Zhaishan) also showed a diversity of mitochondrial haplotypes but relatively limited paternal haplotypes among residents (Supplementary Figs. 18 and 19), consistent with a patrilineal descent structure in which group membership derives primarily from the father’s lineage.
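The contrast between diverse maternal and limited paternal haplotypes can be quantified with Nei's unbiased haplotype (gene) diversity, h = n/(n − 1) × (1 − Σ p_i²), where p_i is the frequency of haplotype i among n individuals. This standard summary statistic is offered as an illustration; the study does not report it in this section. In the sketch, the male list loosely mirrors the pattern described above (several O3a2c carriers and one C2e2), and the ten mitochondrial labels are placeholders because the individual haplogroups are not listed here; both inputs are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def haplotype_diversity(haplogroups):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplogroups)
    if n < 2:
        raise ValueError("need at least two individuals")
    counts = Counter(haplogroups)
    homozygosity = sum(Fraction(c, n) ** 2 for c in counts.values())
    return Fraction(n, n - 1) * (1 - homozygosity)


if __name__ == "__main__":
    # Hypothetical inputs: five O3a2c carriers plus one C2e2 carrier, and ten
    # placeholder mitochondrial labels that are all distinct.
    y_lineages = ["O3a2c"] * 5 + ["C2e2"]
    mt_lineages = [f"mt_{i}" for i in range(10)]
    print("Y:", float(haplotype_diversity(y_lineages)))
    print("mtDNA:", float(haplotype_diversity(mt_lineages)))
```

Identical haplogroups give h = 0 and all-distinct haplogroups give h = 1, so for these hypothetical inputs the paternal value (1/3) sits well below the maternal one (1).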

Female exogamy can help maintain genetic diversity and reduce the incidence of close-kin mating, and it has been identified in several Neolithic communities in West Eurasia18,19,44. To see whether such practices played a role in maintaining the developed social hierarchy at Shimao, we examined all lineage and non-lineage female individuals along with their male relatives. The pedigree constructed at Hanjiagedan showed that the female partners of second-generation males came from different biological families, a pattern also found at Zhaishan, although it is not clear whether these partnerships were serial or polygamous (Fig. 3). We observed no close biological relatives (such as daughters, parents or siblings) of these female tomb owners in high-level to low-level graves at the Zhaishan and Hanjiagedan cemeteries, which may suggest that they were not descended from local families but originated outside the community (Figs. 3 and 4). Because sampling is incomplete, it remains unclear whether these are instances of female exogamy. A better understanding of the mating customs of Shimao culture would require broader sampling of tomb owners.

Tracking burial goods as indicators of status can help infer patterns of wealth transmission and patrilineal influence in Shimao’s potentially hierarchical society. We found two non-lineage female tomb owners at Huangchengtai, a presumed ruling-class cemetery, with high social status, as evidenced by their rich burial goods and sacrificial offerings. This is comparable to the five male tomb owners in the Hanjiagedan and Zhaishan cemeteries, suggesting that in Shimao culture high social status and its associated wealth may not have been restricted to men, and that women could also have held political power. Because these female tomb owners do not belong to a discernible family lineage, or their direct relatives were not recovered, it is difficult to determine whether their wealth was inherited from parents or husbands or accumulated independently. Overall, both the Hanjiagedan and Zhaishan pedigrees depict a core role for families in Shimao communities. We used the pedigree information to determine whether the spatial arrangement of tombs reflected familial ties. Notably, the geographical proximity and orientation of tombs showed no strong correlation with first-degree or second-degree kinship among tomb owners, suggesting that biological relatedness was not a major factor in grave placement (Figs. 3 and 4c). At Zhaishan cemetery, however, the tombs of fathers and adult sons were spatially closer than those of fathers and adult daughters (Fig. 3), supporting a patrilineal and potentially patrilocal system.
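Whether kin are buried closer together than expected by chance can be framed as a permutation test: compute the mean distance between the graves of related pairs, then shuffle grave locations among individuals many times to build a null distribution. The sketch below is a swapped-in illustration, not the analysis used in the study; the grave coordinates, pair list and function names are hypothetical, and the one-sided p-value uses the usual +1 correction so that it is never zero.

```python
import math
import random


def mean_pair_distance(coords, pairs):
    """Mean Euclidean distance between the graves of each listed pair."""
    return sum(math.dist(coords[a], coords[b]) for a, b in pairs) / len(pairs)


def kin_proximity_test(coords, kin_pairs, n_perm=999, seed=0):
    """One-sided permutation test: are kin buried closer than random pairs?

    Grave positions are shuffled among individuals while the kin pairs stay
    fixed. Returns the observed mean kin distance and the p-value.
    """
    rng = random.Random(seed)
    names = list(coords)
    positions = [coords[name] for name in names]
    observed = mean_pair_distance(coords, kin_pairs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(positions)
        permuted = dict(zip(names, positions))
        if mean_pair_distance(permuted, kin_pairs) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical grave coordinates (metres) and kin pairs.
    graves = {"A": (0, 0), "B": (1, 0), "C": (10, 0), "D": (11, 0),
              "E": (20, 0), "F": (30, 0)}
    print(kin_proximity_test(graves, [("A", "B"), ("C", "D")]))
```

A small p-value would indicate that related individuals lie closer together than random placement predicts; the same function could be run separately on, say, father–son and father–daughter pairs.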

Discussion

The extensive, high-resolution dataset from well-preserved settlements in Shaanxi and Shanxi provinces offered a genetic window into human migration, interaction and kinship practices in these distinctive and important prehistoric societies in China. The populations of Shimao society within the Ordos loop were found to have originated from a single ancestral source, reflecting regional genetic continuity from the Middle to the Late Neolithic, consistent with the archaeological hypothesis that Shimao city was founded by agro-pastoralist elites of the Loess Plateau and Ordos region2. This region acted as an interaction corridor between farming-associated and herding-associated ancestries, separating the large settlements attributed to Shimao culture from those in the Central Plain. We also detected an inland northern East Asian ancestry from the Inner Mongolian steppe, represented by Yumin, both before and during the occupation of Shimao. The lasting presence of Yumin ancestry suggests regular, long-term interaction, with periodic gene flow from Yumin populations of northern China that did not disrupt the genetic continuity of the dominant local Shimao ancestry. Together, our results shed new light on the long coexistence and interaction of Yumin ancestry from Inner Mongolia with Yangshao and Shimao culture-related ancestry in northern Shaanxi, in line with the progressive transition from exclusive farming to integrated agro-pastoral subsistence across this region36. Our findings further revealed a broader genetic contribution from southern mainland Xitoucun ancestry and from southeast coastal indigenous ancestries, represented by Taiwan-Hanben or Ami, extending over a long distance from Fujian or Taiwan to the Shanxi and Shaanxi populations. This aligns with evidence that rice farming expanded further north alongside broader population contact34.
Nevertheless, it remains unclear whether these genetic affinities derived directly from southern coastal or mainland populations, or were mediated through Yangtze River Longshan populations. Further sampling is needed to resolve this question.

Given the genetic continuity with populations that inhabited the same region 1,000 years earlier, and apart from Yumin-related introgression, the people of Shimao showed little to no admixture with outside groups that shared some cultural similarities, such as populations to the west in the western Eurasian steppe and Northern and Central Asia, or in coastal Shandong to the east. This suggests that the anthropomorphic stone carvings, specialized knives and artefacts such as jade blades and alligator bone plates found at Shimao were most likely acquired from these regions through expansive trade networks without accompanying gene flow. Despite the distance between Taosi and Shimao, the inhabitants of Taosi, the contemporary large settlement comparable to Shimao, and of the nearby settlement of Zhoujiazhuang share close ancestry with pre-Shimao populations from the northern Ordos plain. This is not at odds with archaeology-based proposals that a more complicated relationship, involving both trade and pillage, may have existed between the two large communities2. We have shown that Yangshao culture-related populations living at least 5,000 years ago at Wuzhuangguoliang are ancestral to the populations within the Shimao and Taosi spheres of cultural influence, with limited interaction with Yumin-related populations to the north. This addresses important questions about the origins of the Shimao city builders and the relationship between Shimao and Taosi, but further work is needed to define these relationships more precisely and to clarify the roles that ancestral and familial lineages may have played in the early societies of the Shaanxi–Shanxi region.

Along with the settlement patterns of the emerging agro-pastoral society at Shimao, the large number of burials attributed to different social classes allowed us to examine three dimensions of social structure based on kinship practices: lineage descent, marriage patterns and residence rules45. Through genomic sampling of low-level to high-level graves and sacrificial burial pits, we further clarified the social organization and kinship patterns of the hierarchy-driven Shimao society. Across these burials, from elites to commoners, our results support a predominantly patrilineal organization together with apparent male-specific and female-specific sacrifice customs, which jointly shaped a hierarchically structured Shimao society. Unlike the extensive pedigrees recovered so far from family burials in West to Central Eurasia17,20, the many first-degree and second-degree kin pairs and IBD-sharing pairs identified here across diverse burial practices allowed us to reconstruct several extended pedigrees spanning both high-level and low-level graves. At a larger scale, we show that Yangshao culture-attributed pre-Shimao and Shimao culture populations maintained healthy population diversity, with little close-kin mating and a large effective population size, for more than 1,000 years. Furthermore, we jointly analysed spatial distance and genealogical relationships among individuals across all sites located on the Loess Plateau (Extended Data Fig. 3). We found no close kin relationships or high-confidence shared IBD blocks among different Shimao cultural communities, or between them and the Taosi cultural communities to the south in Shanxi Province, implying restricted movement and mating between families belonging to different cultures or living far from the dominant local communities.

Within the Shimao communities, no direct familial links were detected between social elites and sacrificed individuals, suggesting constrained mating practices and social boundaries. However, given the observed kinship connection between an elite individual and a lower-status individual, these boundaries were probably permeable to some extent. Further data from individuals of intermediate status will be essential to evaluate the social stratification of Shimao society more comprehensively. The lack of kinship between elite tomb owners and the people sacrificed in their tombs may also suggest that graves were centred on social elites and their families, and that the sacrificial rite was administered according to social status. Whereas the uniparental genetic data indicate a patrilineal kinship system, the presence of high-status female individuals suggests that gender did not strictly limit access to elevated social positions in the Shimao communities. In summary, our analyses highlight the utility of extensive genomic sampling for revealing detailed patterns of prehistoric social organization. Tracking how burial goods, such as weapons and pottery, as well as animal and human sacrifices, follow the pedigrees has also clarified patterns of wealth inheritance and class differentiation within an early East Asian political power centre and its satellite settlements. Situating these findings within a broader cultural and symbolic framework enriches our understanding of the ancient ritual practices and social dynamics of Shimao.

Methods

Ethics and inclusion statement

Permission to access the ancient DNA in the human remains in this study was granted by the archaeological teams that led the excavations, from the Shaanxi Academy of Archaeology, the Institute of Archaeology (Chinese Academy of Social Sciences) and the Archaeology Institute of the National Museum of China. The Institutional Review Board of the Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, provided further oversight and permission for the sampling of ancient humans in this research. All work was done in collaboration with local archaeologists, who were named co-authors for their contributions to the collection of material and archaeological information, such as on-site photographs, classification of high-level tombs and/or discussions that contributed to the associations derived from the archaeological research cited in this study. All wet laboratory work and data analysis were performed with equipment from the Molecular Paleontology Laboratory, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences.

Ancient DNA experiments and sequencing

We sampled and sequenced 207 human remains from Shaanxi and Shanxi provinces, China, of which 169 individuals were analysed in this study (Supplementary Table 1). All extraction, sequencing and data processing of ancient human samples were carried out in dedicated laboratories at the Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, in Beijing. Following standard protocols47, DNA was extracted from less than 100 mg of bone powder obtained from each sample by drilling. We prepared double-stranded libraries (denoted ‘DS’ in Supplementary Table 1) for 134 samples using a partial uracil-DNA glycosylase treatment protocol (denoted ‘half UDG’)48,49. For 35 samples, we prepared single-stranded libraries (denoted ‘SS’) with either full UDG treatment (4 samples; denoted ‘UDG’) or no UDG treatment (31 samples; denoted ‘No’), as listed in Supplementary Table 1. To obtain enough DNA for capture, libraries were amplified for 35 cycles using AccuPrime Pfx polymerase. We then evaluated the amount of DNA extracted per sample using a Thermo Scientific NanoDrop 2000 spectrophotometer. We applied a capture strategy targeting both mitochondrial and nuclear DNA: for mitochondrial DNA (mtDNA), we used oligonucleotide probes synthesized from the complete human mitochondrial genome50, and for nuclear DNA, we used oligonucleotide probes targeting about 1.2 million SNPs (the ‘1,240k’ SNP panel). After enrichment, sequencing was performed on an Illumina MiSeq platform to generate 2 × 76 base-pair (bp) paired-end reads for the mtDNA, and on an Illumina HiSeq 4000 platform to generate 2 × 100 bp and 2 × 150 bp paired-end reads.

Read alignment and variant calling

We used leeHom51 to trim adaptors and merge paired-end reads into single sequences (minimum overlap of 11 bp), keeping only merged reads at least 30 bp long. Reads were aligned with BWA (v.0.5.10)52 using the bam2bam command with default parameters, except for samples with no UDG treatment, for which we used the parameters -n 0.01, -l 16500 and -o 2. We aligned the mtDNA reads to the revised Cambridge Reference Sequence53 and the nuclear DNA reads to the human reference genome hg19 (ref. 54). Duplicate reads with the same orientation and start and end positions were removed, and only reads with a mapping quality score of at least 30 were kept for analysis. The frequency of terminal C-to-T misincorporations was used to authenticate ancient DNA sequences, and contamination rates were estimated using two approaches. For all individuals, we applied ContamMix55, which compares mtDNA fragments against the newly assembled consensus mitochondrial genome and a panel of present-day mitochondrial sequences50. To minimize the impact of damaged bases, we ignored the first and last five positions of each fragment during estimation, and we treated a library as contaminated if its estimated contamination rate exceeded 5%. For males, contamination rates were also estimated using ANGSD56, which leverages the fact that males carry a single copy of the X chromosome, and verified using HapCon57, which performs better on low-coverage data. To retain enough individuals for further analysis, for 12 individuals with contamination above 5% (annotated ‘b’ in the SNP number column of Supplementary Table 1), we restricted our analysis to damaged fragments showing ancient DNA characteristics. These damaged fragments were obtained with pmdtools v.0.60 (ref. 58) using the --customterminus parameter, keeping fragments with at least one C → T substitution in the first three positions at each end.
To eliminate potential bias caused by terminal deaminated cytosines, we masked 2 bp at each end of mapped reads for all double-stranded libraries with half UDG treatment, and 5 bp at each end for all single-stranded libraries with no UDG treatment. To generate pseudo-haploid genotypes, a single allele was randomly sampled at each SNP position to represent the individual. During genotyping, the first and last 5 or 2 positions of the fragments were ignored for non-UDG-treated and UDG-treated libraries, respectively, and 13 poorly covered samples (with fewer than 27,000 SNPs) were removed.
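The end-masking and pseudo-haploid calling steps above can be illustrated with a minimal Python sketch. It assumes reads are plain strings already oriented to the reference and that the bases covering one SNP have been collected into a list; the names `MASK_BP`, `mask_read_ends` and `pseudo_haploid_call` and the library-type labels are hypothetical, and real pipelines operate on BAM records rather than strings.

```python
import random

# Hypothetical library-type labels mapped to the number of bases masked at
# each read end, using the values given in the text.
MASK_BP = {"DS_halfUDG": 2, "SS_noUDG": 5}


def mask_read_ends(seq, n):
    """Replace the first and last n bases of a read with 'N'."""
    if n <= 0:
        return seq
    if 2 * n >= len(seq):
        return "N" * len(seq)  # read too short: everything is masked
    return "N" * n + seq[n:-n] + "N" * n


def pseudo_haploid_call(observed_bases, rng):
    """Pick one base at random from the reads covering a SNP.

    Masked bases ('N') are ignored; returns None if nothing usable remains.
    """
    usable = [b for b in observed_bases if b != "N"]
    return rng.choice(usable) if usable else None


rng = random.Random(42)  # fixed seed so the sketch is reproducible
print(mask_read_ends("ACGTACGTAC", MASK_BP["DS_halfUDG"]))  # NNGTACGTNN
print(pseudo_haploid_call(["C", "T", "N"], rng))  # either 'C' or 'T'
```

Sampling one base per site, rather than calling a diploid genotype, avoids relying on coverage that low-coverage ancient genomes usually lack.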

Uniparental haplogroup identification

Mitochondrial sequences for each individual were mapped to the revised Cambridge Reference Sequence59. We kept only reads at least 30 bp long and with a mapping quality of at least 30. Haplogroups were called for each individual using HaploGrep2 (ref. 60) based on PhyloTree Build v.17 (ref. 59). We also confirmed all haplogroups using the phylogenetic tree constructed with mtphyl v.5.003 and found that two individuals with an R# haplogroup (R + 16189)27 should be assigned to subclade B4c1a on the basis of the 9-bp deletion (8281–8289). By contrast, four individuals with B haplogroups were assigned to the best-fitting ancestral haplogroups (that is, B4a, B4b1 and B4c1b). For male individuals, Y-chromosome haplogroups were determined by identifying their positions in the phylogenetic tree of the International Society of Genetic Genealogy dataset version 9.77 (www.isogg.org/tree). When the most derived allele on the Y-chromosome tree was a C-to-T or G-to-A substitution, indicative of possible deamination, at least two derived alleles were required to assign the Y-chromosome haplogroup; otherwise, the individual was assigned to the ancestral haplogroup. When the subclade could not be determined, the individual was assigned to the most recent ancestral haplogroup that fit best (for example, NO).
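The deamination safeguard for Y-chromosome assignment can be sketched as follows. This is one possible reading of the rule, assuming a hypothetical representation in which the path from root to tip is a list of (haplogroup, observed derived substitutions) pairs; parsing the ISOGG tree and read pileups is not shown, and `assign_y_haplogroup` is an illustrative name.

```python
# Substitutions that can also arise from post-mortem cytosine deamination.
DAMAGE_LIKE = {("C", "T"), ("G", "A")}


def assign_y_haplogroup(path):
    """Return the most derived haplogroup that passes the deamination rule.

    `path` is a hypothetical list of (haplogroup, derived) tuples ordered from
    root to tip, where `derived` lists the (ref, alt) substitutions observed in
    the derived state among that node's defining SNPs.
    """
    for name, derived in reversed(path):
        if not derived:
            continue  # no derived allele observed at this node
        only_damage_like = all(sub in DAMAGE_LIKE for sub in derived)
        if only_damage_like and len(derived) < 2:
            continue  # a single C>T or G>A may be damage: fall back upstream
        return name
    return path[0][0]  # nothing reliable: keep the root of the given path


example = [("NO", [("A", "G")]), ("O", [("C", "T")]), ("O2", [])]
print(assign_y_haplogroup(example))  # NO
```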

Population structure analysis

We conducted principal component analysis (PCA) with smartpca in the EIGENSOFT package61. To calculate the principal components, we used 82 present-day populations from the Affymetrix Human Origins dataset62. We merged newly sequenced and published ancient individuals with the Human Origins dataset and projected them using the settings ‘lsqproject: YES’, ‘numoutlieriter: 0’ and ‘shrinkmode: YES’. Newly sequenced and previously published ancient individuals were projected onto principal components calculated from present-day Eurasians (Fig. 1c) or from East Asians only (Fig. 1d). We estimated individual ancestries by model-based maximum-likelihood clustering using ADMIXTURE63, using 44 of the 82 populations from the PCA along with 10 present-day Han and Tibetan populations from ref. 64. Before the ADMIXTURE analysis, we pruned SNPs in high linkage disequilibrium (r² > 0.4) using PLINK (v.1.90)65 with the parameters ‘--indep-pairwise 200 25 0.4’, leaving 597,573 SNPs. ADMIXTURE was run for K = 2 to 10. For each K, we ran ten replicates with different random seeds to estimate the cross-validation error, and the best K was chosen as the one with the lowest cross-validation error.
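The pairwise LD pruning step can be illustrated with a simplified greedy sketch in the spirit of a window of 200 SNPs, a step of 25 SNPs and an r² threshold of 0.4. This is not PLINK's implementation: PLINK applies its own rules for deciding which SNP of a correlated pair to remove, whereas this sketch always drops the later SNP, and it assumes complete genotypes coded 0/1/2. The function names are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: treat as uncorrelated
    return sxy * sxy / (sxx * syy)


def prune_pairwise(genotypes, window=200, step=25, r2_max=0.4):
    """Return indices of SNPs kept after greedy sliding-window LD pruning.

    `genotypes` is a list of per-SNP genotype vectors in genomic order.
    """
    keep = [True] * len(genotypes)
    start = 0
    while start < len(genotypes):
        end = min(start + window, len(genotypes))
        idx = [i for i in range(start, end) if keep[i]]
        for pos, i in enumerate(idx):
            if not keep[i]:
                continue
            for j in idx[pos + 1:]:
                if keep[j] and r_squared(genotypes[i], genotypes[j]) > r2_max:
                    keep[j] = False  # simplification: drop the later SNP
        if end == len(genotypes):
            break
        start += step
    return [i for i, k in enumerate(keep) if k]


snps = [[0, 1, 2, 0, 1], [0, 1, 2, 0, 1], [2, 1, 0, 2, 1], [0, 0, 1, 2, 2]]
print(prune_pairwise(snps))  # [0, 3]
```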

Relatedness analyses

Kinship among the samples from Shaanxi and Shanxi provinces was analysed using READ v.2 (ref. 66) to determine the degree of relatedness, and confirmed with lcMLkin67. Further connections among pedigrees or individuals were investigated using ancIBD68. For the Huangchengtai samples, we additionally applied a third, hidden Markov model-based approach, KIN69, to test all relationships up to the second degree. After filtering out 22 genetically identical individuals, we found 25 high-confidence kinship pairs, comprising 8 first-degree relationships (1 full-sibling pair and 7 parent–offspring pairs) and 17 second-degree relationships (Supplementary Table 3). For the READ analysis, we used a genome-wide approach that calculates a single value (P0) across all sites without splitting the genome into windows; the average P0 was then normalized by the median of all average pairwise P0 values across samples. To estimate standard errors, a block-jackknife approach was applied with blocks of 50 megabases (Mb). To distinguish parent–offspring pairs from siblings, a window size of 20 Mb was used instead. We ran separate analyses for three groups: all Middle Neolithic Shaanxi samples, all Late Neolithic Shaanxi samples and all Shanxi samples. For subsequent genetic analyses, we used only individuals without first-degree or second-degree relatives as estimated by READ.
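The READ-style normalization can be sketched as follows. P0 is computed here as the fraction of mismatching pseudo-haploid calls over shared SNPs, and normalizing by the median assumes that most pairs in a group are unrelated. The cut-offs are taken as midpoints between the expected normalized P0 of neighbouring relationship classes (0.5 identical, 0.75 first degree, 0.875 second degree, 0.9375 third degree); treating these as READ's exact classification boundaries is an assumption, and the function names are hypothetical.

```python
from statistics import median

# Cut-offs at midpoints between neighbouring expected normalized P0 values.
THRESHOLDS = [
    (0.625, "identical/twin"),
    (0.8125, "first-degree"),
    (0.90625, "second-degree"),
]


def pairwise_p0(a, b):
    """Fraction of mismatching pseudo-haploid calls over SNPs covered in both."""
    overlap = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not overlap:
        raise ValueError("no overlapping SNPs")
    return sum(x != y for x, y in overlap) / len(overlap)


def normalize(p0_by_pair):
    """Divide each pairwise P0 by the median P0 across all pairs."""
    m = median(p0_by_pair.values())
    return {pair: p0 / m for pair, p0 in p0_by_pair.items()}


def classify(norm_p0):
    """Map a normalized P0 to a relatedness class."""
    for cutoff, label in THRESHOLDS:
        if norm_p0 < cutoff:
            return label
    return "unrelated"  # third degree and beyond are not resolved here


norm = normalize({("A", "B"): 0.2, ("A", "C"): 0.1, ("B", "C"): 0.2})
print({pair: classify(v) for pair, v in norm.items()})
```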

A further genotype likelihood-based method, lcMLkin, was applied to determine kinship because it accounts for the uncertainty of genotype calls at low sequence coverage. lcMLkin estimates the probabilities that two diploid individuals share zero (k0), one (k1) or two (k2) alleles identical by descent (IBD) and calculates the coefficient of relatedness as r = k1/2 + k2. Kinship categories (for example, identical twins or self, parent–offspring, full siblings, second-degree and unrelated) were determined by comparison with the theoretical expectations for k0, k1 and k2 (ref. 70). The method requires a set of SNPs with minor allele frequency above 5% that are in approximate linkage equilibrium. We therefore removed SNPs with minor allele frequency below 5% across all present-day East Asians in the Simons Genome Diversity Panel dataset71 and pruned the remaining data for linkage disequilibrium using PLINK with the parameters ‘--indep-pairwise 200 25 0.5’, which left 135,642 SNPs for downstream analysis. We called genotype likelihoods at these SNP sites for our ancient individuals using the script SNPbam2vcf.py provided with lcMLkin and estimated their biological relatedness using lcMLkin (Supplementary Table 3). When two samples from the same burial site were identified as identical twins or the same individual, the sample with lower coverage was removed from analysis. For samples with a first-degree or second-degree relationship, we retained genome-wide data for downstream genetic analysis only for the individual with higher coverage and no first-degree or second-degree relationships with the remaining individuals at the same site. This resulted in 25 samples being excluded from the population analyses.
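The comparison of estimated IBD-sharing probabilities with their theoretical expectations can be sketched in Python. The expected (k0, k1, k2) values below are the standard ones for each relationship; assigning a pair to the category with the nearest expectation in Euclidean distance is an illustrative assumption, not necessarily how the categories were called in this study. `relatedness` and `nearest_category` are hypothetical names.

```python
import math

# Theoretical (k0, k1, k2) IBD-sharing probabilities for each category.
EXPECTED = {
    "identical/self": (0.0, 0.0, 1.0),
    "parent-offspring": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "second-degree": (0.5, 0.5, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}


def relatedness(k1, k2):
    """Coefficient of relatedness r = k1/2 + k2."""
    return k1 / 2 + k2


def nearest_category(k0, k1, k2):
    """Category whose expected (k0, k1, k2) is closest in Euclidean distance."""
    return min(EXPECTED, key=lambda c: math.dist((k0, k1, k2), EXPECTED[c]))


print(relatedness(0.5, 0.25))               # 0.5 (full siblings)
print(nearest_category(0.02, 0.95, 0.03))   # parent-offspring
```

Parent–offspring pairs and full siblings share the same expected r of 0.5, so the full (k0, k1, k2) vector, not r alone, is what separates them.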

For the shared IBD analysis, genomes of individuals with coverage above 0.5× were imputed with GLIMPSE2 (refs. 72,73), with quality-control parameters for phasing set to --mapq 30 and --baseq 30. The 1000 Genomes Phase 3 dataset, downloaded from http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/, was used as the reference panel. To obtain IBD-sharing information, imputed files were merged by chromosome and analysed with ancIBD68 using the suggested parameters, including a genotype posterior probability above 0.99, a maximum gap of 0.0075 and IBD blocks with more than 220 SNPs. The results of IBD sharing between pairs of individuals are recorded in Supplementary Table 12. We focused on IBD connections between pairs in which both individuals had coverage above 1×. We also report IBD connections when the coverage of either individual in a pair is between 0.5× and 1×, but denote them as ‘low confidence’. To confirm the relationships within the second degree estimated by READ, lcMLkin and KIN (Supplementary Table 3 and Supplementary Figs. 16 and 17a), we counted and plotted the total length and number of IBD segments longer than 12 cM, which can be used to infer kinship relationships68. We also depicted IBD segments longer than 8 cM in karyotype plots (Supplementary Figs. 17b–d) for the individuals who shared relationships within the second degree.
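The per-pair summaries used here (number and total length of IBD segments above a length threshold) and the cross-site criterion of sharing a single segment of at least 25 cM can be sketched as follows. The input format, an iterable of (individual 1, individual 2, length in cM) tuples, is an assumption about how segment tables might be parsed, and the function names are hypothetical.

```python
from collections import defaultdict


def summarize_ibd(segments, min_cm=12.0):
    """Count and sum IBD segments longer than `min_cm` for each pair.

    `segments` is an iterable of (ind1, ind2, length_cM) tuples.
    Returns {pair: (n_segments, total_cM)} with each pair in sorted order.
    """
    out = defaultdict(lambda: [0, 0.0])
    for a, b, length in segments:
        if length > min_cm:
            pair = tuple(sorted((a, b)))
            out[pair][0] += 1
            out[pair][1] += length
    return {pair: (n, total) for pair, (n, total) in out.items()}


def pairs_with_long_segment(segments, min_cm=25.0):
    """Pairs sharing at least one single segment of `min_cm` or longer."""
    return {tuple(sorted((a, b))) for a, b, length in segments if length >= min_cm}


segs = [("A", "B", 15.0), ("B", "A", 30.0), ("A", "C", 10.0), ("C", "D", 26.0)]
print(summarize_ibd(segs))           # {('A', 'B'): (2, 45.0), ('C', 'D'): (1, 26.0)}
print(pairs_with_long_segment(segs)) # {('A', 'B'), ('C', 'D')} (set order may vary)
```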

To further infer distant kinships beyond the second degree (denoted as third-degree to fifth-degree kinships in Fig. 3, Extended Data Fig. 3 and Supplementary Figs. 18–21), we counted and plotted the length distribution and the karyotype of shared IBD segments longer than 8 cM. Kinship relations were determined from the goodness of fit between the observed curve and the expected curves for various kinship categories, shown in the top right of each karyotype plot (Supplementary Figs. 17–21). Individual pairs within each site with the IBD sharing shown in the figures were denoted as possibly having third-degree to fifth-degree relationships (Figs. 3 and 4 and Supplementary Figs. 17–21). Individual pairs across sites sharing at least one IBD segment of 25 cM or longer were also counted (Extended Data Fig. 3). We also compared the total length and number of IBD segments longer than 8 cM or 12 cM shared between individuals of different social status (that is, tomb owners and sacrificed victims) at different sites (Supplementary Table 13 and Extended Data Fig. 3). We included only individuals with an average coverage of at least 1× on the SNP panel, and only one individual was considered for pairs identified as genetically identical.

ROH

Close-kin mating and kinship-based mating systems can be reflected in runs of homozygosity (ROH), long stretches of homozygous segments along an individual’s genome. For 171 individuals with at least 29,000 SNPs, we applied hapROH74, which detects ROH in low-coverage ancient DNA data using haplotype information from a modern phased reference panel. We detected ROH in four length classes (4–8 cM, 8–12 cM, 12–20 cM and more than 20 cM). Results for individuals with more than 400,000 SNPs were recorded as ‘high confidence’, and the others as ‘low confidence’ (Supplementary Table 14). Observed and expected ROH blocks longer than 4 cM were plotted using hapROH.
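Binning detected ROH into the four length classes and labelling confidence by SNP count can be sketched as below. The input is assumed to be a plain list of ROH lengths in cM for one individual, and treating each class as a half-open interval [low, high) is an assumption about boundary handling that the text does not specify; `roh_summary` and `confidence` are hypothetical names.

```python
# Length classes (cM) from the text, treated as half-open intervals [low, high).
ROH_CLASSES = [
    ("4-8", 4, 8),
    ("8-12", 8, 12),
    ("12-20", 12, 20),
    (">20", 20, float("inf")),
]


def roh_summary(roh_lengths_cm):
    """Return {class: (count, total_cM)} for ROH blocks of at least 4 cM."""
    summary = {name: (0, 0.0) for name, _, _ in ROH_CLASSES}
    for length in roh_lengths_cm:
        for name, low, high in ROH_CLASSES:
            if low <= length < high:
                n, total = summary[name]
                summary[name] = (n + 1, total + length)
                break  # blocks shorter than 4 cM fall through and are ignored
    return summary


def confidence(n_snps, high_cutoff=400_000):
    """Label results by the number of covered SNPs."""
    return "high confidence" if n_snps > high_cutoff else "low confidence"


print(roh_summary([5.0, 9.0, 15.0, 25.0, 3.0, 6.0]))
print(confidence(450_000))  # high confidence
```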

Genetic clustering among new samples

Samples were grouped using a combination of D statistics and PCA. We calculated D statistics of the form D(Sample 1, Sample 2; Population, Mbuti) for each pair of samples, with Mbuti as the outgroup. Sample 1 and Sample 2 are two unrelated individuals from the same archaeological site. The 45 individuals and populations (both ancient and present-day) used as Population in this statistic are grouped as follows:

aNorthEA (14): AR_EN, Bianbian, DevilsCave_N, Chokhopani, HMMH_MN, Kolyma, Mebrak, Mongolia_N_East, Okunevo_EMBA, Shamanka_EN, WLR_MN, WLR_LN, WLR_BA and Yumin.

aSouthEA (7): Liangdao1, Liangdao2, Longlin, Man_Bac, Nui_Nap, Qihe and Qihe3.

DeepAsia (9): G1, Jomon, Malta1, Onge, Papuan, Tianyuan, USR1, Vanuatu and Yana.

aWest/SouthAsia (15): Afanasievo, Anatolia_N, Botai_CA, Ganj_Dareh_N, IndusPeriphery, Hajji_Firuz_C, Harappan, Iran_N, Karelia, Kotias, Kostenki14, Shahr_I_Sokhta_BA3, Ust-Ishim, Vestonice16 and Yamnaya_Kalmykia.SG.

If two samples belong to the same genetic cluster, we expect D ≈ 0 (|Z| < 3) for most test populations. Samples that deviated from this expectation within groups, or that were outliers in the PCA, were separated from the main group. Only results from sample pairs with at least 25,000 overlapping SNPs were considered. The counts of significant pairwise D statistics for each archaeological site are summarized in Supplementary Table 2. The detailed genetic grouping is listed in Supplementary Table 1, and the PCA clustering is presented in Fig. 1. The 18 populations defined in the text are highlighted in grey in Supplementary Table 4.
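The clustering criterion (D ≈ 0, |Z| < 3, for most test populations) can be illustrated with a small Python sketch. It computes Patterson's D from per-SNP allele frequencies (for a single pseudo-haploid individual these are 0/1 calls) and estimates Z with an unweighted delete-one-block jackknife over contiguous, equal-sized SNP blocks. This is a simplification: AdmixTools uses genomic blocks and a SNP-weighted jackknife. All function names are hypothetical.

```python
import math


def d_stat(w, x, y, z):
    """Patterson's D(W, X; Y, Z) from per-SNP allele frequencies in [0, 1]."""
    num = sum((a - b) * (c - d) for a, b, c, d in zip(w, x, y, z))
    den = sum((a + b - 2 * a * b) * (c + d - 2 * c * d)
              for a, b, c, d in zip(w, x, y, z))
    return num / den


def jackknife_z(w, x, y, z, n_blocks):
    """Return (D, Z) using an unweighted delete-one-block jackknife."""
    n = len(w)
    size = n // n_blocks
    full = d_stat(w, x, y, z)
    loo = []
    for b in range(n_blocks):
        lo = b * size
        hi = n if b == n_blocks - 1 else (b + 1) * size
        keep = [i for i in range(n) if not lo <= i < hi]
        loo.append(d_stat(*([v[i] for i in keep] for v in (w, x, y, z))))
    mean = sum(loo) / n_blocks
    se = math.sqrt((n_blocks - 1) / n_blocks * sum((d - mean) ** 2 for d in loo))
    if se == 0:
        return full, 0.0 if full == 0 else math.copysign(math.inf, full)
    return full, full / se


def n_significant(s1, s2, test_pops, outgroup, n_blocks=5, z_cut=3.0):
    """Count test populations with |Z| >= z_cut for D(s1, s2; pop, outgroup)."""
    return sum(
        abs(jackknife_z(s1, s2, pop, outgroup, n_blocks)[1]) >= z_cut
        for pop in test_pops
    )
```

Under this sketch, two samples would be kept in the same cluster when `n_significant` is zero or small relative to the number of test populations.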

Outgroup f3 and D statistics

We calculated f3 statistics using qp3Pop (v.412) in the form f3(Population X, Population Y; Mbuti), which measures the genetic drift shared by each pair of populations relative to the outgroup. We used the present-day central African Mbuti population as the outgroup and compared newly sequenced and previously published ancient populations within and outside East Asia (Extended Data Fig. 1). The higher the f3 statistic, the more genetic drift (that is, genetic similarity) the two populations share relative to Mbuti. We calculated D statistics using qpDstat (v.712) in the form D(Population X, Population Y; Population Z, Mbuti), which measures allele sharing for all combinations of the grouped new populations and a diverse array of previously published ancient and present-day populations (Supplementary Tables 4, 5 and 7–11). A negative D statistic means that population Y shares more alleles with population Z than population X does, whereas a positive D statistic means that population X shares more alleles with population Z than population Y does. Both outgroup f3 and D statistics were calculated using AdmixTools75.
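The outgroup f3 statistic is the average product of the allele-frequency differences of X and Y from the outgroup, so populations that share more drift since splitting from Mbuti score higher. A minimal sketch, assuming per-SNP allele frequencies as input; it applies no finite-sample correction and no block-jackknife standard errors, both of which qp3Pop handles itself. The function names are hypothetical.

```python
def outgroup_f3(x, y, o):
    """f3(X, Y; O) = mean over SNPs of (x - o) * (y - o).

    Inputs are per-SNP allele frequencies for X, Y and the outgroup O.
    """
    terms = [(a - c) * (b - c) for a, b, c in zip(x, y, o)]
    return sum(terms) / len(terms)


def rank_by_shared_drift(target, candidates, outgroup):
    """Sort candidates by f3(target, candidate; outgroup), highest first."""
    scores = {name: outgroup_f3(target, freqs, outgroup)
              for name, freqs in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


target = [1, 0, 1, 0]
candidates = {"close": [1, 0, 1, 0], "far": [0, 1, 0, 1]}
print(rank_by_shared_drift(target, candidates, [0.5] * 4))
# [('close', 0.25), ('far', -0.25)]
```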

Admixture modelling with qpAdm

The ancestry proportions of ancient populations were estimated using qpAdm (v.634) in the AdmixTools package, modelling one, two or three sources. We used distal and proximal modelling of the target populations; the two approaches differ in the relative age of the source populations. Distal modelling considers older, more genetically distant source populations (Yumin, sEastAsia_EN, Xitoucun, Man_Bac, Coastal_nEastAsia_EN, Xingyi_EN, AR14K, YR_MN, Wuzhuangguoliang and WLR_MN), whereas proximal modelling considers younger, genetically closer source populations (Yumin, Wuzhuangguoliang and YR_MN) and was applied to model outliers (sEastAsia_EN = Qihe2 and Liangdao2; Coastal_nEastAsia_EN = Bianbian, Boshan and Xiaogao)32. Both model types used the same set of reference populations: Mota, Ust-Ishim, Kostenki14, Iran_N, IndusPeriphery, LBK_EN, Motala12, Kotias, AR33K, Yana and Karelia (IndusPeriphery = Shahr_I_Sokhta_BA2 and Shahr_I_Sokhta_BA3 (5–4 ka) from Shahr-i-Sokhta in Iran, and Gonur Depe (Gonur2_BA) (roughly 4 ka) from the Bactria-Margiana Archaeological Complex in Turkmenistan)32,76. For the subgroups potentially connected with Western or Central Asian populations (Shimao_HJGD3 and Zhoujiazhuang3), we considered extra sources (Anatolia_N, Afanasievo, AYTH, Saidu_Sharif_H, Kashkarchi_BA, Zevakinskiy_LBA, IndusPeriphery, Bustan_BA, Botai_CA, Satsurblia, Gonur1_BA, Dzharkutan1_BA, Gogdara_IA, Loebanr_IA, Shahr_I_Sokhta_BA1, LaBrana1, Levant_N, Stuttgart and Loschbour) in addition to the distal-model sources described above, with the same reference populations but excluding Iran_N and IndusPeriphery. A ‘rotating’ scheme77 of source and reference populations was used for distal and proximal modelling. Wuzhuangguoliang and the extra sources were used only as source populations.
Because the proximal sources are too genetically similar to distinguish among competing models, we fixed the sources identified in distal modelling (Wuzhuangguoliang or further sources) as the source populations. The other qpAdm parameters we adopted were ‘allsnps: YES’, ‘details: YES’ and ‘summary: YES’. A model was deemed plausible if the tail probability of rank0 was above 0.05 and all estimated admixture proportions were between 0 and 1. A valid model with more source populations was considered only when the models with fewer sources were rejected, and results with the lower number of sources were marked as high-confidence models in Supplementary Table 6.
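The acceptance and parsimony rules for qpAdm models can be written down as a short filter. The `QpAdmResult` container and the toy source labels below are hypothetical; qpAdm's own output would need to be parsed into this form, and the toy p-values and weights are illustrative only, not results from this study.

```python
from dataclasses import dataclass


@dataclass
class QpAdmResult:
    """Hypothetical container for one fitted qpAdm model."""
    sources: tuple
    p_value: float  # tail probability of the rank-0 test
    weights: tuple  # estimated admixture proportions, one per source


def is_plausible(m, p_min=0.05):
    """p above the threshold and every proportion within [0, 1]."""
    return m.p_value > p_min and all(0 <= w <= 1 for w in m.weights)


def select_models(models, p_min=0.05):
    """Keep plausible models with the fewest sources.

    Models with more sources are used only if no model with fewer
    sources is plausible.
    """
    plausible = [m for m in models if is_plausible(m, p_min)]
    if not plausible:
        return []
    fewest = min(len(m.sources) for m in plausible)
    return [m for m in plausible if len(m.sources) == fewest]


toy = [
    QpAdmResult(("A",), 0.01, (1.0,)),               # rejected: p too low
    QpAdmResult(("B", "C"), 0.30, (0.9, 0.1)),       # plausible two-way
    QpAdmResult(("A", "C"), 0.20, (1.05, -0.05)),    # rejected: weight out of range
    QpAdmResult(("A", "B", "C"), 0.50, (0.5, 0.3, 0.2)),
]
print([m.sources for m in select_models(toy)])  # [('B', 'C')]
```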

We obtained two-way admixture models for Shimao_HJGD3 with 87–93% Yellow River Yangshao ancestry (represented by Wuzhuangguoliang or YR_MN) and an extra 7–13% from diverse non-East Asian ancestries, represented mainly by Iranian farmer-related and Western hunter-gatherer-related ancestry. Zhoujiazhuang3 could still be modelled with a single source, either Coastal_nEastAsia_EN or Wuzhuangguoliang (Supplementary Table 6). To obtain a broader source for the Wuzhuangguoliang population, we also performed further qpAdm tests replacing AR14K with the later ARpost9K population, and found that ARpost9K better represented the AR14K and Yumin ancestry components of Wuzhuangguoliang.

Modelling of Wuzhuangguoliang ancestry

To further understand the ancestry sources of Wuzhuangguoliang, we simulated admixed populations (20 replicates) with admixture proportions from 0% to 100% in increments of 1%, following the method implemented in ref. 78. The measurement is based on f4 statistics of the form f4(Wuzhuangguoliang, simulated mixture population; nEA/outEA/steppe/AR-related, Mbuti), using either a two-source mixture of Early Neolithic Shandong (Bianbian, Boshan and Xiaogao subpopulations) with YR farming ancestry (YR_MN), or a three-source mixture that adds Yumin or Amur River ancestry (AR14K and ARpost9K). These mixtures were compared against nEA (Miaozigou_MN, WLR_MN and AR-related groups such as AR14K, AR19K, ARpost9K and DevilsCave_N), outside East Asian (outEA = Shamanka_EN, Lokomotiv_EN and Loschbour), steppe-related (Mongolia_N_East, Mongolia_N_North and Yumin) and Tibetan (Shannan2k and Zongri5.1k) populations. The ancestral components were selected mainly on the basis of the valid models from the qpAdm analysis. For the three-way simulated population, we first used the admixed population of YR and Coastal_nEA_EN/ARpost9K as the first admixed component, varying proportionally from 0% to 100%, and then added a second component of ARpost9K, Yumin or WLR_MN proportionally from 0% to 100%. The plausible ranges of the second or third admixed ancestry were estimated by qpAdm modelling (Supplementary Table 6) as the source proportion ± standard error. The fitted admixture models could be explained by Wuzhuangguoliang harbouring an ancestry with greater affinity to Neolithic Shandong, Amur River, West Liao River or Tibetan ancestries.
However, none of the specific nEA groups served as a good proxy for this unknown ancestry, because adding them as the second or third source was still insufficient to model Wuzhuangguoliang ancestry (Supplementary Figs. 12–15).
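The idea behind scanning simulated mixtures with f4 statistics can be sketched as follows: for each proportion alpha on a 1% grid, build a mixture of two source allele-frequency vectors and compute f4(target, mixture; test, outgroup) for every test population. A mixture that matches the target drives all of these towards zero. This deterministic frequency-mixing version, and the choice of the sum of squared f4 values as the fit criterion, are simplifying assumptions; the procedure of ref. 78 simulates replicate populations and may score fits differently. The function names are hypothetical.

```python
def f4(a, b, c, d):
    """f4(A, B; C, D) = mean over SNPs of (a - b) * (c - d)."""
    terms = [(p - q) * (r - s) for p, q, r, s in zip(a, b, c, d)]
    return sum(terms) / len(terms)


def mix(p1, p2, alpha):
    """Frequencies of a population that is alpha * source1 + (1 - alpha) * source2."""
    return [alpha * u + (1 - alpha) * v for u, v in zip(p1, p2)]


def best_two_way_proportion(target, src1, src2, test_pops, outgroup, steps=100):
    """Scan alpha = 0, 1/steps, ..., 1 and return (alpha, score) minimizing
    the sum over test populations of f4(target, mixture; test, outgroup)^2."""
    best_alpha, best_score = None, float("inf")
    for k in range(steps + 1):
        alpha = k / steps
        m = mix(src1, src2, alpha)
        score = sum(f4(target, m, t, outgroup) ** 2 for t in test_pops)
        if score < best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score


src1, src2 = [0.1, 0.9, 0.5, 0.2], [0.8, 0.2, 0.4, 0.6]
target = mix(src1, src2, 30 / 100)  # toy target built as a 30/70 mixture
tests = [[0.3, 0.7, 0.1, 0.9], [0.6, 0.5, 0.8, 0.2]]
print(best_two_way_proportion(target, src1, src2, tests, [0.5] * 4))  # (0.3, 0.0)
```

A three-way scan can nest the same idea: mix the first two sources at one proportion, then mix that result with the third source at a second proportion.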

Demographic modelling with qpGraph

We applied the qpGraph and findGraphs functions from the AdmixTools and AdmixTools2 packages79, respectively. First, we followed a general approach for modelling admixture graphs using qpGraph (v.6065) in the AdmixTools package, starting with a basic, well-understood tree (comprising the central African Mbuti as an outgroup, the early western Eurasian Ust-Ishim and the early East Asian Tianyuan). We then added further populations (sEastAsia_EN, YR_MN, Wuzhuangguoliang, Shimao_HJGD1, Yumin, Wuzhuangguoliang_o and Xinhua_o) one at a time, iteratively placing each in its best-fitting position32. An optimal graph model was constructed on the basis of the observed f statistics (f2, f3 and f4 for all possible combinations of populations). We accepted a model only if the |Z| score between observed and expected values (determined by block jackknife) was less than 3 for all f statistics. A small number (0.0001) was added to the diagonal entries of the estimated covariance matrix of the f statistics (Q matrix) to stabilize matrix inversion. qpGraph was run with the following recommended parameters80: ‘outpop: Mbuti, blgsize: 0.05, lsqmode: YES, diag: 0.0001, hires: YES, initmix: 1000, precision: 0.0001, zthresh: 0, terse: NO, useallsnps: NO’. To avoid missing unexplored graph topologies, we first carried out fully automated graph exploration with findGraphs, allowing 0 to 8 admixture events over 100 algorithm iterations (Stop_gen = 100). Next, we constrained the deeply diverged populations (Mbuti, Tianyuan, sEastAsia_EN and Ust-Ishim) to be unadmixed and ran 100 algorithm iterations with 0 to 3 admixture events. Here, sEastAsia_EN comprises Qihe2 and Liangdao2, and all other population labels are as in the qpGraph admixture graphs from AdmixTools. Graphs of the best-fitting models are shown in Fig. 2 and Supplementary Figs. 8 and 9.
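The graph acceptance criterion (every observed f statistic within 3 standard errors of the value fitted by the graph) can be sketched as a simple filter over candidate graphs. The candidate tuples of (name, observed, fitted, standard errors) are a hypothetical input format; in practice these values come from qpGraph or AdmixTools2 output, and the fitting itself is not shown.

```python
def worst_z(observed, fitted, std_errors):
    """Largest |Z| = |observed - fitted| / SE over all f statistics of a graph."""
    return max(abs(o - f) / se for o, f, se in zip(observed, fitted, std_errors))


def accept_graph(observed, fitted, std_errors, z_max=3.0):
    """Accept a graph only if every f statistic fits within z_max SEs."""
    return worst_z(observed, fitted, std_errors) < z_max


def best_graph(candidates, z_max=3.0):
    """Among (name, observed, fitted, se) tuples, return the name of the
    accepted graph with the smallest worst |Z|, or None if none is accepted."""
    accepted = [
        (worst_z(obs, fit, se), name)
        for name, obs, fit, se in candidates
        if accept_graph(obs, fit, se, z_max)
    ]
    return min(accepted)[1] if accepted else None


graphs = [
    ("g1", [0.10, 0.20], [0.10, 0.17], [0.01, 0.01]),  # worst |Z| about 3
    ("g2", [0.10, 0.20], [0.10, 0.19], [0.01, 0.01]),  # worst |Z| about 1
]
print(best_graph(graphs))  # g2
```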

Treemix analysis

Phylogenetic relationships among populations were estimated using Treemix v.1.13 (ref. 81). The dataset comprised five newly sequenced groups defined by their genetic and spatiotemporal characteristics (preShimao_5k, Shimao_4k, Taosi_4k, Shimao_4k_o and preShimao_5k_o; Supplementary Table 1) and previously published populations including DevilsCave_N, WLR_MN, WLR_LN, Coastal_nEastAsia_EN, Yumin, AR19K, sEastAsia_LN, sEastAsia_EN, Tianyuan, Yana and Ust-Ishim. The maximum-likelihood tree was rooted with Mbuti (-root Mbuti), and linkage disequilibrium was accounted for by grouping sites in blocks of 500 SNPs (-k 500). We performed a round of global rearrangements without sample-size correction (‘-global -noss’), allowing 0 to 7 migration events (-m 0–7). For m = 2, we ran 1,000 bootstrap replicates of the maximum-likelihood tree (-bootstrap) and then summarized the 1,000 bootstrapped trees with the consense command in PHYLIP82 to count how often each branch (that is, each clustering of populations) appeared in the maximum-likelihood tree with migration. Results are shown in Fig. 2. The inferred maximum-likelihood trees with 0 to 7 migration events and the corresponding residuals (Supplementary Fig. 6) were visualized with an R script from Treemix v.1.13.
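The bootstrap-support step can be illustrated by counting how often each clade appears across bootstrap trees. This sketch reproduces only that counting idea: PHYLIP's consense additionally builds a majority-rule consensus tree, and migration edges are ignored here. It assumes a hypothetical tree representation in which each tree is a list of clades, each clade being the set of population labels below one internal branch.

```python
from collections import Counter


def clade_support(bootstrap_trees):
    """Fraction of bootstrap trees that contain each clade.

    Each tree is a hypothetical iterable of clades (sets of population labels).
    """
    counts = Counter()
    for tree in bootstrap_trees:
        # Use a set so a clade listed twice in one tree is counted once.
        counts.update({frozenset(clade) for clade in tree})
    n = len(bootstrap_trees)
    return {clade: c / n for clade, c in counts.items()}


trees = [
    [{"A", "B"}, {"C", "D"}],
    [{"A", "B"}, {"A", "B", "C"}],
    [{"A", "C"}],
]
support = clade_support(trees)
print(support[frozenset({"A", "B"})])  # 0.666...
```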

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.