Extended Data Fig. 3: Principal component analysis (PCA) comparing linguistic groups in BIGCS and nine present-day Asian linguistic families in AADR. | Nature

From: The Born in Guangzhou Cohort Study enables generational genetic discoveries

(a-b) Geographic distribution of the 4,053 BIGCS participants (a) and all 836 Chinese samples from the AADR dataset (b). (c-d) PCA was conducted on a merged sample comprising 2,245 present-day unrelated Chinese individuals from the BIGCS dataset, 402 present-day Chinese individuals from the AADR dataset, and 202 present-day Asian groups from the AADR dataset. Each data point on the PCA plot represents one participant, with colors and shapes denoting linguistic or ethnic groups. In (c), nine different shapes represent the nine linguistic families. In (d), the shapes are the same as in (c), and colors within each linguistic family distinguish its linguistic groups. The analysis used 258,552 biallelic sites and applied the following pruning and filtration parameters: “--maf 0.01 --geno 0.3 --hwe 1e-6 --vcf-half-call m --indep-pairwise 1000 100 0.9”. The map in panels a and b was sourced from an approved standard map service (http://bzdt.ch.mnr.gov.cn) endorsed by the Ministry of Natural Resources of the People’s Republic of China (GS YUE(2023)1422).
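As a rough illustration of what two of the quoted filtration parameters do, the sketch below applies a minor-allele-frequency threshold ("--maf 0.01") and a per-site missingness threshold ("--geno 0.3") to a toy genotype table. It is not the authors' pipeline: the function names `site_passes` and `filter_sites`, the 0/1/2 alternate-allele coding, and the use of `None` for a missing call are assumptions made for this example, and the Hardy-Weinberg test ("--hwe"), half-call handling and LD pruning ("--indep-pairwise") are deliberately left out.

```python
def site_passes(genotypes, maf_min=0.01, max_missing=0.3):
    """Return True if one site passes the MAF and missingness thresholds.

    genotypes: per-sample alternate-allele counts (0, 1 or 2), with None
    for a missing call. This coding is an assumption of the sketch.
    """
    called = [g for g in genotypes if g is not None]
    n_missing = len(genotypes) - len(called)
    # Drop the site if no genotype was called or too many calls are missing.
    if not called or n_missing / len(genotypes) > max_missing:
        return False
    # Each called sample contributes two alleles.
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    return maf >= maf_min


def filter_sites(sites, maf_min=0.01, max_missing=0.3):
    """Return the indices of the sites that pass both thresholds."""
    return [
        i for i, genotypes in enumerate(sites)
        if site_passes(genotypes, maf_min, max_missing)
    ]


# Toy data: three sites, each genotyped in a handful of samples.
toy_sites = [
    [0, 1, 2, 0, None],   # polymorphic, 20% missing
    [0, 0, 0, 0, 0],      # monomorphic, so MAF is 0
    [1, None, None, 0],   # 50% missing
]
kept = filter_sites(toy_sites)
```

In this toy table only the first site is retained: the second has no minor allele and the third exceeds the missingness limit.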
