Fig. 5: Archaic hominin versus human haplotype diversity. | Nature Communications


From: Evidence for opposing selective forces operating on human-specific duplicated TCAF genes in Neanderthals and humans


Haplotypes were inferred using 1,275 SNVs in the three unique diploid sequences around the TCAF SD region (Fig. 1). a Haplotype-based principal component analysis (PCA) was performed, followed by haplotype clustering and cluster visualization using t-SNE, a dimension-reduction technique (Methods). On the t-SNE plot, each dot or triangle is a haplotype, colored according to its population or species of origin. Neanderthal and Denisovan haplotypes are indicated by the black and blue triangles, respectively. Numbers and ellipses in the 3D t-SNE plots indicate individual clusters (Supplementary Figs. 27–32). The zoom-in above the 3D t-SNE shows that all archaic haplotypes lie in close proximity to each other and associate with cluster 1. b The maximum likelihood phylogeny was constructed using 10 randomly selected haplotypes from the 12 inferred clusters, in addition to eight archaic haplotypes and one chimpanzee haplotype. Note that the chimpanzee branch (dashed line) is truncated by 90% of its actual length for the purpose of illustration. c Comparisons of the Tajima’s D distributions between the TCAF locus and the entire genome in human and archaic populations. Tajima’s D statistics are computed for individual archaic and modern groups across the entire genome (gray) versus TCAF (red) based on 2 kbp windows. For each boxplot, the lower and upper hinges correspond to the first and third quartiles, respectively, and the median value is also shown. The upper and lower whiskers show 1.5 × IQR from the upper and lower hinges, respectively, where IQR is the inter-quartile range, defined as the distance between the first and third quartiles.
P values (one-tailed) are empirically determined by testing whether the observed values at TCAF (29 windows) are significantly more negative (green) than those from the genome (1,224,349 windows), except for the American and Melanesian groups, for which the p values indicate whether the observed D values at TCAF are significantly more positive (italic) than those from the genome. Bold font indicates Bonferroni-corrected p values (one-tailed) < 0.05.
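The caption does not spell out how the empirical p values were derived, so the following is only a minimal sketch of one common approach, not the authors' procedure. It assumes that a single observed summary (for example, a locus-level Tajima's D) is ranked against a genome-wide null distribution, with a +1 pseudocount so the p value is never exactly zero, and that Bonferroni correction multiplies each p value by the number of tests. The function names `empirical_p_value` and `bonferroni` and the `tail` parameter are hypothetical.

```python
from typing import List, Sequence


def empirical_p_value(observed: float,
                      null_values: Sequence[float],
                      tail: str = "lower") -> float:
    """One-tailed empirical p value of `observed` against `null_values`.

    tail="lower": fraction of null values at least as negative as observed.
    tail="upper": fraction of null values at least as positive as observed.
    A +1 pseudocount is applied (an assumption of this sketch).
    """
    if tail == "lower":
        extreme = sum(1 for v in null_values if v <= observed)
    elif tail == "upper":
        extreme = sum(1 for v in null_values if v >= observed)
    else:
        raise ValueError("tail must be 'lower' or 'upper'")
    return (extreme + 1) / (len(null_values) + 1)


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Bonferroni-adjust p values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not data from the figure.
    genome_d = [-2.0, -1.0, 0.0, 1.0]
    print(empirical_p_value(-1.5, genome_d, tail="lower"))
    print(bonferroni([0.01, 0.04, 0.5]))
```

In this sketch the "lower" tail corresponds to testing for more negative D values and the "upper" tail to testing for more positive ones, mirroring the two directions described in the caption.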
