Fig. 1: Population structure and genetic affinities of South-Eastern Bantu-speaking (SEB) groups from South Africa correspond to both linguistic phylogeny and geographic distribution.
From: Genetic substructure and complex demographic history of South African Bantu speakers

a Map showing the language majority areas (LMAs) of each SEB group. The centroid of each region is indicated by a black dot. The three sampling sites are shown as coloured circles: Soweto (blue), Dikgale (orange) and Agincourt (yellow). The original map was obtained from: https://en.wikipedia.org/wiki/Languages_of_South_Africa#/media/File:South_Africa_2011_dominant_language_map.svg. The user acknowledges Stats SA as the source of the basic data wherever they process, apply, utilise, publish or distribute the data, and also that they specify that the relevant application and analysis (where applicable) result from their own processing of the data. The language centroid points were calculated for this study (see methods for details). b Principal Component (PC) plot for the unrelated SEB samples (Pedi N = 1065, Sotho N = 366, Swazi N = 126, Tsonga N = 1644, Tswana N = 242, Venda N = 73, Xhosa N = 177 and Zulu N = 626) shows an overall correspondence between the distribution of SEB groups on the geographic map and their positions in the PCA. The colours marking the LMA of each SEB group on the map correspond to the colours used for that group in the PCA. c PC plot based on ethno-linguistically concordant samples, in which the self-reported ancestry of the participant matches that of at least 5 of their parents and grandparents (Pedi N = 851, Sotho N = 46, Swazi N = 30, Tsonga N = 1438, Tswana N = 73, Venda N = 24, Xhosa N = 63 and Zulu N = 177), shows much clearer separation between the three major linguistic divisions (Sotho-Tswana, Nguni and Tsonga speakers). d A composite representation of the first 10 PCs (generated using PCA-UMAP) also shows separation of the SEB groups corresponding to the three major linguistic divisions. e UPGMA tree based on pairwise FST distances between SEB groups. Sample sizes are the same as in panel c. f Linguistic phylogeny based on lexical data (majority-rule consensus tree) with posterior probability values. The SEB groups from the current study are indicated using the same colours as in the PCA plots. The topologies of the trees in e and f are broadly aligned.
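
The concordance criterion used for panel c can be illustrated with a minimal sketch. It assumes each participant has six recorded relatives (two parents and four grandparents), which the caption implies but does not state outright. The function name, the data layout and the example participants are hypothetical and are not taken from the study; the caption also does not say how missing relatives were handled, so the sketch simply counts matches among the values supplied.

```python
from typing import Sequence


def is_concordant(self_reported: str,
                  relatives: Sequence[str],
                  min_matches: int = 5) -> bool:
    """Return True if at least `min_matches` relatives share the
    participant's self-reported group (threshold of 5 from the caption)."""
    matches = sum(1 for group in relatives if group == self_reported)
    return matches >= min_matches


# Hypothetical participants (illustrative only, not study data).
# Relatives are listed as: mother, father, then four grandparents (assumed order).
participant_a = ("Zulu", ["Zulu", "Zulu", "Zulu", "Zulu", "Zulu", "Xhosa"])
participant_b = ("Zulu", ["Zulu", "Sotho", "Zulu", "Zulu", "Xhosa", "Zulu"])

concordant_a = is_concordant(*participant_a)  # 5 of 6 relatives match
concordant_b = is_concordant(*participant_b)  # 4 of 6 relatives match
```

Under this sketch the first participant would be retained for the panel c analysis and the second would be excluded, which is consistent with the smaller sample sizes reported for panel c relative to panel b.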