Extended Data Fig. 6: Diagnostic plots of RAiSD predictions, and GO-clustering of protein-coding genes harboring adaptive out-of-Africa-associated signals.

(a-b) Comparison of the high-scoring top signals predicted with RAiSD in out-of-Africa populations, at the global population scale, using two different score-threshold methods. (a) The bars (Y2-axis) show the total number of high-scoring top outliers within hard selective sweeps obtained at five equivalent cutoffs, calculated with a “percentile threshold” (for example, only the top 1% of high-scoring signals are retained) and with an “FDR-adjusted p-value threshold” (for example, only high-scoring top signals with an FDR < 5%, that is, an expected proportion of false positives below 5%, are retained). The Y1-axis shows the proportion (%, dots) of intersected protein-coding genes harboring high-scoring top signals for each threshold method across the equivalent cutoffs. (b) The bar plots show the distribution of the number of peak positions (outliers) within hard selective sweeps that map to protein-coding genes at equivalent cutoffs, as obtained with a top 1% percentile score threshold (left) and with an FDR-adjusted p-value < 5% score threshold (right). Note that most genes harbor several high-scoring top outliers (>2) with either method. (c) The number of “Aaa molecular signature” genes obtained from the intersection of the RAiSD, PCAdapt and MKT-DoS methods is shown for the different percentile cutoffs applied to the high-scoring top signals detected with RAiSD. (d) GO enrichment analysis of the 185 “Aaa molecular signature” protein-coding genes with an annotated GO-term; categories with a p-value < 0.05 in the weighted Fisher test were considered significantly enriched. P-values were not adjusted for multiple testing, as recommended by Alexa et al. (2006; ref. 184). For each GO-term, the significance level (black line, top Y-axis) and the observed-expected ratio of genes annotated to the respective GO-term (black bars, bottom Y-axis) are plotted.
(e-g) Clustering of the enriched GO-terms for the predicted protein-coding genes harboring adaptive out-of-Africa-associated signals is shown separately for (e) RAiSD, (f) PCAdapt and (g) MKT-DoS, and shows their convergence into five major functional categories: chemosensory (blue), neuronal (red), metabolic (green), regulatory (black) and others (purple). Note that several of the analyzed genes lack an annotated or predicted GO-term function. The results of the GO enrichment analyses for the selection methods are available in Supplementary Tables 15, 17, 19, 22, 23 and 26; the full list of GO-terms and the merged GO information, which was also used to plot (e-g), are available at the GitHub repository: https://github.com/naborlozada/Aaegypti_domestication.
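
As an illustration of the two score-threshold methods compared in (a-b), the following minimal Python sketch contrasts a percentile cutoff with an FDR cutoff. It is not the authors' pipeline: the function names are hypothetical, the Benjamini-Hochberg procedure is assumed as the FDR adjustment (the legend does not name one), and a p-value is assumed to be already available for each RAiSD score.

```python
import numpy as np


def percentile_outliers(scores, top_percent):
    """Return a boolean mask of scores in the top `top_percent` % (hypothetical helper)."""
    scores = np.asarray(scores, dtype=float)
    # Score value below which (100 - top_percent) % of the scores fall.
    cutoff = np.percentile(scores, 100.0 - top_percent)
    return scores >= cutoff


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def fdr_outliers(pvalues, alpha):
    """Return a boolean mask of positions whose adjusted p-value is below `alpha`."""
    return bh_adjust(pvalues) < alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.gamma(shape=2.0, scale=1.0, size=10_000)  # toy scores, not real data
    pvals = rng.uniform(size=10_000)                       # toy p-values, not real data
    print("top 1% outliers:", int(percentile_outliers(scores, 1.0).sum()))
    print("FDR < 5% outliers:", int(fdr_outliers(pvals, 0.05).sum()))
```

The two masks will generally differ in size: a percentile cutoff always retains a fixed fraction of positions, whereas the number retained by an FDR cutoff depends on the distribution of p-values.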