Fig. 3: Genetic variability and genetic burden in the BIG cohort.
From: Insights from the Biorepository and Integrative Genomics pediatric resource

a Joint principal component analysis of genetic data from individuals in the BIG cohort and in the 1000 Genomes populations, shown separately for clarity. Colors represent inferred genetic ancestry. The first two principal components explain 76% of the variance captured by the first 20 PCs. b Number of variable sites per genome relative to the reference sequence, by inferred ancestry. c Estimated number of novel variants per individual by ancestry, with variants private to the ancestry indicated. d Count of rare novel variants by ancestry segment. Individuals in admixed groups are represented twice. e Proportion of known and novel variants across impact categories (top panel). Data are presented as ratios of variant counts to total variants, with known variants (n = 6,114,914) in light blue and novel variants (n = 771,717) in purple. The bottom panel shows logistic regression coefficients comparing the likelihood of variants being novel across impact categories, with MODIFIER serving as the reference level. Error bars represent 95% confidence intervals. Asterisks indicate statistical significance (***p < 0.001). Detailed statistics from this logistic regression analysis are presented in Supplementary Table 3. f Rare deleterious-to-synonymous variant ratio across inferred ancestries. The peaks and spreads of these distributions highlight variation in the frequency of deleterious mutations across ancestries, reflecting potential differences in genetic diversity, mutation load, and evolutionary pressures. g Count of rare deleterious variants in EUR-AMR admixed individuals (n = 426), the group with the highest deleterious-to-synonymous ratio. Variant counts are assigned based on the inferred ancestry of the genomic regions where they are found, so each individual is counted twice: once for their AMR ancestry regions and once for their EUR ancestry regions.
Statistical comparison was performed using a two-sided Wilcoxon rank-sum test with exact p-value = 2.2e-16. No adjustments were made for multiple comparisons.
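
As a rough illustration of the comparison in panel e (bottom), the sketch below assumes a logistic regression whose only predictor is impact category. Under that assumption, each coefficient equals the log odds ratio of a variant being novel relative to the MODIFIER reference level, and a Wald 95% confidence interval follows from the cell counts. The function name, the category labels other than MODIFIER, and all counts are hypothetical and chosen only for illustration; they are not the values reported in Supplementary Table 3, and the published model may include additional terms.

```python
import math

def log_odds_vs_reference(counts, reference="MODIFIER", z=1.96):
    """Log odds ratio of being novel for each category vs. the reference.

    counts maps category -> (n_novel, n_known).
    Returns category -> (coefficient, ci_low, ci_high), using a Wald interval.
    """
    ref_novel, ref_known = counts[reference]
    results = {}
    for category, (novel, known) in counts.items():
        if category == reference:
            continue  # the reference level has no coefficient of its own
        coef = math.log((novel / known) / (ref_novel / ref_known))
        # Standard error of a log odds ratio from a 2x2 table
        se = math.sqrt(1 / novel + 1 / known + 1 / ref_novel + 1 / ref_known)
        results[category] = (coef, coef - z * se, coef + z * se)
    return results

# Hypothetical toy counts: (novel, known) per impact category
toy_counts = {
    "MODIFIER": (1000, 9000),
    "LOW": (120, 880),
    "MODERATE": (150, 850),
    "HIGH": (40, 160),
}

estimates = log_odds_vs_reference(toy_counts)
for category, (coef, low, high) in estimates.items():
    print(f"{category}: coef={coef:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

A positive coefficient means variants in that category are more likely to be novel than MODIFIER variants; an interval that excludes zero corresponds to a significant difference at roughly the 5% level.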