Extended Data Fig. 2: Associations of taxon relative abundances, α-diversity, and microbial composition principal components with human genetic variants.
From: Human and bacterial genetic variation shape oral microbiomes and health

a, Genome-wide associations with relative abundances of 1,262 taxa observed in >10% of SPARK samples (n = 12,519). For each genetic variant, the most significant p-value across the 1,262 tests is shown; the red line indicates the study-wide significance threshold (p < 4.0 × 10⁻¹¹). Protein-altering variants and copy number variants of note are highlighted: nonsense (red squares), missense (green triangles), and multi-allelic CNVs (blue diamonds). b, Genome-wide associations with α-diversity (Shannon entropy) in SPARK. c, Quantile-quantile plot of p-values computed by our mPC-based test for associations between human genetic variants and oral microbiome composition (Fig. 2b). The genomic inflation factor λGC was calculated as the median chi-square statistic divided by F⁻¹(0.5), where F⁻¹(x) is the inverse cumulative distribution function of a χ²₄₃₉ random variable. d, Associations of the 11 lead variants identified by the mPC-based test (red) and of 1,000 randomly selected variants (black) with dissimilarity of relative abundances for the 439 most prevalent species, computed using multivariate distance matrix regression (MDMR; y-axis) and compared with our mPC-based test (x-axis). e, Analogous to d, for dissimilarity after applying the centered log-ratio transform to relative abundance measurements. f, Associations of AMY1 copy number (y-axis) with each of the 439 individual microbial principal components (x-axis). g-p, Analogous to f, for the other 10 lead variants identified by the mPC-based GWAS. P-values were computed using two-sided linear mixed models (a,b,f-p), one-sided chi-squared tests (c; x-axis of d,e), or one-sided multivariate distance matrix regression (y-axis of d,e).
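As an illustration of the λGC definition in panel c, the sketch below recovers each variant's χ² statistic from its p-value and divides the median statistic by F⁻¹(0.5) for 439 degrees of freedom. It assumes the p-values are upper-tail (one-sided) χ²₄₃₉ p-values, as the caption states. It also includes a centered log-ratio (CLR) transform like the one referred to in panel e. The function names and the CLR pseudocount are hypothetical choices for illustration, not the authors' code; the caption does not say how zero abundances were handled.

```python
import numpy as np
from scipy.stats import chi2

N_MPCS = 439  # degrees of freedom: one per microbial principal component (from the caption)


def genomic_inflation(pvalues, df=N_MPCS):
    """Return lambda_GC = median(chi-square statistic) / F^{-1}(0.5).

    Assumes each p-value is the upper-tail p-value of a chi-square(df) test,
    so the statistic is recovered with the inverse survival function.
    """
    p = np.asarray(pvalues, dtype=float)
    stats = chi2.isf(p, df)  # statistic whose upper-tail probability equals p
    return float(np.median(stats) / chi2.ppf(0.5, df))


def clr_transform(rel_abundance, pseudocount=1e-6):
    """Centered log-ratio transform of a samples x taxa matrix.

    The pseudocount is a hypothetical choice to avoid log(0).
    """
    x = np.asarray(rel_abundance, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtract each sample's mean log abundance (the log of its geometric mean).
    return log_x - log_x.mean(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    null_p = rng.uniform(size=100_000)  # a well-calibrated test gives uniform p-values
    print(round(genomic_inflation(null_p), 3))  # close to 1 under the null
```

Because `chi2.isf` is monotone, the median statistic corresponds to the median p-value, so λGC near 1 indicates a well-calibrated test and values above 1 indicate inflation.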