Fig. 2: Confounding due to unmodelled factors.

a Relationship between the number of inferred factors and tissue sample size. The fitted line (r ≈ 0.95, p ≈ 5.4 × 10⁻²⁶) corresponds to a linear least-squares regression. The two-sided p-value tests the null hypothesis that the slope is zero, using a Wald test with a t-distribution for the test statistic. b Difference, before versus after correction, in the variance of the distribution of Pearson correlation values over all genes, for each tissue. Empty cells correspond to tissues in which only one value of the confound is available. The "Cohort" variable undergoes the most substantial change after the correction across all tissues. c Distribution of the Pearson correlation between the expression of a gene in whole blood and age, before and after correction. After the correction, the correlation values move towards zero and show considerably less dispersion. d Distribution of the p-values for the correlations in panel c, on a logarithmic scale. The enrichment for significant (low) p-values is greatly attenuated after the correction, suggesting that unmeasured variables can induce spuriously significant correlations.
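
The fit and test described for panel a can be sketched as follows. This is a minimal illustration on synthetic data, not the analysis code behind the figure: the arrays `sample_size` and `n_factors` are hypothetical stand-ins, and the choice of `scipy.stats.linregress` is an assumption, made because its documented p-value is a two-sided Wald test of a zero slope with a t-distributed test statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumed, not the study's values):
# one point per tissue, with factor count growing roughly linearly in sample size.
sample_size = rng.integers(50, 800, size=40)
n_factors = 0.05 * sample_size + rng.normal(0.0, 3.0, size=40)

# Ordinary least-squares fit of n_factors on sample_size.
fit = stats.linregress(sample_size, n_factors)

# fit.rvalue is the Pearson correlation coefficient r;
# fit.pvalue is the two-sided p-value for the null hypothesis slope == 0.
print(f"slope={fit.slope:.4f}  r={fit.rvalue:.3f}  p={fit.pvalue:.2e}")
```

With real data, the two arrays would hold one entry per tissue; the reported r and p then correspond to `fit.rvalue` and `fit.pvalue`.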