Fig. 1: Quality control and identification of confounding variables.

A Distribution of immune cell types across all 231 PBMC samples. Cell types are shown on the x-axis, and the percentage of the cell population attributed to each cell type is shown on the y-axis. Each box summarises all 231 samples. B Bar chart showing the number of significantly different genes (DESeq2 adjusted p value < 0.01) for each clinical parameter with at least five significant genes. C Gene expression heatmaps highlighting the size and consistency of the confounding effects of age (left), gender (middle), and BMI (right) on the PBMC RNA-seq data. Samples are given by column and differentially expressed genes (adjusted p value < 0.01) by row. Colour intensity indicates row-scaled (z-score) gene expression, with blue as low and yellow as high. D Gene expression boxplots of the most significantly different gene between the youngest and oldest (ROBO1), male and female (ZFY), and lowest and highest BMI (CA1) sample groups. Sample groups are shown on the x-axis and gene expression values (corrected DESeq2 normalised counts) on the y-axis.
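
The row scaling described for panel C can be illustrated with a minimal sketch. It assumes a genes-by-samples matrix of normalised counts and uses the sample standard deviation; the function name `row_zscore` and the toy values are hypothetical and are not taken from the study, whose exact scaling settings are not stated in the caption.

```python
import numpy as np

def row_zscore(counts):
    """Scale each gene (row) to mean 0 and unit sample standard deviation."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=1, keepdims=True)
    # ddof=1 gives the sample standard deviation (an assumption of this sketch)
    std = counts.std(axis=1, ddof=1, keepdims=True)
    # Genes with constant expression would divide by zero; leave them at 0
    std[std == 0] = 1.0
    return (counts - mean) / std

# Toy matrix: 2 genes (rows) x 3 samples (columns), values are illustrative only
example = np.array([[10.0, 20.0, 30.0],
                    [5.0, 5.0, 5.0]])
scaled = row_zscore(example)
```

After this scaling, colour in a heatmap reflects how far each sample lies from that gene's own mean, so genes with very different absolute expression levels can be compared on one colour scale.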