Fig. 1: SI concept and SI metric calculation in CMC-HBCC.
From: Bipolar patients display stoichiometric imbalance of gene expression in post-mortem brain samples

a Illustration of the SI concept in the simplified scenario of only two genes. Genes X and Y have correlated expression, and the solid line depicts the linear regression fit to the HC samples. “res” = residuals from the regression line. Here, the residual measures the extent to which gene Y’s expression is in stoichiometric imbalance with gene X (a residual of 0 indicates perfect stoichiometric balance). b HC and BD samples have a similar range of absolute expression levels for gene X. For gene Y, BD levels are higher than HC, but perhaps not significantly so. However, the residuals (of gene Y modelled as a function of gene X) are clearly higher in BD samples, with no overlap between the boxes. c Illustration of the resampling strategy, exemplified with the CMC-HBCC dataset. The gene expression model cannot be fitted and tested on the same HC samples, as this would bias the HC samples towards lower residuals relative to BD. Instead, the HC samples are split into a set used for fitting the models and a test set for which residuals are calculated. This ensures that a sample’s residual is never computed with a gene model that the sample helped to fit. To avoid the final result being determined by a single random split, and to obtain residuals for all HC samples, we iteratively perform random sampling, fitting, and residual calculation. Residuals are then averaged across iterations and standardised before being aggregated into the SI score.
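
The resampling strategy in panel c can be sketched for the two-gene case of panel a. This is a minimal illustration on synthetic data, not the authors' implementation. The function names (`fit_line`, `resampled_residuals`, `standardise`), the parameters `n_iter` and `fit_fraction`, the 50/50 split, the use of signed residuals, and the z-scoring against the HC distribution are all assumptions made for the example. The final aggregation across genes into the SI score is not shown, because the caption does not specify it.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def resampled_residuals(hc, bd, n_iter=100, fit_fraction=0.5, seed=0):
    """Average out-of-sample residuals of gene Y given gene X.

    hc, bd: lists of (x, y) expression pairs (hypothetical input layout).
    Each iteration fits the line on a random subset of HC samples and
    computes residuals only for the held-out HC samples and for BD samples.
    """
    rng = random.Random(seed)
    n_fit = max(2, int(len(hc) * fit_fraction))
    hc_res = [[] for _ in hc]
    bd_res = [[] for _ in bd]
    for _ in range(n_iter):
        idx = list(range(len(hc)))
        rng.shuffle(idx)
        fit_idx, test_idx = idx[:n_fit], idx[n_fit:]
        intercept, slope = fit_line(
            [hc[i][0] for i in fit_idx], [hc[i][1] for i in fit_idx]
        )
        for i in test_idx:  # held-out HC samples only
            x, y = hc[i]
            hc_res[i].append(y - (intercept + slope * x))
        for j, (x, y) in enumerate(bd):  # BD samples are never used for fitting
            bd_res[j].append(y - (intercept + slope * x))
    # With enough iterations every HC sample is held out at least once.
    return ([statistics.fmean(r) for r in hc_res],
            [statistics.fmean(r) for r in bd_res])


def standardise(hc_mean, bd_mean):
    """Assumed standardisation: z-score against the HC residual distribution."""
    mu = statistics.fmean(hc_mean)
    sd = statistics.stdev(hc_mean)
    return ([(r - mu) / sd for r in hc_mean],
            [(r - mu) / sd for r in bd_mean])


# Synthetic demonstration: BD samples carry an extra offset in gene Y.
data_rng = random.Random(1)
hc = [(x, 2 + 0.8 * x + data_rng.gauss(0, 0.2))
      for x in (data_rng.gauss(10, 1) for _ in range(40))]
bd = [(x, 2 + 0.8 * x + 0.6 + data_rng.gauss(0, 0.2))
      for x in (data_rng.gauss(10, 1) for _ in range(40))]

hc_avg, bd_avg = resampled_residuals(hc, bd)
hc_z, bd_z = standardise(hc_avg, bd_avg)
print(statistics.fmean(hc_z), statistics.fmean(bd_z))
```

Because each HC residual comes only from iterations in which that sample was held out, the HC residuals are not artificially shrunk by the fit, which is the bias the caption describes.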