Fig. 4: Distributions of the predicted class labels for each promoter library on the training (left) and test (right) sets.

The distributions are grouped by the true class label of the sequences. For the test sets, the weighted mean absolute error, weighted accuracy and Spearman correlation are given in Table 1. The distributions are centered around the true class labels, which demonstrates the advantage of using a model for ordinal regression. The overlap between distributions can be attributed, at least in part, to the expected intrinsic and extrinsic noise on the labels. σ, sigma factor; WT, wild type. Source data are provided as a Source Data file.
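
The three test-set metrics named above can be illustrated with a minimal sketch. The caption does not define the weighting scheme, so the code assumes inverse class-frequency weights, under which each true class contributes equally. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import sqrt


def class_weights(y_true):
    """Per-sample weights summing to 1, with equal total weight per true class.

    Assumption: 'weighted' means inverse class-frequency weighting.
    """
    counts = Counter(y_true)
    n_classes = len(counts)
    return [1.0 / (n_classes * counts[y]) for y in y_true]


def weighted_mae(y_true, y_pred):
    """Weighted mean absolute error between ordinal class labels."""
    w = class_weights(y_true)
    return sum(wi * abs(t - p) for wi, t, p in zip(w, y_true, y_pred))


def weighted_accuracy(y_true, y_pred):
    """Weighted fraction of exact matches (the mean of per-class recalls)."""
    w = class_weights(y_true)
    return sum(wi for wi, t, p in zip(w, y_true, y_pred) if t == p)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)


# Toy labels for illustration only (not data from the study).
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 0, 1, 2, 2]
mae = weighted_mae(y_true, y_pred)
acc = weighted_accuracy(y_true, y_pred)
rho = spearman(y_true, y_pred)
```

With these weights, a majority class cannot dominate the score: every class carries the same total weight regardless of how many sequences it contains.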