Fig. 4: Quantitative prediction of RBS activity with SAPIENs.

a Schematic architecture of a single ResNet. One-hot encoded 17-bp RBS sequences are fed into three residual blocks, each composed of two convolutional layers (conv1/2), and then into two sets of two fully connected layers (FC1α/β, FC2α/β). Yellow and purple boxes represent the output of the convolutional and fully connected layers, respectively. The gray box represents the output of the flattening operation. The model yields a probability distribution of IFP0–480 min for each sequence, from which the predicted IFP0–480 min value (mean µ) and an uncertainty estimate (s.d. σ) are calculated. SAPIENs is a combination of ten individually parametrized ResNets. b Comparison of the predictive performance of a single ResNet and SAPIENs with classical ML models trained on the same set of 248,451 RBSs (see “Methods” section). MAE: mean absolute error, RMSE: root-mean-square error. c Comparison of IFP0–480 min values predicted by SAPIENs with experimental values measured by uASPIre. Test-set sequences were binned (bin size: 0.05) according to measured IFP0–480 min. Violins comprise percentiles 0.5–99.5 of sequences, with median and outliers shown as white circles and blue dots, respectively. Black bars span the 25th to 75th percentiles. d Cellular Bxb1-sfGFP concentrations are reliably predicted from experimentally determined (light green circles) and predicted (dark green circles) IFP0–480 min values, as shown for the 31 internal-standard RBSs. IFP0–480 min values (n = 1) were converted into the slope of the cell-specific Bxb1-sfGFP signal between 0 and 290 min after induction using the logistic fit parameters determined earlier (Fig. 2e). e Dependence of the predictive performance of different ML models on the size of the training data set (see “Methods” section). Predictive performance is evaluated with four metrics: percentage of predicted IFP0–480 min values within 2-fold error, MAE, RMSE and R². The color scheme indicates performance, from dark blue (best) to dark red (worst).
f The confidence intervals of the predicted probability distributions (horizontal axis) fully assess the uncertainty of the predicted values (vertical axis), i.e. x% of the predicted values lie within the x% confidence interval. The obtained values (blue circles, n = 1) are well aligned with a theoretically perfect uncertainty assessment (dotted line). Source data for b–f are available as a Source Data file.
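
The calibration check in panel f can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, purely for illustration, that each predicted distribution is Gaussian with mean µ and s.d. σ, and it uses made-up toy values. The function name `empirical_coverage` and all inputs are hypothetical. For each nominal confidence level, it counts the fraction of measured values that fall inside the corresponding central interval; for a well-calibrated model this fraction should be close to the nominal level.

```python
from statistics import NormalDist


def empirical_coverage(measured, mu, sigma, levels):
    """Return, for each nominal level, the fraction of measured values
    inside the central interval of the predicted distribution.

    Assumption (not from the source): each prediction is treated as a
    Gaussian N(mu, sigma^2). Levels must lie strictly between 0 and 1.
    """
    std_normal = NormalDist()  # standard normal, mean 0 and s.d. 1
    coverage = []
    for p in levels:
        # Half-width of the central p-interval in units of sigma
        z = std_normal.inv_cdf(0.5 + p / 2.0)
        inside = sum(
            1 for y, m, s in zip(measured, mu, sigma)
            if abs(y - m) <= z * s
        )
        coverage.append(inside / len(measured))
    return coverage


# Hypothetical toy data, not taken from the paper
measured = [0.10, 0.42, 0.55, 0.80]
mu = [0.12, 0.40, 0.50, 0.70]
sigma = [0.05, 0.05, 0.05, 0.05]
levels = [0.5, 0.9]

coverage = empirical_coverage(measured, mu, sigma, levels)
for p, c in zip(levels, coverage):
    print(f"nominal {p:.0%} interval: observed coverage {c:.0%}")
```

Plotting the observed coverage against the nominal level gives a curve of the kind shown in panel f, where perfect calibration corresponds to the diagonal. With a non-Gaussian predictive distribution, the interval bounds would instead come from that distribution's own quantile function.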