Fig. 3: Analysis of toehold-switch performance using multilayer perceptron (MLP) models. | Nature Communications

From: A deep learning approach to programmable RNA switches

a Sequence logos for k-mer motifs discovered to be disproportionately represented in weakly induced switches (low ON) and leaky switches (high OFF), functional proportions, and E-values. b The Pearson correlation (left, |max| = 0.4) and R² metric (right, |max| = 0.16) for 30 state-of-the-art thermodynamic features and obtained RBS Calculator v2.1 outputs. c Base architecture of investigated MLP models, featuring three fully connected layers. For training in regression mode, three different outputs were predicted (ON, OFF, ON/OFF), whereas for classification training, only a single binary output based on ON/OFF (threshold at 0.7) was predicted. d Box-and-whisker plots for R² between experimental and regression-based predictions for best-performing rational features, logistic regression models and MLPs using tenfold cross-validation (test sets randomly selected from quality control process #2, QC2 in Supplementary Fig. S13 and Supplementary Table 1). e Box-and-whisker plots for mean absolute error (MAE) between experimental and predicted values for these same models. f Box-and-whisker plots for the area under the curve (AUC) of the receiver–operator curve (ROC) and the precision-recall curve (P–R) in classification-mode predictions compared to experimental values using threefold cross-validation (test sets randomly selected from quality control process #2, QC2 in Supplementary Fig. S13 and Supplementary Table 1). In both regression and classification, the one-hot encoded sequence MLP delivered top-in-class performance without using pre-computed thermodynamic or kinetic metrics. g ROC curves of pre-trained MLP classification models validated with an unseen 168-sequence external dataset from Green et al.². For all box-and-whisker plots, the horizontal line indicates the median, box edges are at the 25th and 75th percentiles, and whiskers indicate the smaller of either 1.5 × IQR or max/min. All source data are provided as a Source Data file.
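
As an illustration of panel c, the following is a minimal, untrained sketch of a one-hot encoded sequence MLP with three fully connected hidden layers, three regression outputs (ON, OFF, ON/OFF), and a binary label obtained by thresholding ON/OFF at 0.7. It is not the authors' implementation. The hidden-layer widths, the ReLU and sigmoid activations, the `ACGU` encoding order, the random weights, the `>=` comparison at the threshold, and all function names are assumptions made only for this example.

```python
import math
import random

ALPHABET = "ACGU"  # assumed encoding order; not specified in the caption


def one_hot(seq):
    """Flatten a nucleotide sequence into a 4-values-per-base one-hot vector."""
    vec = []
    for base in seq.upper().replace("T", "U"):
        vec.extend(1.0 if base == letter else 0.0 for letter in ALPHABET)
    return vec


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [
        activation(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def build_mlp(n_in, hidden=(16, 8, 4), n_out=3, seed=0):
    """Random (untrained) weights; the hidden widths are hypothetical."""
    rng = random.Random(seed)
    sizes = [n_in, *hidden, n_out]
    layers = []
    for a, b in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(a)
        weights = [[rng.uniform(-scale, scale) for _ in range(a)] for _ in range(b)]
        layers.append((weights, [0.0] * b))
    return layers


def forward(layers, x):
    """Three hidden ReLU layers, then a sigmoid head (assumed output scaling)."""
    for weights, biases in layers[:-1]:
        x = dense(x, weights, biases, relu)
    weights, biases = layers[-1]
    return dense(x, weights, biases, sigmoid)


def to_class_label(on_off, threshold=0.7):
    """Binary classification target derived from ON/OFF, as in panel c."""
    return 1 if on_off >= threshold else 0


if __name__ == "__main__":
    x = one_hot("AUGGCUAACG")  # arbitrary example sequence, not from the dataset
    model = build_mlp(n_in=len(x))
    on, off, on_off = forward(model, x)
    print(on, off, on_off, to_class_label(on_off))
```

Because the weights are random, the printed values carry no biological meaning; the sketch only shows how a one-hot sequence flows through the layers and how the regression and classification targets relate to each other.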
