Fig. 2: Comparison of the generative power of computational models. | Nature Communications

From: Exploring the space of self-reproducing ribozymes using generative models

a Experimental activity as a function of the number of mutations relative to Azo, in bins of 5 mutations (N = 150 per bin). Top: statistical learning and hybrid models; bottom: structure-based models. In each bin, dots show the mean activity and bars span the first to third quartiles. N per bin per model is provided in the Source Data file. The red dashed line is the activity threshold, set at a z-score of 3.09 (equivalently, p = 10⁻³), which corresponds to an activity of −2.76. b Fraction of designs that were catalytically active (Active Fraction) for each model as a function of the number of mutations, in bins of 5 mutations. Dots show the active fraction. The upper (lower) bound of each error bar is the active fraction obtained when including (excluding) designs whose activity scores lie within the 98.5th percentile of the measurement-error distribution around the threshold. N per bin per model is provided in the Source Data file. c Effective support size of each model as a function of mutational distance, i.e., how many different sequences the model can generate at a given distance. The star marks the estimate at Lmax, corrected with the experimental active fraction at Lmax (of order 1%). Inset: zoom on the experimental correction at Lmax for DCA T = 1. d Principal component analysis (PCA) showing the first two principal components, PC1 and PC2, for the generated sequences (gray) overlaid with the active sequences (color). All panels are projected onto the same axes, namely the first two principal components of the chimeric sequences (analogs of natural Group I introns), with the Azoarcus sequence at the origin. Source data are provided as a Source Data file.
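Panels a and b reduce each model's designs to per-bin statistics. The sketch below shows that bookkeeping under stated assumptions, not the authors' code: bins are taken as 0 to 4, 5 to 9, … mutations; a design counts as active when its activity is at or above −2.76; and the error-bar bounds come from shifting the threshold by a hypothetical `margin` that stands in for the 98.5th-percentile measurement-error band. All function names are illustrative, and the toy values are made up. It also checks that a z-score of 3.09 corresponds to a one-sided p-value of about 10⁻³.

```python
import math
from statistics import quantiles

Z_THRESHOLD = 3.09          # z-score cutoff quoted in the caption
ACTIVITY_THRESHOLD = -2.76  # activity the caption equates with that cutoff


def one_sided_p(z):
    """Upper-tail probability P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def bin_of(n_mutations, width=5):
    """Bin index, ASSUMING bins 0-4, 5-9, ... mutations relative to Azo."""
    return n_mutations // width


def fraction_at_least(values, cut):
    """Fraction of values at or above `cut` (assumed rule for 'active')."""
    return sum(v >= cut for v in values) / len(values)


def summarize_bins(designs, margin, width=5):
    """Per-bin mean, quartiles and active fraction with error-bar bounds.

    designs: iterable of (n_mutations, activity) pairs for one model.
    margin:  HYPOTHETICAL half-width of the measurement-error band around
             the threshold (a stand-in for the 98.5th-percentile band).
    Each bin needs at least two designs for statistics.quantiles.
    """
    bins = {}
    for n_mut, activity in designs:
        bins.setdefault(bin_of(n_mut, width), []).append(activity)

    summary = {}
    for b, acts in sorted(bins.items()):
        q1, _median, q3 = quantiles(acts, n=4)
        summary[b] = {
            "mean": sum(acts) / len(acts),
            "q1": q1,
            "q3": q3,
            "active": fraction_at_least(acts, ACTIVITY_THRESHOLD),
            # Upper bound: scores inside the error band are counted as active.
            "upper": fraction_at_least(acts, ACTIVITY_THRESHOLD - margin),
            # Lower bound: scores inside the error band are counted as inactive.
            "lower": fraction_at_least(acts, ACTIVITY_THRESHOLD + margin),
        }
    return summary


if __name__ == "__main__":
    print(f"p at z = {Z_THRESHOLD}: {one_sided_p(Z_THRESHOLD):.2e}")
    # Toy (n_mutations, activity) pairs, not measured data.
    toy = [(0, -1.0), (2, -3.0), (4, -2.70), (6, -5.0), (7, -2.0), (9, -2.8)]
    for b, stats in summarize_bins(toy, margin=0.1).items():
        print(b, stats)
```

Shifting the cutoff by ±`margin` is one simple way to read "including (excluding)" scores near the threshold; it guarantees lower ≤ active ≤ upper in every bin.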
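Panel d places every model on one shared coordinate system. A minimal sketch of such a projection, assuming aligned sequences of equal length one-hot encoded over A, C, G, U, with the axes taken as the top two principal components of the chimeric sequences. Here those components are computed by power iteration in pure Python, a swapped-in method chosen only to stay dependency-free (any SVD routine would do). Subtracting the reference coordinates places Azo at the origin. The sequences below are toy stand-ins, not the real chimeras or the Azoarcus ribozyme, and the function names are hypothetical.

```python
import math
import random

ALPHABET = "ACGU"


def one_hot(seq):
    """Flatten an aligned RNA sequence into a 4*L vector of 0/1 indicators."""
    return [1.0 if base == letter else 0.0 for base in seq for letter in ALPHABET]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def principal_axes(rows, k=2, iters=1000, seed=0):
    """Top-k principal axes of `rows` by power iteration on X^T X.

    Each candidate axis is re-orthogonalized (Gram-Schmidt) against the
    axes already found, so the result is an orthonormal set.
    Requires the centered data to have rank >= k.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)
    axes = []
    for _ in range(k):
        # Random start: for gap-free one-hot data a uniform vector is
        # orthogonal to every centered row, so iteration would stall on it.
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            scores = [dot(x, v) for x in centered]  # X v
            w = [sum(s * x[j] for s, x in zip(scores, centered)) for j in range(d)]  # X^T X v
            for a in axes:  # remove components along axes already found
                c = dot(w, a)
                w = [wj - c * aj for wj, aj in zip(w, a)]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                raise ValueError("centered data has rank < k")
            v = [wj / norm for wj in w]
        axes.append(v)
    return axes


def project(seq, axes, origin):
    """PC coordinates of `seq`, shifted so that `origin` maps to (0, 0).

    The centering mean cancels in (seq - mean) - (origin - mean), so only
    the difference of the two one-hot vectors is projected.
    """
    diff = [x - o for x, o in zip(one_hot(seq), one_hot(origin))]
    return tuple(dot(diff, a) for a in axes)


if __name__ == "__main__":
    chimeras = ["ACGUAC", "ACGAAC", "UCGUAC", "ACCUAG", "GCGUUC"]  # toy data
    azo = "ACGUAC"  # toy stand-in for the Azoarcus reference
    axes = principal_axes([one_hot(s) for s in chimeras])
    for s in chimeras:
        print(s, project(s, axes, azo))
```

Because the axes are fitted once on the chimeras and then reused, generated and active sequences from every model can be projected with the same `project` call, which is what makes the panels directly comparable.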