Fig. 3: Diversity explored by the generated sequences.
From: Exploring the space of self-reproducing ribozymes using generative models

a Upper panel: violin plots of the minimal distance to the set of chimeric sequences for generated sequences (light shades) and active sequences (solid shades), per model, with the same colors as in Fig. 2b. The average distance for chimeric sequences is shown as a dashed line. Lower panel: violin plots of the distances between all pairs of generated sequences (light shades) and of active sequences (solid shades), per model.

b Mutational enrichment from chimeric sequences. Top: at each position (x-axis), Azoarcus ribozyme nucleotides are marked by a black dot, and the shading shows the log ratio of mutation frequencies relative to the Azoarcus ribozyme in chimeras (green) and in active variants over all models (red). Bottom: effective number of nucleotides in chimeras (green), candidate designs (black), and active designs (red).

c Mutational enrichment due to the structure: effective number of nucleotides per position in DCA T = 1 (blue) and DCA-SB (purple).

d Top: PCA projection of DCA (left) and DCA-SB (right) sequences onto the two principal components of the DCA T = 1 set. Active sequences are colored and inactive ones are gray. Bottom: the same projection onto the principal components of the DCA-SB set, revealing an upper cluster (Cu) populated by DCA-SB sequences only, distinct from the lower cluster (Cd), which comprises all DCA sequences. Inset: distribution of structure scores in the two clusters, showing that Cu has a significantly better structure score (***, one-sided t-test, p-value = 10⁻¹⁶, the limit of numerical precision).

Source data are provided as a Source Data file.
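The two quantities plotted in panels a to c can be illustrated with a minimal sketch. It rests on assumptions not stated in the caption: distances are taken to be Hamming distances between aligned, equal-length sequences, and the effective number of nucleotides at a position is taken to be the exponential of the Shannon entropy of that alignment column. The function names and the toy sequences are hypothetical and are not data from the paper.

```python
import math
from collections import Counter


def hamming(a, b):
    """Number of mismatched positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def min_distance_to_set(seq, reference_set):
    """Smallest Hamming distance from seq to any sequence in reference_set."""
    return min(hamming(seq, ref) for ref in reference_set)


def effective_nucleotides(sequences):
    """Per-position exp(Shannon entropy) of an alignment.

    Assumed definition: 1 for a fully conserved column, 4 for a column
    with A, C, G and U at equal frequency. Any gap symbol would simply
    be counted as an additional character.
    """
    n = len(sequences)
    result = []
    for column in zip(*sequences):
        counts = Counter(column)
        entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
        result.append(math.exp(entropy))
    return result


# Toy aligned sequences, made up for illustration only.
chimeras = ["ACGU", "ACGA"]
designs = ["ACGU", "UCGA", "GCCA", "CCGA"]

closest = min_distance_to_set("UCGA", chimeras)
n_eff = effective_nucleotides(designs)
```

With these toy inputs, the first column of `designs` contains four different nucleotides and the second is fully conserved, so `n_eff` spans the two extremes of the measure.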