Fig. 5: CVAE-generated sequences bind potently to daunomycin yet are unrelated to experimental sequences. | Nature Communications

From: Generating experimentally unrelated target molecule-binding highly functionalized nucleic-acid polymers using machine learning

a MST characterization of the binding affinity of CVAE-generated daunomycin-binding HFNAPs identified directly or with a single round of high-stringency selection. Sequences were found to bind with Kd = 13–15 nM and Kd = 9–26 nM, respectively. Values and error bars reflect mean and SEM for n = 3 independent replicates.

b UMAP projection (a non-linear method of dimensionality reduction) of reference set sequences, selection 2 daunomycin-binding HFNAP sequences, and CVAE-generated daunomycin-binding HFNAP sequences. 3072 reference-based random sequences and 3072 truly random sequences were included in the distance matrix and embedding but omitted from the final UMAP plot for clarity. Reference set sequences form clusters, indicating sequence similarity. Selection 2 daunomycin-binding HFNAPs, which are sequence-related to the reference set, are found within these clusters. The selection 2 daunomycin-binding HFNAPs used were Dm-HS-1, 2, 3, 6, 7, 8, and 10. Dm-HS-4 and Dm-HS-5 were omitted for sequence similarity to Dm-HS-1, and Dm-HS-9 was omitted for sequence similarity to Dm-HS-3.

c Aggregate SCI (structure conservation index) scores for CVAE-generated sequences, reference-based random sequences, and selection 2 sequences relative to Dm-HS-1–5 and Dm-HS-9. SCI ≈ 1 indicates complete conservation and SCI ≤ 0 indicates lack of structure conservation. Many of the top 25 CVAE-generated sequences from the competition selection demonstrate structure conservation with Dm-HS-1–5 and Dm-HS-9, while structure conservation is rarer for reference-based random sequences. CVAE-generated sequences achieve more positive SCI scores and larger SCI scores. Randomly chosen CVAE-generated sequences also show greater structure conservation to Dm-HS sequences than randomly chosen reference-based random sequences do. Median values are indicated using solid black lines. Five sequences with SCI values < −4 are omitted for clarity: two CVAE-generated sequences from the competition selection, two randomly chosen CVAE-generated sequences, and one reference-based random sequence from the competition selection.
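
The summary statistics named in panels a and c (mean and SEM over n = 3 replicates, the median SCI, the count of positive SCI scores, and the number of points below the plotting cutoff of −4) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names `mean_and_sem` and `summarize_sci` are hypothetical, the input numbers are placeholders rather than data from the figure, and computing the median over all scores (including those omitted from the plot) is an assumption.

```python
import statistics
from math import sqrt


def mean_and_sem(replicates):
    """Return (mean, SEM) for a list of replicate measurements."""
    n = len(replicates)
    mean = statistics.fmean(replicates)
    # SEM = sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(replicates) / sqrt(n)
    return mean, sem


def summarize_sci(scores, plot_floor=-4.0):
    """Summarize a group of SCI scores in the terms used by panel c."""
    # Scores below the floor are dropped from the plot only (assumption:
    # they still count toward the median and the positive-score tally).
    shown = [s for s in scores if s >= plot_floor]
    return {
        "median": statistics.median(scores),
        "n_positive": sum(s > 0 for s in scores),
        "n_omitted_from_plot": len(scores) - len(shown),
    }


# Placeholder values for illustration only; these are NOT measurements
# from the figure.
kd_replicates_nM = [12.0, 15.0, 18.0]
kd_mean, kd_sem = mean_and_sem(kd_replicates_nM)

placeholder_sci = [0.9, 0.4, -0.2, -5.0, 0.0]
sci_summary = summarize_sci(placeholder_sci)
```

Comparing `n_positive` and `median` between groups (for example, CVAE-generated versus reference-based random sequences) mirrors the comparison the caption draws in panel c.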
