Extended Data Fig. 4: 6-mer motif analyses and C3PO learned sequence features from MPIVA.

Counts of 6-mers from the (a) L3 and (b) SVL backbones are plotted alongside the nucleotide content of significantly enriched 6-mers in the top 10,000 sensitive (left logo) and resistant (bottom logo) CS variants. Sequence logos use DNA encoding of RNA nucleotides (T in place of U). The top 10,000 resistant and top 10,000 sensitive sequences were each converted into 6-mer counts, and each 6-mer was tested for enrichment with a one-sided binomial test (null hypothesis: probability of success = 0.25⁶; alternative hypothesis: probability of success > 0.25⁶). The p-value threshold was Bonferroni-adjusted for the number of possible 6-mers, 4⁶ = 4,096, so significant 6-mers must have p ≤ 0.05/4⁶ (≈ 1.22 × 10⁻⁵). The nucleotide content of significant resistant and sensitive 6-mers is shown next to the respective axes. (c) Consensus sequences of the maximally activating inputs for C3PO's layer 1 filters, and each filter's correlation with predicted sensitivity to 12.5 μM Compound 2. Related to Fig. 4d. Convolutional layers 1 and 2 were analyzed following a previously published analysis of APARENT, a CNN that predicts alternative polyadenylation. In brief, every filter in both convolutional layers was correlated with predicted drug sensitivity at the 12.5 μM dose. For each filter, the 5,000 training-set input sequences yielding the highest activation were compiled into a position weight matrix and used to generate a position-aware consensus sequence logo. Below each filter's logo, Pearson's r between the filter's activation at each position and predicted 12.5 μM Compound 2 sensitivity is plotted. Layer 1 filters are 8 positions wide, and layer 2 filters are 15 positions wide. Note that the convolutional layers in C3PO use zero-padding ("same" padding) so that the output length equals the input length of 25; this padding must be accounted for when interpreting positions in the filters' Pearson's r plots.
For example, in layer 1 the RNA sequences are padded with 3 zeros on the left and 4 on the right, so the first position in the correlation plots corresponds to a window of 3 zeros and the first 5 nucleotides of the randomized region. (d) Consensus sequences of the maximally activating inputs for C3PO's layer 2 filters, and each filter's correlation with predicted sensitivity to 12.5 μM Compound 2. Related to Fig. 4e.
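The 6-mer enrichment test in (a) and (b) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code. It assumes that every overlapping 6-mer window across the selected sequences is one binomial trial, that a trial succeeds when the window equals the 6-mer being tested, and that the null success probability is 0.25⁶ (uniform base composition). The function names are hypothetical.

```python
import math
from collections import Counter

K = 6
# Assumed null: under a uniform base composition, any given window equals a
# particular 6-mer with probability 1/4^6 = 0.25^6.
P_NULL = 0.25 ** K
# Bonferroni-adjusted threshold over all 4^6 = 4,096 possible 6-mers.
ALPHA = 0.05 / 4 ** K


def kmer_counts(seqs, k=K):
    """Count overlapping k-mers across all sequences; also return the number of windows."""
    counts = Counter()
    n_windows = 0
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
            n_windows += 1
    return counts, n_windows


def binom_sf(k, n, p):
    """One-sided binomial p-value P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        # Each term is computed in log space to avoid overflow in C(n, i).
        term = math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                        + i * log_p + (n - i) * log_q)
        total += term
        # Past the mode the terms only shrink, so stop once they no longer matter.
        if i > n * p and term <= total * 1e-17:
            break
    return min(1.0, total)


def enriched_kmers(seqs, k=K, p_null=P_NULL, alpha=ALPHA):
    """Return {k-mer: p-value} for k-mers significant at the adjusted threshold."""
    counts, n = kmer_counts(seqs, k)
    hits = {}
    for kmer, c in counts.items():
        pval = binom_sf(c, n, p_null)
        if pval <= alpha:
            hits[kmer] = pval
    return hits
```

Under these assumptions, `enriched_kmers` would be run separately on the resistant set and on the sensitive set, and the nucleotide content of the returned 6-mers would feed the logos.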
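A similarly hedged sketch of the filter analysis in (c) and (d) follows. It assumes that one filter's activations are available as a per-sequence list over output positions. It ranks sequences by their maximum activation and tallies base frequencies at every position of the top-ranked full-length sequences (the frequency step behind a position weight matrix). It then computes Pearson's r between the activation at each position and the model's predicted sensitivity. The ranking rule, the full-length alignment, and all names are assumptions; the published APARENT procedure that the analysis follows may differ in its details.

```python
import math

BASES = "ACGT"  # DNA encoding of RNA nucleotides (T stands in for U)


def top_activating_pfm(seqs, acts, n_top):
    """Position frequency matrix of the n_top sequences with the highest max activation.

    seqs: equal-length DNA strings; acts: per-sequence lists of one filter's
    activations at each output position (assumed input layout).
    """
    ranked = sorted(range(len(seqs)), key=lambda i: max(acts[i]), reverse=True)[:n_top]
    pfm = [{b: 0.0 for b in BASES} for _ in range(len(seqs[0]))]
    for i in ranked:
        for pos, base in enumerate(seqs[i]):
            pfm[pos][base] += 1.0 / len(ranked)
    return pfm


def consensus(pfm):
    """Most frequent base at each position (ties resolved in A, C, G, T order)."""
    return "".join(max(col, key=col.get) for col in pfm)


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else float("nan")


def positional_correlation(acts, preds):
    """Pearson's r between the filter's activation at each position and the predictions."""
    return [pearson_r([a[j] for a in acts], preds) for j in range(len(acts[0]))]
```

With real data, `n_top` would be 5,000 and `preds` the predicted 12.5 μM Compound 2 sensitivities; positions with constant activation return NaN rather than a spurious correlation.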
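The padding arithmetic behind the correlation-plot positions can be checked with a short calculation. This sketch assumes TensorFlow/Keras-style "same" padding, which for an even filter width places the extra zero on the right. Under that assumption an 8-wide layer 1 filter over the 25-nt randomized region gets 3 zeros on the left and 4 on the right, and its first output position spans 3 zeros plus nucleotides 1 to 5. The function names are hypothetical.

```python
def same_padding(width):
    """Left/right zero-padding that keeps output length equal to input length.

    Assumes the TensorFlow/Keras 'same' convention: when width - 1 is odd,
    the extra zero goes on the right.
    """
    total = width - 1
    left = total // 2
    return left, total - left


def output_length(seq_len, width):
    """Number of output positions for a stride-1 convolution with 'same' padding."""
    left, right = same_padding(width)
    return seq_len + left + right - width + 1


def window_composition(out_pos, seq_len, width):
    """For a 1-based output position, return (zeros_left, first_nt, last_nt, zeros_right).

    Nucleotide indices are 1-based within the input; for layer 2 the "input"
    is layer 1's output positions rather than nucleotides.
    """
    left, _ = same_padding(width)
    start = out_pos - 1 - left      # 0-based input index of the window's first column
    end = start + width - 1         # 0-based input index of the window's last column
    zeros_left = max(0, -start)
    zeros_right = max(0, end - (seq_len - 1))
    return zeros_left, max(start, 0) + 1, min(end, seq_len - 1) + 1, zeros_right
```

For example, `window_composition(1, 25, 8)` gives 3 leading zeros and nucleotides 1 to 5, while the last position, `window_composition(25, 25, 8)`, covers nucleotides 22 to 25 followed by 4 zeros.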