Fig. 3: Quantitative modelling predicts single-mutation effects on splice isoforms. | Nature Communications

From: High-throughput mutagenesis identifies mutations and RNA-binding proteins controlling CD19 splicing and CART-19 therapy resistance

a Based on the experimentally measured frequencies of five major isoforms in 9321 minigene variants (top box), a softmax regression model was formulated to estimate 4255 single-mutation effects (middle box) using L1 penalisation. Splicing-affecting mutations were selected for each isoform based on their respective empirical WT frequency distribution using the 2.5% and 97.5% quantiles as cutoffs. b The model performs well in fitting and 10-fold cross-validation. Bars show Pearson correlation coefficients between model and data for two replicates and each of the five isoforms across all combined mutation minigenes considered in model training and validation, respectively (Supplementary Fig. 4a, b). c Splicing-affecting mutations accumulate in distinct regions around exons 2 and 3. Landscape of model-predicted single-mutation effects on five major isoforms. Predicted isoform frequencies are plotted as a function of the position of a mutation. Colours indicate the nucleotide substitution of splicing-affecting point mutations (see legend); non-effective mutations are shown in grey. d Zoom-in shows model-predicted delta inclusion isoform frequency (frequency for a point mutation − frequency in WT) for nucleotides 445–552 of the minigene. Splicing-affecting mutations are highlighted as filled circles. e Model validation by splicing analysis of 19 minigene variants containing single point mutations. Isoform frequencies (in %) of the five major isoforms (see legend) are shown as mean values of three biological replicates (error bars, s.d.m.). ‘NALM-6’, splicing pattern of WT minigenes (RNA-seq) in the mutagenesis screen; ‘HEK293’, RT-PCR-based quantification of the baseline minigene containing mutation G742C (see Methods) in HEK293 cells. G748C* is a minigene containing G748C but lacking G742C. Schematic representation of CD19 minigene (top) highlighting mutated regions (red rectangles). Error bars represent s.d.m., n = 3 replicates.
f Splicing outcomes from e (y axis) are related to single-mutation predictions of the regression model (x axis; mean of two fits, each explaining one mutagenesis replicate). Changes in isoform frequency of the major isoforms (see legend) are expressed as differences (delta) relative to the baseline. Pearson correlation coefficient and P value (two-sided) were calculated over all isoforms (see Supplementary Fig. 5c for correlations of individual isoforms). Source data are provided as a Source Data file.
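
The modelling idea in panel a can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that single-mutation effects add up in the score (logit) space of a softmax, which the caption does not state, and all names (`predict_frequencies`, `l1_penalised_loss`, `splicing_affecting`, `W`, `b`, `lam`) are hypothetical. The toy data at the end are invented for illustration only.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def predict_frequencies(X, W, b):
    """Predicted isoform frequencies for each minigene variant.

    X: (variants, mutations) 0/1 matrix, 1 where a variant carries a mutation.
    W: (mutations, isoforms) single-mutation effects (hypothetical parameters).
    b: (isoforms,) WT baseline scores (hypothetical parameters).
    Assumption: effects of co-occurring mutations are additive before the softmax.
    """
    return softmax(X @ W + b)

def l1_penalised_loss(X, Y, W, b, lam):
    """Cross-entropy between measured (Y) and predicted frequencies plus an L1 penalty on W."""
    P = predict_frequencies(X, W, b)
    cross_entropy = -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))
    return cross_entropy + lam * np.abs(W).sum()

def splicing_affecting(predicted, wt_frequencies, lower=0.025, upper=0.975):
    """Flag predictions outside the 2.5%-97.5% quantile range of measured WT frequencies."""
    lo, hi = np.quantile(wt_frequencies, [lower, upper])
    return (predicted < lo) | (predicted > hi)

# Toy example (invented numbers): two mutations, two isoforms.
X = np.array([[0, 0], [1, 0], [0, 1]])   # WT, mutation 1 only, mutation 2 only
W = np.array([[2.0, 0.0], [0.0, 0.0]])   # mutation 1 favours isoform 0; mutation 2 is neutral
b = np.zeros(2)
freqs = predict_frequencies(X, W, b)

wt = np.linspace(0.4, 0.6, 101)          # stand-in for the empirical WT distribution of isoform 0
flags = splicing_affecting(freqs[1:, 0], wt)
```

In this toy case only the first mutation falls outside the WT quantile range for isoform 0. The L1 term pushes most entries of `W` towards zero, which is consistent with the caption's distinction between splicing-affecting and non-effective mutations; an actual fit would minimise `l1_penalised_loss` with an optimiser and choose `lam` by cross-validation.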
