Extended Data Fig. 7: Basal spliceosomal recruitment in vitro and ex-vivo.

(a) Fit of Model I to the data. Note that the model cannot explain the differences in the GFP condition among sequence groups. Lines show the model, dots the median of the data, and shaded areas the interquartile range. Same fit procedure as in Fig. 7c–e, but assuming that groups differ in c4 instead of c2. (b) Distribution of the mean squared error (MSE) for the best parameter sets found in each of 500 independent optimization runs, assuming groups differ in either c2 or c4, when the model was fitted either to the experimental data or to 9 synthetic datasets constructed by permuting the median experimental PSI values across experimental conditions. (c) Frequency of the distance between the SRRM4 CLIP tags and the AG dinucleotide. (d) Number of events with or without SRRM4 CLIP tags in the 200 nts preceding the microexons. Statistics: two-sided Fisher’s exact test. (e) SRRM4 CLIP tag densities in the last 150 nts of the upstream intron of endogenous HS and LS events; data from ref. 9. (f) Representation of the experimental approach used to investigate the efficiency of A spliceosomal complex formation in vitro. Briefly, RNAs corresponding to the sequences cloned in Fig. 1c were transcribed and incubated in HeLa cell nuclear extracts under splicing conditions (ref. 10). The H and A spliceosomal complexes were resolved by electrophoresis and the RNA contained in the A complex was isolated. The bands cut from the gels are highlighted by red squares. Both the input and the isolated RNA population (output) were amplified by RT-PCR and analyzed by deep sequencing. Primers used for both input and output amplifications are depicted as green arrows and amplicons as green lines. (g–i) For 71 LS and 73 HS events: (g) PSI in the GFP condition, (h) MAXENT3 and MAXENT5 entropy scores and (i) SpliceAI scores of acceptor and donor sites.
(j–m) For wild-type sequences of 71 LS microexons, 73 HS microexons and 27 CS exons of 42 nts in length: (j) mean ESRseq score, (k, l) number of ESE (k) or ESS (l) elements normalized by the exonic length and (m) number of ISR elements identified in the last 93 nucleotides of the upstream intron and the first 25 nucleotides of the downstream intron, normalized by the total intronic length. Fisher tests for contingency tables: for ESE, LS vs HS, P = 0.035; LS vs CS, P = 1.3e-3; HS vs CS, P = 0.108. For ESS, LS vs HS, P = 0.181; LS vs CS, P = 0.010; HS vs CS, P = 2.5e-4. (n–p) The boxes represent the square root of the peak intensities of U2 snRNP binding in the 93 nts preceding the LS and HS microexons (Fig. 7g), obtained from SF3A3_Flag_IP-seq (n), RBM5_RNP-seq (o) or RBM10_Flag_IP-seq (p). (q) Expression (in cRPKMs) of the genes presented in Fig. 7f, g and Extended Data Fig. 7g, h. (r) U2 snRNP peaks detected at the full transcript level. Statistics in (a–c, e, g–p): two-sided Mann-Whitney-Wilcoxon test; in (q, r): two-sided Mann-Whitney-Wilcoxon test with Bonferroni correction (ns, not significant).
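
For readers unfamiliar with the Bonferroni correction mentioned for panels (q, r), the following is a minimal sketch of how such an adjustment is commonly computed: each raw P value is multiplied by the number of comparisons and capped at 1. The function name, the example P values and the 0.05 threshold are illustrative assumptions and are not taken from the analysis behind this figure.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted P values.

    Each raw P value is multiplied by the number of tests
    and capped at 1.0.
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw P values from three pairwise comparisons
# (illustrative only, not data from the paper).
raw = [0.004, 0.03, 0.2]
adjusted = bonferroni(raw)

# Assumed significance threshold of 0.05; comparisons at or
# above it would be labelled "ns".
significant = [p < 0.05 for p in adjusted]
```

In this sketch, a comparison that passes the threshold before correction can become "ns" after it, which is why the number of comparisons matters when reading the annotations.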