Abstract
Microexons are essential for the proper operation of neurons and pancreatic endocrine cells, in which their inclusion depends on the splicing factors SRRM3 and SRRM4 (SRRM3/4). However, in pancreatic cells, lower expression of these regulators limits inclusion to only the most sensitive subset among all neuronal microexons. Although various cis-acting elements can contribute to microexon regulation, how they determine this differential dose response and the corresponding high or low sensitivity to SRRM3/4 remains unknown. Here we use massively parallel splicing assays probing 28,535 variants to show that sensitivity to SRRM4 is conserved across vertebrates. Our data support a regulatory model whereby high or low microexon sensitivity is largely determined by the interplay between core splicing architecture and length constraints. This conclusion is further supported by distinct spliceosome activities in the absence of SRRM3/4 and by a mathematical model that assumes that the two types of microexons differ only in their efficiency to recruit early spliceosomal components.
Data availability
The datasets supporting the conclusions of this article were deposited to the Gene Expression Omnibus (GSE276143) repository. Source data are provided with this paper.
Code availability
All original code used to generate the results in this paper was deposited to GitHub repositories (https://github.com/simon-bt/lme-mapsy and https://github.com/theobiolab/SRRM4_MapSy_2024). The deposited code is meant to allow the results of the paper to be reproduced. However, given its limited annotation and its specificity to the analyses of this paper, it is not intended for use by the community in other MaPSy applications.
Change history
01 September 2025
In the version of this article initially published, there was an error in the Author contributions: Software line which now includes Simon Bajew and Rosa Martinez-Corral in the HTML and PDF versions of the article.
References
Ule, J. & Blencowe, B. J. Alternative splicing regulatory networks: functions, mechanisms, and evolution. Mol. Cell 76, 329–345 (2019).
Baralle, F. E. & Giudice, J. Alternative splicing as a regulator of development and tissue identity. Nat. Rev. Mol. Cell Biol. 18, 437–451 (2017).
Mantica, F. & Irimia, M. The 3D-Evo space: evolution of gene expression and alternative splicing regulation. Annu. Rev. Genet. 56, 315–337 (2022).
Juan-Mateu, J. et al. Pancreatic microexons regulate islet function and glucose homeostasis. Nat. Metab. 5, 219–236 (2023).
Irimia, M. et al. A highly conserved program of neuronal microexons is misregulated in autistic brains. Cell 159, 1511–1523 (2014).
Nakano, Y., Wiechert, S. & Bánfi, B. Overlapping activities of two neuronal splicing factors switch the GABA effect from excitatory to inhibitory by regulating REST. Cell Rep. 27, 860–871 (2019).
Ciampi, L., Serrano, L. & Irimia, M. Unique transcriptomes of sensory and non-sensory neurons: insights from splicing regulatory states. Mol. Syst. Biol. 20, 296–310 (2024).
Wan, R., Bai, R., Zhan, X. & Shi, Y. How is precursor messenger RNA spliced by the spliceosome? Annu. Rev. Biochem. 89, 333–358 (2020).
Raj, B. et al. A global regulatory mechanism for activating an exon network required for neurogenesis. Mol. Cell 56, 90–103 (2014).
Torres-Méndez, A. et al. A novel protein domain in an ancestral splicing factor drove the evolution of neural microexons. Nat. Ecol. Evol. 3, 691–701 (2019).
Gonatopoulos-Pournatzis, T. & Blencowe, B. J. Microexons: at the nexus of nervous system development, behaviour and autism spectrum disorder. Curr. Opin. Genet. Dev. 65, 22–33 (2020).
Nakano, Y. et al. A mutation in the Srrm4 gene causes alternative splicing defects and deafness in the Bronx waltzer mouse. PLoS Genet. 8, e1002966 (2012).
Torres-Méndez, A. et al. Parallel evolution of a splicing program controlling neuronal excitability in flies and mammals. Sci. Adv. 8, eabk0445 (2022).
Li, Y. I., Sanchez-Pulido, L., Haerty, W. & Ponting, C. P. RBFOX and PTBP1 proteins regulate the alternative splicing of micro-exons in human brain transcripts. Genome Res. 25, 1–13 (2015).
Matalkah, F. et al. The Musashi proteins direct post-transcriptional control of protein expression and alternate exon splicing in vertebrate photoreceptors. Commun. Biol. 5, 1011 (2022).
Gonatopoulos-Pournatzis, T. et al. Genome-wide CRISPR–Cas9 interrogation of splicing networks reveals a mechanism for recognition of autism-misregulated neuronal microexons. Mol. Cell 72, 510–524 (2018).
Lee, J.-S., Lamarche-Vane, N. & Richard, S. Microexon alternative splicing of small GTPase regulators: Implication in central nervous system diseases. Wiley Interdiscip. Rev. RNA 13, e1678 (2022).
Ohnishi, T., Shirane, M. & Nakayama, K. I. SRRM4-dependent neuron-specific alternative splicing of protrudin transcripts regulates neurite outgrowth. Sci. Rep. 7, 41130 (2017).
Rusconi, F. et al. LSD1 neurospecific alternative splicing controls neuronal excitability in mouse models of epilepsy. Cereb. Cortex 25, 2729–2740 (2015).
Gupta, A. K. et al. Degenerate minigene library analysis enables identification of altered branch point utilization by mutant splicing factor 3B1 (SF3B1). Nucleic Acids Res. 47, 970–980 (2019).
Mikl, M., Hamburg, A., Pilpel, Y. & Segal, E. Dissecting splicing decisions and cell-to-cell variability with designed sequence libraries. Nat. Commun. 10, 4572 (2019).
Wong, M. S., Kinney, J. B. & Krainer, A. R. Quantitative activity profile and context dependence of all human 5′ splice sites. Mol. Cell 71, 1012–1026 (2018).
Ishigami, Y. et al. Specificity, synergy, and mechanisms of splice-modifying drugs. Nat. Commun. 15, 1880 (2024).
Julien, P., Miñana, B., Baeza-Centurion, P., Valcárcel, J. & Lehner, B. The complete local genotype–phenotype landscape for the alternative splicing of a human exon. Nat. Commun. 7, 11558 (2016).
Baeza-Centurion, P., Miñana, B., Schmiedel, J. M., Valcárcel, J. & Lehner, B. Combinatorial genetics reveals a scaling law for the effects of mutations on splicing. Cell 176, 549–563 (2019).
Baeza-Centurion, P., Miñana, B., Valcárcel, J. & Lehner, B. Mutations primarily alter the inclusion of alternatively spliced exons. eLife 9, e59959 (2020).
Rhine, C. L. et al. Massively parallel reporter assays discover de novo exonic splicing mutants in paralogs of autism genes. PLoS Genet. 18, e1009884 (2022).
Rong, S. et al. Large-scale functional screen identifies genetic variants with splicing effects in modern and archaic humans. Proc. Natl Acad. Sci. USA 120, e2218308120 (2023).
Soemedi, R. et al. Pathogenic variants that alter protein code often disrupt splicing. Nat. Genet. 49, 848–855 (2017).
Wang, Z. et al. Systematic identification and analysis of exonic splicing silencers. Cell 119, 831–845 (2004).
Enculescu, M. et al. Exon definition facilitates reliable control of alternative splicing in the RON proto-oncogene. Biophys. J. 118, 2027–2041 (2020).
Findlay, G. M., Boyle, E. A., Hause, R. J., Klein, J. C. & Shendure, J. Saturation editing of genomic regions by multiplex homology-directed repair. Nature 513, 120–123 (2014).
Fairbrother, W. G. & Chasin, L. A. Human genomic sequences that inhibit splicing. Mol. Cell. Biol. 20, 6816–6825 (2000).
Rosenberg, A. B., Patwardhan, R. P., Shendure, J. & Seelig, G. Learning the sequence determinants of alternative splicing from millions of random sequences. Cell 163, 698–711 (2015).
Adamson, S. I., Zhan, L. & Graveley, B. R. Vex-seq: high-throughput identification of the impact of genetic variation on pre-mRNA splicing efficiency. Genome Biol. 19, 71 (2018).
Ke, S. et al. Saturation mutagenesis reveals manifold determinants of exon definition. Genome Res. 28, 11–24 (2018).
Baeza-Centurion, P. et al. Deep indel mutagenesis reveals the regulatory and modulatory architecture of alternative exon splicing. Preprint at bioRxiv https://doi.org/10.1101/2024.04.21.590414 (2024).
Braun, S. et al. Decoding a cancer-relevant splicing decision in the RON proto-oncogene using high-throughput mutagenesis. Nat. Commun. 9, 3315 (2018).
Gergics, P. et al. High-throughput splicing assays identify missense and silent splice-disruptive POU1F1 variants underlying pituitary hormone deficiency. Am. J. Hum. Genet. 108, 1526–1539 (2021).
Smith, C. et al. High-throughput splicing assays identify known and novel WT1 exon 9 variants in nephrotic syndrome. Kidney Int. Rep. 8, 2117–2125 (2023).
Cortés-López, M. et al. High-throughput mutagenesis identifies mutations and RNA-binding proteins controlling CD19 splicing and CART-19 therapy resistance. Nat. Commun. 13, 5570 (2022).
Chiang, H.-L. et al. Mechanism and modeling of human disease-associated near-exon intronic variants that perturb RNA splicing. Nat. Struct. Mol. Biol. 29, 1043–1055 (2022).
Black, D. L. Does steric interference between splice sites block the splicing of a short c-Src neuron-specific exon in non-neuronal cells? Genes Dev. 5, 389–402 (1991).
Ke, S. et al. Quantitative evaluation of all hexamers as exonic splicing elements. Genome Res. 21, 1360–1374 (2011).
Cáceres, E. F. & Hurst, L. D. The evolution, impact and properties of exonic splice enhancers. Genome Biol. 14, R143 (2013).
Paronetto, M. P. et al. Regulation of FAS exon definition and apoptosis by the Ewing sarcoma protein. Cell Rep. 7, 1211–1226 (2014).
Damianov, A. et al. The splicing regulators RBM5 and RBM10 are subunits of the U2 snRNP engaged with intron branch sites on chromatin. Mol. Cell 84, 1496–1511 (2024).
Head, S. A. et al. Silencing of SRRM4 suppresses microexon inclusion and promotes tumor growth across cancers. PLoS Biol. 19, e3001138 (2021).
Quesnel-Vallières, M., Irimia, M., Cordes, S. P. & Blencowe, B. J. Essential roles for the splicing regulator nSR100/SRRM4 during nervous system development. Genes Dev. 29, 746–759 (2015).
Quesnel-Vallières, M. et al. Misregulation of an activity-dependent splicing network as a common mechanism underlying autism spectrum disorders. Mol. Cell 64, 1023–1034 (2016).
Capponi, S. et al. Neuronal-specific microexon splicing of TAF1 mRNA is directly regulated by SRRM4/nSR100. RNA Biol. 17, 62–74 (2020).
Ciampi, L. et al. Specialization of the photoreceptor transcriptome by Srrm3-dependent microexons is required for outer segment maintenance and vision. Proc. Natl Acad. Sci. USA 119, e2117090119 (2022).
Kumar, S. et al. TimeTree 5: an expanded resource for species divergence times. Mol. Biol. Evol. 39, msac174 (2022).
Yeo, G. & Burge, C. B. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol. 11, 377–394 (2004).
Freund, M. et al. Extended base pair complementarity between U1 snRNA and the 5′ splice site does not inhibit splicing in higher eukaryotes, but rather increases 5′ splice site recognition. Nucleic Acids Res. 33, 5112–5119 (2005).
Tapial, J. et al. An atlas of alternative splicing profiles and functional associations reveals new regulatory programs and genes that simultaneously express multiple major isoforms. Genome Res. 27, 1759–1768 (2017).
Sakamoto, H., Inoue, K., Higuchi, I., Ono, Y. & Shimura, Y. Control of Drosophila sex-lethal pre-mRNA splicing by its own female-specific product. Nucleic Acids Res. 20, 5533–5540 (1992).
Corvelo, A., Hallegger, M., Smith, C. W. J. & Eyras, E. Genome-wide association between branch point properties and alternative splicing. PLoS Comput. Biol. 6, e1001016 (2010).
Stadler, M. B. et al. Inference of splicing regulatory activities by sequence neighborhood analysis. PLoS Genet. 2, e191 (2006).
Yeo, G. W., Van Nostrand, E. L. & Liang, T. Y. Discovery and analysis of evolutionarily conserved intronic splicing regulatory elements. PLoS Genet. 3, e85 (2007).
Desmet, F.-O. et al. Human Splicing Finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res. 37, e67 (2009).
Acknowledgements
We would like to thank J. Valcárcel, all members of the M.I. lab and L. Chasin (Columbia University) for providing a comprehensive list of ESRseq scores, A. Damianov, C.-H. Lin and D. Black (University of California, Los Angeles) for providing the high-molecular-weight peak intensities and discussion on their data (ref. 47), A. Torres-Mendez for insightful discussion on microexon regulation, L. Iñiguez-Rabago and M. Anglada-Girotto for assistance with bioinformatics, D. Gray for his insights into pharmacological models, the Genomics Unit at the CRG for assistance with the sequencing and the Protein Technologies Unit at the CRG and the CRG Core Technologies Program for their support and assistance in this work. Research for this publication was partially carried out in the Barcelona Collaboratorium for Modeling and Predictive Biology. Animal silhouettes were obtained from PhyloPic. The research was funded by the European Research Council under the European Union’s Horizon 2020 research and innovation program (ERC-CoG, grant agreement 101002275 to M.I.) and Spanish Ministry of Science and Innovation (PID2020-115040GB-I00/AEI/10.13039/501100011033 to M.I.). R.M.-C. acknowledges support from RYC2021-033860-I funded by MCIN/AEI/10.13039/501100011033 and by European Union NextGeneration EU/PRTR. CRG acknowledges support of the Spanish Ministry of Science and Innovation through the Centro de Excelencia Severo Ochoa (CEX2020-001049-S, MCIN/AEI/10.13039/501100011033) and the Generalitat de Catalunya through the CERCA program.
Author information
Contributions
Conceptualization, S. Bonnal, S. Bajew, R.M.-C. and M.I. Methodology, S. Bonnal, S. Bajew, R.M.-C. and M.I. Software, S. Bajew and R.M.-C. Validation, S. Bonnal. Formal analysis, S. Bonnal and S. Bajew. Investigation, S. Bonnal, S. Bajew, R.M.-C. and M.I. Resources, S. Bonnal and M.I. Writing—original draft, S. Bonnal and M.I. Writing—review and editing, S. Bonnal and M.I. Visualization, S. Bonnal, S. Bajew and R.M.-C. Supervision, S. Bonnal and M.I. Project administration, M.I. Funding acquisition, M.I.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Structural & Molecular Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editors: Chiara Anania and Dimitris Typas, in collaboration with the Nature Structural & Molecular Biology team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Microexon sensitivity to SRRM4 expression is largely conserved from shark to human.
(a) Different microexon groups were defined according to their response upon titration of SRRM4. Supervised heatmap representing the ΔPSI for each event (rows) under different conditions (columns) with respect to cells expressing GFP. Conditions: HEK 293 Flp-In T-REx cells expressing SRRM4 in a doxycycline-dependent manner; CTR are cells not treated with doxycycline (only leaky expression), while LOW, MID, HIGH are cells treated with increasing concentrations of doxycycline (Methods). The membership of the events to either LS/HS/CR/CS/NR microexons is indicated by the colored squares on the left side of the heatmap. The number of events for each category is shown within brackets. (b) Heatmap showing the pattern of splicing of the selected events both in the endogenous and in the MaPSy context (represented as the median of PSI values in the different libraries in which the events were quantified). White squares correspond to missing values due to insufficient read coverage. (c) The lines represent an estimate of the central tendency and the corresponding 95% confidence interval of the ΔPSI (VAR-Hsa) of orthologous WT sequences from all tested species with respect to human (Hsa) in GFP and LOW, MID, HIGH expression of SRRM4 conditions for HS (green) and LS (blue) events according to the sensitivity defined in human (Fig. 1). The species are indicated at the bottom as well as the number of HS and LS events represented per species. Teleosts, which showed more dissimilar inclusion patterns relative to their corresponding human sequences even though their dose response was also conserved, are highlighted using a yellow background as in Fig. 1e. The red lines separate the species for each phylogenetic node. The numbers at the bottom show the results of two-sided Mann-Whitney tests between the corresponding node and all other ones combined for the condition HIGH(VAR-WT).
(d) Top: Distribution of PSIs in various tissues for orthologs of LS (blue) and HS (green) events from Mus musculus, Rattus norvegicus, Bos taurus, Gallus gallus and Danio rerio. Bottom: mRNA expression levels (cRPKMs) of Srrm3 (yellow) and Srrm4 (pink) in each species. (e) Schematics of the sequences involved in each swapping experiment. Intronic sequences from either a human event (grey) or another species (black) were swapped to generate the chimeric constructs depicted at the top of each subpanel, where the exonic part is either from human or from another species, respectively. The lines represent an estimate of the central tendency and the corresponding 95% confidence interval of ΔPSI (VAR-WT) between the variants in which the microexon sequence is from human and both flanking introns are from its ortholog in the indicated species (left), or in which the microexon sequence is from a given species and both flanking introns are from its human ortholog (right). Species involved in the shuffling are listed along the x-axis together with the number of HS and LS events represented per species. (f) Correlation of PSI between 6 biological replicates in the condition of HIGH expression level of SRRM4. PSI* corresponds to the final output of the quantification pipeline while PSIreg* values are recovered from an intermediate file (see Methods). ‘Rep’ indicates each of the 6 replicates. (g) Correlation of the 268 sequences present in both T1|T2 and T3|T4 in four experimental conditions (GFP and LOW, MID, HIGH expression of SRRM4).
Extended Data Fig. 2 Exon length and exonic cis-acting elements contribute to SRRM4 mediated splicing regulation.
(a) Scheme of the strategy used to increase the length of the microexons. To increase their length, we designed sixteen 36-nt long sequences devoid of potential splicing regulatory motifs such as Exonic Splicing Enhancers (ESE) or Silencers (ESS) according to either the Human Splicing Finder tool (ref. 61) or data from ref. 44 (“designed sequences”; Methods). Then, for each microexon, we increased the length of the microexon through the stepwise addition of 3 nts (color coded) to its center for each of the designed sequences, up to a maximal exon length of 42 nts (Supplementary Table 4). (b) ESRseq scores provided for each hexamer with a 1-nt sliding window (ref. 44) for each of the 15 sequences used in Fig. 2. Scores range from -1 to 1 depending on whether they are likely to harbor negative or positive regulatory motifs, respectively. Scores close to 0 are less likely to have splicing regulatory elements. (c–f) ESRseq scores obtained upon length variation of LS (c), HS (d), CS (e) and CR (f) events represented in Fig. 2a–c, Extended Data Fig. 2j, k. (g) Pearson correlation matrices represent the impact of each designed sequence when added into LS and HS microexons under two experimental conditions (GFP in the lower triangle, HIGH in the upper triangle), quantified as ΔPSI (VAR-WT). For both LS and HS, ΔPSI values are averaged by sequence and length variation, and correlations are calculated across all sequence and length combinations. (h) PSI across four experimental conditions (GFP vs LOW, MID, HIGH expression of SRRM4) for each CS WT (42 nts) and their most SRRM4-responding shortened variants (in which the indicated number of nucleotides N have been removed, remN). (i) Two examples of CS events whose shortening results in SRRM4-responding variants (EPS15L1-HsaEX6093676 and POSTN-HsaEX6040016). The arrow indicates the maximal value of ΔPSI (HIGH-GFP).
(j) RT-PCR assays showing the splicing patterns of various SNAP25-HsaEX6077184 splicing reporters (WT or following the removal of either 9, 11, 18 or 33 nucleotides) under control condition (-) or upon expression of SRRM4 (+) in HEK 293 cells. The PCR amplicons corresponding to inclusion / skipping are indicated with the squares. PSI and standard deviations (std) from at least three biological replicates per minigene variant are provided. (k) Change in CR exon inclusion upon shortening or lengthening was quantified as ΔPSI (VAR-WT) in either the GFP condition or under HIGH level of expression of SRRM4. Each data point corresponds to the effect of adding a given number of nucleotides from one of the fifteen designed sequences (positive number on the x-axis), or removal of a given number of nucleotides from the exonic sequence (negative number on the x-axis). The length of the WT is indicated by the dashed vertical line. The numbers at the right side of each subpanel indicate the Spearman correlation (ρ) and corresponding p-value (P) of the effects in ΔPSI according to length variation. (l) Impact of length variation on the pattern of splicing and regulation under different levels of expression of SRRM4 with respect to GFP condition for each sequence, expressed as ΔPSI (HIGH/MID/LOW-GFP) for CR events.
Extended Data Fig. 3 Exonic elements contribute to splicing and regulation by SRRM4.
(a) Approach used to perform a semi-deep mutagenesis scan of the exon. Three successive nucleotides (indicated in red) were mutated at a time using 1-nt sliding windows from the 5′ end to the 3′ end of each microexon and following a systematic pattern of nucleotide substitution: A > C, G > T, C > A, T > G. (b–d) For each event and experimental condition (y-axis), the ΔPSI (VAR-WT) upon mutation at different exonic positions (x-axis) is shown as a heatmap (blue/red represents higher/lower inclusion than the WT) (LS in (b) and HS in (c)). For CS events (d), the positions of the mutations in the short exonic version have been aligned to their corresponding WT version of 42 nts (note that because of the removal, the central nucleotides are different). On the right side of panel (d), the line plots represent the PSI in the same experimental conditions for each of the CS events and their corresponding shortened variants. The boxes show the quantification of the perturbations observed on the heatmap. Statistics: Mann-Whitney-Wilcoxon test two-sided with Bonferroni correction (ns: not significant). (e) Stacked bar plots for LS, HS, CS and CSrem events in the four experimental conditions counting the number of mutations leading to ΔPSI (VAR-WT) > 10 (POS), ΔPSI (VAR-WT) < -10 (NEG) or without effect (ELSE). (f) Examples of changes in PSI and SRRM4 response for variants of a LS (CLASP1-HsaEX0015478), a HS (CPEB4-HsaEX0016976), a CS (SNHG14-HsaEX1038166) and a shortened CS (ARHGAP23-HsaEX7108093-rem-15) event. (g) Top: Representation of the approach used to introduce ESR hexamers at two positions within the exon. Two ESEs and two ESSs were inserted either at the 5′ end (from position 4 to 9) or at the 3′ end (from position n-9 to n-4, n being the exon length) of the exon as indicated. The sequences of each hexamer are shown (Supplementary Table 6).
Bottom: Distributions of ΔPSI (VAR-WT) in GFP vs LOW, MID, HIGH expression of SRRM4 for LS, HS, CS and their corresponding CSrem events. The colors indicate both the hexamer and the position where it was inserted.
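The mutagenesis scan described in panel (a) and the effect categories counted in panel (e) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `triplet_scan` and `classify_effect` are hypothetical, the input is assumed to be a DNA-alphabet exon sequence (A, C, G, T only), and positions are reported as 1-based window starts for illustration.

```python
# Substitution pattern stated in the legend: A > C, G > T, C > A, T > G.
SUBSTITUTION = {"A": "C", "G": "T", "C": "A", "T": "G"}


def triplet_scan(exon: str) -> list[tuple[int, str]]:
    """Mutate three successive nucleotides at a time with a 1-nt sliding window.

    Returns (1-based window start, mutated exon) pairs, scanning from the
    5' end to the 3' end. Assumes the exon uses only A, C, G and T.
    """
    exon = exon.upper()
    variants = []
    for start in range(len(exon) - 2):
        window = exon[start:start + 3]
        mutated = "".join(SUBSTITUTION[nt] for nt in window)
        variants.append((start + 1, exon[:start] + mutated + exon[start + 3:]))
    return variants


def classify_effect(delta_psi: float, threshold: float = 10.0) -> str:
    """Bin a ΔPSI (VAR-WT) value into the POS / NEG / ELSE categories of panel (e)."""
    if delta_psi > threshold:
        return "POS"
    if delta_psi < -threshold:
        return "NEG"
    return "ELSE"


if __name__ == "__main__":
    # Toy 6-nt sequence, chosen only to show the window behaviour.
    for position, variant in triplet_scan("ACGTAC"):
        print(position, variant)
```

An exon of length n yields n - 2 variants under this scheme, one per window position.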
Extended Data Fig. 4 Effect of UGC mutation and insertion on splicing regulation of LS and HS events.
(a) Effect on the response to SRRM4 at MID and HIGH conditions of mutating all UGCs in the orthologous variants. The barplots represent the median ΔPSI for all the orthologous WT constructs in either MID or HIGH SRRM4 expression conditions with respect to GFP. Boxplots show the distribution of the ΔPSI of the UGC mutant variants, with each dot representing a UGC variant from a given species. The total number of UGCs considered for the mutagenesis for each event are indicated by the color of the dots. (b) MaPSy-derived PSI values under four experimental conditions (GFP or LOW, MID, HIGH expression of SRRM4) for selected variants of the KIF1C and PUS7 minigenes tested in Fig. 4b. (c) RT-PCR assays showing the splicing pattern of different variants (WT or mutation from the “UGC-addition walk” combined or not with exon length shortening) of three events under control condition (-) or expression of human SRRM4 (Hsa). PSI and standard deviations (std) from at least three biological replicates are provided. (d) Top: Schematics of the location at which 1, 2 or 3 UGCs have been added in the context of the WT sequence. Bottom left: Variation of the ppt_scr of each variant with respect to its corresponding WT upon addition of 1, 2 or 3 UGCs downstream to the best predicted BP. Outliers were removed for visibility. Statistics: Mann-Whitney-Wilcoxon test. Bottom right: Correlation between the changes in ppt_scr and the variation in PSI in the HIGH condition from variant to WT. Statistics: Pearson correlation and p-value are provided. (e) Left: Location at which 1, 2 or 3 UGCs were added in the context of a modified WT sequence in which the BP and Py tract were displaced together either 10 or 20 nts upstream to their initial position. 
Right: Variation of ss_dist, ppt_scr and PSI in the HIGH condition of each variant with respect to its corresponding WT upon addition of 1, 2 or 3 UGCs in the 10 or 20 nts downstream to the BP and its associated Py tract. Statistics: Mann-Whitney-Wilcoxon test two-sided with Bonferroni correction (ns: not significant).
Extended Data Fig. 5 Intronic core splicing elements are relevant for regulation by SRRM4.
(a) The heatmap depicts the effect of mutations with respect to WT [ΔPSI (VAR-WT)] in four experimental conditions (GFP and LOW, MID, HIGH expression of SRRM4) in the 3′ ss of MEF2C-HsaEX0038702 at positions overlapping the SNP (rs1057521717 T > C). Sequences of the last 20 nts of the 3′ss of the WT and the different variants are reported below the heatmap, with the SNP indicated in red. (b) Effect of mutagenesis scan as in Fig. 5a for CS (red), their corresponding short version (CSrem, orange) and other CS (CSlong, grey) events. For the exonic sequence, only the impact of mutating the first (1) or last three exonic nucleotides (n-2) is reported (full set in Extended Data Fig. 3d). The positions of the mutations are indicated along the x-axis according to the nomenclature described in Fig. 5a. (c) The heatmaps depict the mean ΔPSI (VAR-WT) upon mutations (Fig. 5a and Extended Data Fig. 5a) in the 3′ss for LS, HS, CS, their shorter versions (CSrem), and other CS (CSlong) events in conditions of expression of GFP or HIGH level of SRRM4. The positions of the mutations are indicated along the x-axis according to the nomenclature described in Fig. 5a. (d, e) The lines represent an estimate of the central tendency and the corresponding 95% confidence interval of the ΔPSI (VAR-WT) between BP variants (d) or Py variants (e) and their corresponding WT in four experimental conditions (GFP vs LOW, MID, HIGH expression of SRRM4). In d, the best predicted BP was mutated to either 2 strong (stg) or 2 weak BP motifs. Plain lines represent the impact of the mutations on CS events while the dashed lines represent the effect on their corresponding short versions. 
(f) The lines represent an estimate of the central tendency and the corresponding 95% confidence interval of the PSI in four experimental conditions (GFP vs LOW, MID, HIGH expression of SRRM4) for both LS and HS WT constructs or the corresponding variants in which a UCUC motif was added to the WT sequence when absent (left) or removed when present (right). The total number of events for each type of mutation (n) is indicated in the legend. No significant difference between WT and mutated PSI distributions was observed for any condition (Mann-Whitney-Wilcoxon test). (g) Distributions of the ΔPSI (VAR-WT) in GFP vs MID, HIGH expression of SRRM4 between variants and their corresponding WT for either LS (top left), HS (top right), CS (bottom left) and their corresponding short versions (bottom right) events. Variants include deep mutagenesis at position -3 in the 3′ss and positions 1, 2, 3 at the 5′ end of the exon.
Extended Data Fig. 6 The strength of the 5′ss contributes to splicing and sensitivity to SRRM4.
(a) Scheme of the mutations introduced to decrease complementarity to U1 snRNA or increase complementarity to U6 snRNA. The top square represents the exon and its last three nts, the line represents the intron with the splice site “gu” and the following six nts. The sequence of U1 snRNA is provided as well as the mutations performed to generate the variants named GT > GC (whereby position +2 in the intron is replaced from U to C) and U1weak (whereby 9 nts involved in base pairing to U1 snRNA have been replaced by the weak 5′ss associated with exon HsaEX6033277). In addition, positions +5 to +9 (U6cons_a) or +5 to +8 (U6cons_b) were mutated to enhance complementarity to U6 snRNA (whose sequence is provided above). (b) The lines within the violin represent the interquartile range of the maximal entropy score for 5′ss (MAXENT5) for the LS and HS events for the WT constructs and their corresponding U6cons_b variants. The median is represented by the dashed line, each dot represents an event. Statistics: Mann-Whitney-Wilcoxon test two-sided with Bonferroni correction. (c) The lines represent the PSI across four experimental conditions (GFP or LOW, MID, HIGH expression of SRRM4) for LS (left) or HS (right) events that are either WT or harbor mutations enhancing complementarity to U6 snRNA (U6cons_a, top panel or U6cons_b, bottom panel). Statistics: Mann-Whitney-Wilcoxon test two sided. (d, e) The lines depict the PSI across four experimental conditions (GFP vs LOW, MID, HIGH expression of SRRM4) for PLS3-HsaEX0048492 (d) and NRBP1-HsaEX0043752 (e). The nature of the variant (WT or extended U1cons (U1cons_a)) is indicated by the color. The corresponding RT-PCR assays are provided in Fig. 6e. (f) RT-PCR assays showing the splicing patterns of different minigenes under control condition (-) or under expression of human SRRM4 (+) in HEK 293 cells. Each minigene was generated using the sequences from the endogenous events (upstream/downstream exons and introns).
The variant name and the corresponding sequence of its 5′ss is indicated on top of the gel (3 last exonic nucleotide | 6 first intronic nucleotide). The mutations were engineered to either increase or decrease the complementarity to U1 snRNA and the corresponding maximal entropy score for 5′ss is reported (MAXENT5). The results correspond to a single replicate of the experiment. Each condition of DAAM1-HsaEX0018410 and apbb1-DreEX00138456/HsaEX0005055 was conducted in three biological replicates, while asap1-DreEX0015055/HsaEX0006155 was performed in four biological replicates, for which the means and standard deviations (std) are provided below each lane. (g, h) On the left side, the lines within the violin represent the interquartile range of the maximal entropy score for 5′ss (MAXENT5) for the LS and HS events for the WT constructs and their corresponding GU > GC (g) or U1weak (h) variants. Each dot represents an event. Statistics: Mann-Whitney-Wilcoxon test two-sided with Bonferroni correction. On the right side, the lines represent the PSI across four experimental conditions (GFP or LOW, MID, HIGH expression of SRRM4) for LS (left) or HS (right) events for WT and GU > GC (g) or WT and U1weak (h) variants. Statistics: Mann-Whitney-Wilcoxon test two sided. (i) Alignment of ITSN1-HsaEX0032608 in 20 species spanning from primates to sauropsids for the last 93 nucleotides of the upstream intron, the microexon and 25 nucleotides of the downstream intron (as cloned in the context of the library). The level of conservation is depicted at the bottom with the black histogram and the nucleotides are counted from 1 to 133 at the top. The PSI values quantified across the four experimental conditions (GFP and LOW, MID, HIGH expression of SRRM4) are provided in the heatmap on the right side. The mutation from A to G in sus_scrofa, bos_taurus and tursiops_truncatus is indicated by the arrow. 
(j) The heatmap represents the effect of “likely pathogenic SNP” (ClinVar) mutations in four experimental conditions (GFP and LOW, MID, HIGH expression of SRRM4) in the 5′ss of the CASK-HsaEX0012545 HS event at position 1 (rs1556000257 G > A) and in the 5′ss of the CS event SNAP25-HsaEX3077184 at position 2 (rs2123019421 T > G). (k) Distribution of ΔPSI (VAR-WT) values between variants and their corresponding WT in GFP vs MID, HIGH expression of SRRM4 for LS, HS, CS and their corresponding shortened version events. Variants include deep mutagenesis of sequences involved in base pairing to U1 snRNA, including positions -3, -2, -1 at the 3′ end of the exon and positions 3 to 6 of the intron.
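As an illustration of the ΔPSI (VAR-WT) quantity shown in panel (k), the following minimal Python sketch subtracts the WT PSI from the variant PSI in each condition. The function name, the dictionary layout and the PSI values are hypothetical and are not taken from the authors' code; missing values (reported as NA in the Supplementary Tables) are represented here as None.

```python
from typing import Dict, Optional

# Condition labels as used in the legends; the data layout is an assumption.
CONDITIONS = ("GFP", "LOW", "MID", "HIGH")


def delta_psi(
    variant_psi: Dict[str, Optional[float]],
    wt_psi: Dict[str, Optional[float]],
) -> Dict[str, Optional[float]]:
    """Return dPSI = PSI(variant) - PSI(WT) for each condition.

    A condition yields None when either PSI value is missing.
    """
    result: Dict[str, Optional[float]] = {}
    for cond in CONDITIONS:
        var = variant_psi.get(cond)
        wt = wt_psi.get(cond)
        result[cond] = None if var is None or wt is None else var - wt
    return result


# Made-up PSI values (in percent), for illustration only.
wt = {"GFP": 5.0, "LOW": 40.0, "MID": 70.0, "HIGH": 90.0}
var = {"GFP": 2.0, "LOW": 10.0, "MID": None, "HIGH": 60.0}
result = delta_psi(var, wt)
```

A negative value indicates that the variant is included less than its WT counterpart in that condition.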
Extended Data Fig. 7 Basal spliceosomal recruitment in vitro and ex-vivo.
(a) Fit of Model I to the data. Note that the model cannot explain the differences in the GFP condition among sequence groups. The line shows the model, the dots the median of the data, and the shaded area the interquartile range. Same fit procedure as in Fig. 7c–e, but assuming that groups differ in c4 instead of c2. (b) Distribution of MSE (Mean Squared Error) for the best parameter sets found in each of 500 independent optimization runs, assuming groups differ in either c2 or c4, when the model was fitted either to the experimental data or to 9 random datasets with synthetic data constructed by permuting the median experimental PSI values across experimental conditions. (c) Frequency of the distance between the SRRM4 CLIP tags and the dinucleotide AG. (d) Number of events with or without SRRM4 CLIP tags in the 200 nts preceding the microexons. Statistics: two-sided Fisher’s Exact test. (e) SRRM4 CLIP tag densities in the last 150 nts of the upstream intron of endogenous HS and LS events; data from ref. 9. (f) Representation of the experimental approach used to investigate the efficiency of A spliceosomal complex formation in vitro. Briefly, RNAs corresponding to the sequences cloned in Fig. 1c were transcribed and incubated in HeLa cell nuclear extracts under splicing conditions (ref. 10). The H and A spliceosomal complexes were resolved by electrophoresis and the RNA contained in the A complex was isolated. The bands cut from the gels are highlighted by the red squares. Both the input and isolated RNA population (output) were amplified by RT-PCR assays and analyzed by deep sequencing. Primers used for both input and output amplifications are depicted with the green arrows and amplicons with the green line. (g–i) For 71 LS and 73 HS events: (g) PSI in GFP condition, (h) MAXENT3, MAXENT5 entropy scores and (i) SpliceAI scores of acceptor and donor sites. 
(j–m) For wild-type sequences of 71 LS microexons, 73 HS microexons and 27 42-nt-long CS exons: mean ESRseq score (j), number of ESE (k) or ESS (l) elements normalized by the exonic length, and number of ISR (m) elements identified in the last 93 nucleotides of the upstream intron and the first 25 nucleotides of the downstream intron, normalized by the total intronic length. Fisher tests for contingency tables: For ESE: LS-vs-HS, P = 0.035; LS-vs-CS, P = 1.3e-03; HS-vs-CS, P = 0.108. For ESS: LS-vs-HS, P = 0.181; LS-vs-CS, P = 0.010; HS-vs-CS, P = 2.5e-4. (n–p) The boxes represent the square root of the peak intensities of U2 snRNP binding in the 93 nts preceding the LS and HS microexons (Fig. 7g) as a result of either SF3A3_Flag_IP-seq (n), RBM5_RNP-seq (o) or RBM10_Flag_IP-seq (p). (q) Expression of genes (in cRPKMs) presented in Fig. 7f, g and Extended Data Fig. 7g, h. (r) U2 snRNP peaks detected at the full transcript level. Statistics in (a–c, e, g–p): Mann-Whitney-Wilcoxon test two-sided; in (q, r): Mann-Whitney-Wilcoxon test two-sided with Bonferroni correction (ns: not significant).
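The permutation control described in panel (b) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' fitting code: the function names are hypothetical, the median PSI values are made up, and the sketch assumes that each synthetic dataset is obtained by shuffling one group's median PSI values across the four conditions. The model fit itself is not reproduced; only the MSE and the construction of the permuted datasets are shown.

```python
import random
from typing import List, Sequence


def mse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Mean squared error between model predictions and observed medians."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def permuted_datasets(
    median_psi: Sequence[float], n_datasets: int, seed: int = 0
) -> List[List[float]]:
    """Build synthetic datasets by shuffling median PSI across conditions.

    Assumption: values are shuffled within one sequence group, so each
    synthetic dataset contains the same values in a random condition order.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    datasets = []
    for _ in range(n_datasets):
        shuffled = list(median_psi)
        rng.shuffle(shuffled)
        datasets.append(shuffled)
    return datasets


# Made-up median PSI values for GFP, LOW, MID and HIGH (illustration only).
observed = [5.0, 40.0, 70.0, 90.0]
controls = permuted_datasets(observed, n_datasets=9)
```

Comparing the MSE obtained on the experimental data with the MSE obtained on such permuted datasets indicates whether the model captures the dose-response structure rather than fitting arbitrary values.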
Supplementary information
Supplementary Information
Supplementary Notes 1–4, Supplementary Methods, description of the Supplementary Tables and Source Data.
Supplementary Tables
Supplementary Table 1: List of the 234 events (EventID from https://vastdb.crg.eu/) selected in the first library (T0) and their membership as LS, HS, CS, CR or control exons.
Supplementary Table 2: List of the 43 events selected for libraries T1, T2, T3 and T4 and their membership as LS, HS, CS and CR. Yes/No indicates whether the events were considered in the respective library including T0. HsaEX0002213, present in each library, was discarded from the analyses.
Supplementary Table 3: List of species from which sequences of microexons and surrounding intronic sequences were retrieved, together with the time of split from the human lineage in MYA, acronyms used in the variant names and the corresponding nodes.
Supplementary Table 4: List of the 36-nt-long sequences used to elongate microexons. The names, sequences, alternative names, resources used for the design and libraries in which each sequence was used are provided. ESRseq scores (ref. 43) were used to design sequences with the lowest possible score. In addition, sequences bearing AG, GT, TGC and tandem sequence repeats of three or more nucleotides were discarded.
Supplementary Table 5: Description of the type of mutations performed for variant generation. Classification is based on the location of the mutation in the pre-mRNA: at the 3′ss (UPINT), the exon (VAR_SEQ) or the 5′ss (DOINT). Combinations of mutations in different parts of the pre-mRNA are also reported. Extension in variant name and description of the mutation are provided for each type.
Supplementary Table 6: ESR and BP sequences used for variant generation. SVM_BP scores are included for each BP sequence.
Supplementary Table 7: Resources (software and websites) used to recover information related to splicing features, evolution and GitHub resources (VastDB, vast-tools, MAXENT3, MAXENT5, SVM-BP and GitHub for MaPSy analyses in this study and mathematical model).
Supplementary Table 8: PSI of endogenous events as provided by vast-tools under nine experimental conditions (HeLa SRRM3 and HEK 293 SRRM4) with the classification of LS (71 events), HS (73 events), NR or others (33 events), CR (30 events) and CS (27 events). Columns with ‘*-Q’ contain quality information for each sample, as provided by vast-tools (https://github.com/vastgroup/vast-tools?tab=readme-ov-file#combine-output-format). Missing values are reported as NA. Samples used were generated by Juan-Mateu et al. (ref. 4) (HeLa Flp-In cells expressing GFP or Srrm3), by Torres-Méndez et al. (ref. 10) (HEK 293 Flp-In T-REx expressing GFP) and in this study (CL_HEK293_SRRM4_CONT, CL_HEK293_SRRM4_LOW, CL_HEK293_SRRM4_MID, CL_HEK293_SRRM4_HIGH).
Supplementary Table 9: List of the publicly available RNA-seq datasets used in this study.
Supplementary Table 10: List of the tissues from the different species represented in Extended Data Fig. 1d.
Supplementary Table 11: Table related to the selection of the events from MaPSy T0 to T1, T2, T3 and T4.
Supplementary Table 12: Information of all the variants used in each of the five libraries. The columns correspond to event, variant, oligo (the oligo sequence is provided without the 20 nt of Mega or Moe added for ordering from TWIST Bioscience (5′ end with Mega, 5′-GATGGGATAAGACGGTAGGC-3′; 3′ end with Moe, 5′-TCGTAGCACGTCACGGTTGG-3′)), UPINT (93 nt of the upstream intron), VAR_SEQ (sequence of the exon), DOINT (25 nt of the downstream intron), VAR_LEN (exonic length) and LME (library number the variant corresponds to). CS events bearing ‘-rem’ in the variant names correspond to shortened versions of the original CS events.
Supplementary Table 13: Pairwise matching of variants with identical sequences in library T4, belonging to different mutation groups. This information was used to reassign PSI in the final PSI table on the basis of PSI for VARIANT (reported in the first column).
Supplementary Table 14: For each library, the total number of WT events, number of unique oligo sequences and median number of barcodes per variant (calculated from the set of unique barcode–variant associations used to quantify the inclusion of each variant) are presented.
Supplementary Table 15: Primer name, sequence and their use in this study.
Supplementary Table 16: Total number of reads per sample deposited to GEO (GSE276143).
Supplementary Table 17: PSI of a given VARIANT from libraries T0, T1, T2, T3 and T4 in four experimental conditions (GFP expression and low, mid and high expression of SRRM4). Missing values are reported as NA.
Supplementary Table 18: ES of A complex formation performed under CTR condition for variants in libraries T1 and T2. Missing values are reported as NA.
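The column layout described for Supplementary Table 12 suggests how an ordered oligo could be rebuilt from its parts. The Python sketch below is an illustration under assumptions, not the authors' pipeline: it assumes that the oligo column is the concatenation UPINT + VAR_SEQ + DOINT and that the Mega and Moe sequences flank it at the 5′ and 3′ ends, respectively. The function name and the example sequences are hypothetical; only the Mega and Moe sequences are taken from the table description.

```python
# Flanking sequences as given in the description of Supplementary Table 12.
MEGA = "GATGGGATAAGACGGTAGGC"  # added at the 5' end
MOE = "TCGTAGCACGTCACGGTTGG"   # added at the 3' end


def full_ordered_oligo(upint: str, var_seq: str, doint: str) -> str:
    """Rebuild the ordered oligo as Mega + UPINT + VAR_SEQ + DOINT + Moe.

    Assumption: the three columns are contiguous and written 5' to 3'.
    """
    parts = [upint.upper(), var_seq.upper(), doint.upper()]
    for part in parts:
        if set(part) - set("ACGT"):
            raise ValueError("sequences must contain only A, C, G or T")
    return MEGA + "".join(parts) + MOE


# Placeholder sequences with the lengths given in the table description
# (93-nt UPINT, 25-nt DOINT) and a hypothetical 6-nt exon.
upint = "A" * 93
var_seq = "ACGTAC"
doint = "G" * 25
oligo = full_ordered_oligo(upint, var_seq, doint)
var_len = len(var_seq)  # corresponds to the VAR_LEN column
```

With these placeholder inputs the rebuilt oligo is 164 nt long (20 + 93 + 6 + 25 + 20).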
Source data
Source Data Fig. 1
Raw data.
Source Data Fig. 2
Unprocessed gels.
Source Data Fig. 2
Raw data.
Source Data Fig. 3
Raw data.
Source Data Fig. 4
Raw data.
Source Data Fig. 4
Unprocessed gels.
Source Data Fig. 5
Raw data.
Source Data Fig. 6
Raw data.
Source Data Fig. 6
Unprocessed gels.
Source Data Fig. 7
Raw data.
Source Data Extended Data Fig. 1
Raw data.
Source Data Extended Data Fig. 2
Raw data.
Source Data Extended Data Fig. 2
Unprocessed gels.
Source Data Extended Data Fig. 3
Raw data.
Source Data Extended Data Fig. 4
Raw data.
Source Data Extended Data Fig. 4
Unprocessed gels.
Source Data Extended Data Fig. 5
Raw data.
Source Data Extended Data Fig. 6
Raw data.
Source Data Extended Data Fig. 6
Unprocessed gels.
Source Data Extended Data Fig. 7
Raw data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bonnal, S., Bajew, S., Martinez-Corral, R. et al. Core splicing architecture and early spliceosomal recognition determine microexon sensitivity to SRRM3/4. Nat Struct Mol Biol (2025). https://doi.org/10.1038/s41594-025-01634-1
DOI: https://doi.org/10.1038/s41594-025-01634-1