Introduction

Alternative splicing represents a critical layer of gene regulation that greatly diversifies the expression of transcript and protein isoforms from a limited repertoire of genes. Transcripts from over 95% of human multi-exonic genes undergo alternative splicing1,2, and more than half of these genes encode alternative splicing events that are subject to pronounced condition (e.g., cell/tissue type) dependent regulation3. These events are typically controlled by combinations of RNA binding proteins that interact with cis-elements in precursor (pre-) RNA to promote or repress the proximal assembly of small nuclear ribonucleoprotein particles and additional factors that form active splicing complexes (spliceosomes)4,5,6. While significant progress has been made towards understanding the regulatory networks that control alternative splicing, a major challenge confronting the postgenomic era is to determine the functions of the large repertoires of uncharacterized isoforms generated by alternative splicing and other post-transcriptional processes that are associated with normal and disease physiology.

Functional characterization of an individual alternative splicing event can require years of effort and typically involves tailored assays that lack scalability. To begin to address this challenge, recent studies have described the development and application of CRISPR-Cas editing strategies in screen formats to assess the impact of exon deletion on cell fitness7,8,9. For example, ‘paired guide RNAs for alternative exon removal’ (pgFARM) has been used to assess the function of premature termination codon (PTC)- exons that elicit nonsense mediated mRNA decay on cell growth8. We previously described ‘Cas HYbrid for Multiplexed Editing and scReening Applications’ (CHyMErA) and applied this system to assay the impact of deleting thousands of alternative exons in two different cell lines on cell fitness7. CHyMErA employs simultaneous expression of Cas9 and Cas12a nucleases, together with hybrid guide (hg)RNAs, which are fusions of Cas9 and Cas12a guides expressed from a single U6 promoter and processed into pairs of individual Cas9 and Cas12a gRNAs that direct genomic fragment deletion. CHyMErA has recently been expanded to a larger screen revealing that ‘fitness exons’ are concentrated in highly expressed essential genes, including important examples such as an alternative exon in the TAF5 subunit that regulates assembly of the TFIID transcription initiation complex and global gene expression9. Also recently, gRNA-directed RNA targeting of deactivated (d)Cas13d (dCasRx) fused to splicing effectors has been used to direct the specific and efficient activation or repression of endogenous exons10,11, although this strategy has not yet been employed on a large scale.

To date, splice isoform-resolution functional genomic assays have not been applied in the context of developmental biology, nor on a single cell level, yet these applications are critical for determining the functions of alternative splicing events linked to cell fate decisions, differentiation programs, as well as cell and tissue maintenance. They are also needed to understand the roles of splice variants and other isoforms in pathological contexts, for example, the numerous forms of cancers and brain disorders in which alternative splicing is dysregulated and in which individual alternative exons are known to contribute to pathophysiology5,6,12. These challenges are particularly relevant to a program of evolutionarily conserved, 3-27 nucleotide-long ‘microexons’, which are predominantly included in the nervous system13,14.

Neuronal microexons are enriched in genes that have important roles in nervous system biology and disorders, and they are often misregulated in the brains of individuals with idiopathic autism spectrum disorder (ASD)13. Mice haploinsufficient for the ‘master’ regulator of brain-specific microexons, the Serine/Arginine-repetitive matrix-4 (Srrm4) protein (previously known as nSR10015), which promotes the splicing of the majority of alternative microexons during neurogenesis, exhibit multiple ASD-like phenotypes, including altered social behaviors, hypersensitivity to environmental stimuli, and altered synaptic transmission16,17. Moreover, focused characterization of individual brain-specific microexons has revealed important roles in the regulation of neuronal-specific gene expression18,19, translation20, mRNA deadenylation21, neuritogenesis16,22,23, and synaptic biology24,25. Mice deficient of some of these microexons exhibit neurodevelopmental and behavioral deficits, including ASD-like phenotypes20,21. However, despite the importance of neuronal microexons revealed by these studies, hundreds of these alternative splicing events await characterization.

In this study, we have developed CHyMErA-seq, an enhanced CHyMErA system that enables scalable isoform-resolution perturbation screens coupled to transcriptomic read-outs. We apply CHyMErA-seq to determine the roles of neuronal microexons in genes that have critical functions in neurodevelopmental processes and that are implicated in brain disorders. By applying this system in the context of neurogenesis, we have identified a subset of microexons that convergently control signaling pathways and developmental timing of transcriptomic signatures responsible for critical stages of neuronal maturation and function. Overall, our results introduce CHyMErA-seq as a versatile platform for isoform-resolution functional genomics in the contexts of developmental and single cell biology.

Results

A platform for exon-resolution functional genomics at a single cell level

To establish a system enabling the profiling of single cell transcriptomic phenotypes as a consequence of exon perturbation, we coupled CHyMErA-directed deletion of microexons to split-pool single nuclear (sn)RNA-sequencing using the sci-RNA-seq3 protocol26. Specifically, we adapted the ‘CRISPR droplet sequencing’ (CROP)-seq lentiviral vector27 by integrating hgRNA (refer to Introduction) expression cassettes within the vector’s 3ʹ Long Terminal Repeat (LTR). Following genomic integration and LTR duplication, the modified CROP-seq vector expresses hgRNAs from an upstream RNA pol III-transcribed cassette that directs genome editing. The second (duplicated) hgRNA cassette is transcribed by RNA pol II within the 3´ untranslated region (UTR) to generate polyadenylated (polyA) transcripts. When polyA+ scRNA-seq is performed, both cellular and hgRNA transcripts are thus simultaneously profiled, allowing detected transcriptomic changes to be linked to specific microexons targeted for deletion (Fig. 1a).
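As a rough illustration of how polyadenylated hgRNA transcripts could be linked to individual nuclei, the following Python sketch assigns each nucleus the guide that dominates its hgRNA reads. The function name, the input format, the guide identifiers, and both thresholds are assumptions made for illustration; they are not taken from the analysis pipeline used in this study.

```python
from collections import Counter, defaultdict


def assign_guides(guide_umis, min_umis=3, min_fraction=0.8):
    """Assign one hgRNA to each nucleus from (barcode, guide) UMI observations.

    min_umis and min_fraction are illustrative thresholds (assumptions),
    not values reported in the study.
    """
    per_cell = defaultdict(Counter)
    for barcode, guide in guide_umis:
        per_cell[barcode][guide] += 1

    assignments = {}
    for barcode, counts in per_cell.items():
        guide, top = counts.most_common(1)[0]
        total = sum(counts.values())
        # Keep nuclei dominated by a single, sufficiently supported guide.
        if top >= min_umis and top / total >= min_fraction:
            assignments[barcode] = guide
    return assignments


# Hypothetical observations: nucleus n1 is confidently assigned, n2 is ambiguous.
umis = [("n1", "Gfra1_hg3")] * 5 + [("n2", "Bin1_hg1")] * 2 + [("n2", "nt_4")] * 2
print(assign_guides(umis))  # {'n1': 'Gfra1_hg3'}
```

Nuclei with ambiguous or weakly supported guide calls are simply dropped in this sketch, which is one conservative way to keep perturbation labels clean.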

Fig. 1: A single-cell exon-resolution functional genomics screen.

a Overview of the CHyMErA-seq platform, highlighting integration of a modified CROP-seq vector expressing Cas9/Cas12a hybrid guide (hg)RNAs directing exon deletion, and as polyadenylated (polyA) RNA for linking exon deletions to polyA+ single nuclear RNA sequencing-detected transcriptomic phenotypes. b Schematics showing the composition of different vectors tested for exon deletion efficiency in CHyMErA exon deletion assays (left). Boxplots indicating percent genomic deletion of test exons achieved with each Cas12a construct when expressed with Cas9 (n = 6 hgRNAs targeting 3 microexons). The boxes mark the first and third quartiles and the lines inside the boxes mark the median; whiskers extend from the box to the farthest point lying within 1.5 times the interquartile range. c Genomic PCR assays comparing efficiency of Ptk2(A) microexon deletion following expression of Cas12a constructs shown in (b) and lentiviral delivery of hgRNAs directing deletion (cons 1 and 2) versus non-targeting guides (‘nt’). Size range of edited bands indicated on the right. Editing efficiencies were estimated from length-normalized quantification of ratios of edited and total band intensities, and are consistent (i.e., within 10%) with estimates determined by quantification of Oxford Nanopore sequencing reads mapping specifically to unedited and edited genomic fragments (Methods). Gel electrophoresis of edited bands performed once. d PCR of edited genomic loci in embryonic stem cells (ESC) and neurons (day in vitro 7), following lentiviral delivery of hgRNAs targeting Ptk2(A) (upper) and Camta1 (lower) microexons. Gel electrophoresis of edited bands performed once. e Overview of selection criteria for CHyMErA-seq analyzed microexon targets (refer to main text and Methods). f Percent spliced in (PSI) profiles of microexon targets selected for the CHyMErA-seq analysis during in vitro neuronal differentiation. g Schematic of hgRNA library design.
Thick bars represent cut sites and lines represent the intervening regions deleted for each Cas9/Cas12a guide pair. h Schematic representation of CHyMErA-seq screen. CGR8 murine embryonic stem cells stably expressing Cas9 and optimized Cas12a (see 1b) were transduced with a library of hgRNAs expressed from a modified CROP-seq vector. Transduced cells were then subjected to in vitro glutamatergic neuronal differentiation and isolated nuclei analyzed using sci-RNA-seq3. Source data are provided as a Source Data file.

To initially test and optimize the efficiency of this system, CROP-seq vectors expressing six different hgRNAs targeting three microexons were introduced into a set of CGR8 murine embryonic stem cell (mESC) lines engineered to stably express Cas9 in addition to a series of Lachnospiraceae bacterium (Lb)Cas12a or Acidaminococcus sp. (As)Cas12a constructs. Each set of LbCas12a and AsCas12a constructs incorporated different configurations of nuclear localization signals (NLSs), which were evaluated for nuclear localization efficiency (Yuxi Xiao and J.M., unpublished data), and mutations shown previously to increase the activity of Cas12a28 (Figs. 1b, c; and Supplementary Figs. 1a and 1b). We observe that mutationally-enhanced (en)AsCas12a29 fused to two tandem, carboxy-terminal c-Myc NLSs, expressed together with Cas9, affords the highest overall efficiency of editing compared to the original CHyMErA system, achieving on average a 16% higher editing efficiency (Fig. 1b). Moreover, proportions of edited cells remain constant during neuronal differentiation of the mESC lines, indicating that the microexon deletions, and more generally the introduction of genomic double-stranded breaks, did not significantly affect cell viability (Fig. 1d). The resulting enhanced CHyMErA system was therefore used in downstream experiments to target a larger set of neuronal microexons and analyze the effects of their deletion on differentiating neural cell transcriptomes using snRNA-seq.
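The length-normalized estimate of editing efficiency described in the legend of Fig. 1c can be sketched as follows. This is a minimal illustration that assumes a single representative length for the edited band (the edited products actually span a size range) and that band intensity scales with fragment length; the function name and the example values are hypothetical.

```python
def editing_efficiency(edited_intensity, edited_bp, unedited_intensity, unedited_bp):
    """Percent edited alleles from gel band intensities.

    Each band intensity is divided by its fragment length so that longer
    fragments, which bind more dye per molecule, are not over-counted.
    """
    edited = edited_intensity / edited_bp
    unedited = unedited_intensity / unedited_bp
    return 100.0 * edited / (edited + unedited)


# Hypothetical bands: a 300 bp edited product and a 1000 bp unedited product.
print(editing_efficiency(300.0, 300, 1000.0, 1000))  # 50.0
```

Without the length normalization, the same pair of bands would suggest an efficiency of about 23%, which shows why the correction matters when the deletion removes a large fraction of the amplicon.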

A microexon-targeting hybrid guide RNA library

Microexons targeted for analysis by CHyMErA-seq were selected on the basis of their: (1) sequence conservation between human and mouse, (2) exhibiting consistent alternative splicing patterns during in vitro and in vivo neurogenesis30,31, (3) location in genes that have important functions in nervous system biology (Supplementary Fig. 1c), and (4) the availability of multiple gRNA targeting sites in the flanking introns (Fig. 1e, f). All of the targeted microexons are frame-preserving and expected to insert 3-9 amino acids into the coding sequence. Cas9 and Cas12a gRNAs were designed such that at least 100 base pairs (bp) of upstream and downstream sequence flanking the microexon were targeted for deletion, to ensure removal of both the coding sequence and critical splicing signals, while maintaining a distance of at least 150 bp (median distance = 2023 bp) from the upstream and downstream neighboring exons to avoid disruption of their splicing (Fig. 1g). We scored potential Cas9 gRNAs using CRISPOR32 and Cas12a guides using CHyMErA-NET7 (Supplementary Fig. 1d). To ensure robust phenotype detection within a single experiment, for each microexon of interest we selected nine distinct pairs of gRNAs with the highest overall combined scores for targeting. For comparison and as positive controls for possible phenotypic change, we included Cas9 and Cas12a gRNAs that target constitutive exons in the same set of genes. As additional positive controls, we included gRNAs targeting constitutive exons in Srrm4 and its paralog, Srrm3. As negative controls, we included two gRNA pairs designed to cut within intronic sites that are at least 300 bp from the target exon 3’ and 5’ splice sites, and which are therefore not expected to disrupt microexon splicing. Additional negative controls included 13 non-targeting gRNAs, and 13 gRNAs designed to cut within intergenic regions. 
Collectively, the resulting library consisted of 500 gRNAs targeting 37 microexons in 32 genes, with four genes (Clasp1, Ptk2, Ptprf, and Vav2) containing more than one targeted microexon. We transduced CGR8 mESCs expressing the enhanced CHyMErA system with the lentiviral hgRNA library at an approximate multiplicity of infection of 0.2, such that most transduced cells expressed a single hgRNA. To confirm library representation and uniformity, we sequenced the integrated hgRNA cassette from transduced cells after selection. This revealed 100% library representation, with 98.8% of guide sequences having read counts falling within 10-fold of the mean, indicating high uniformity of library coverage (Supplementary Fig. 1e).
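The rationale for a low multiplicity of infection, and the uniformity metric quoted above, can be illustrated with a short Python sketch. It assumes that lentiviral integrations per cell follow a Poisson distribution, which is a common simplification rather than something measured here; the function names and the example read counts are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k integrations when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def single_integration_fraction(moi):
    """Fraction of transduced cells (>= 1 integration) carrying exactly one."""
    p_one = poisson_pmf(1, moi)
    p_any = 1.0 - poisson_pmf(0, moi)
    return p_one / p_any


def fraction_within_fold(counts, fold=10.0):
    """Fraction of guides whose read count lies within `fold` of the mean."""
    mean = sum(counts) / len(counts)
    within = sum(mean / fold <= c <= mean * fold for c in counts)
    return within / len(counts)


print(round(single_integration_fraction(0.2), 3))      # 0.903
print(fraction_within_fold([100, 120, 80, 5, 1000]))   # 0.8
```

Under the Poisson assumption, an MOI of 0.2 implies that roughly 90% of transduced cells carry a single integration, at the cost of leaving about 82% of cells untransduced before selection.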

Detection of microexon deletion-associated neuronal transcriptomic signatures

To investigate the functions of the targeted microexons during neurogenesis, mESCs expressing the CHyMErA-seq system were differentiated into glutamatergic neurons, and at day in vitro 7 (DIV7) scRNA-seq libraries were generated (Fig. 1h). Approximately 21,000 quality-filtered nuclei expressed targeting or control gRNAs, representing 36 of the 37 microexons targeted by the hgRNA library (Methods). Each pair of microexon-targeting gRNAs was represented by a median of 44 cells, and each microexon target was represented by a median of 492 cells (Supplementary Figs. 2a and 2b). We detected diverging differentiation trajectories in the transcriptomic profiles and categorized cells into distinct lineages using CellRank233, an analytical framework that computes cell-fate transition probabilities and terminal states from gene expression profiles in scRNA-seq data. Mapping the single-nuclei transcriptomes from an in vivo snRNA-seq dataset from developing mouse embryos26 onto our screen revealed that most cells represent the neuronal lineage, while smaller subsets of cell populations represent early mesodermal and endodermal lineages (Fig. 2a). We clustered cells and analyzed marker gene expression within these trajectories to reveal the presence of expected neuroepithelial, radial glial, and neuronal clusters within the neuronal lineage (Fig. 2b, c and Supplementary Figs. 3a and 3b), with cells expressing microexon-targeting guides for 24/36 detected microexon targets represented by at least 25 cells in each cluster within the neuronal lineage (Supplementary Fig. 3c).

Fig. 2: Microexon deletion results in dysregulated transcriptomic signatures associated with neurogenesis.

Uniform manifold approximation and projection (UMAP) representation of transcriptomic profiles of cells assigned to microexon deletion hgRNAs and colored according to a assigned lineage, b broad cell cluster, and c associated deleted microexon in ‘knockout’ and non-targeting cells, with numeric subcluster labels indicated. d Enrichment of ‘knockout’ cells compared to non-targeting cells within each subcluster, corresponding to deletion of Ralgapb (n = 105), Ptprf (microexon A; n = 158), Ptk2 (microexons A, B and C; n = 112), Med23 (n = 91), Gfra1 (n = 103), Clasp1 (microexon A; n = 130), Camta1 (n = 34) and Bin1 (n = 180) microexons (bottom). Significantly enriched subclusters are outlined (P < 0.05, Benjamini-Hochberg corrected two-sided Fisher’s exact test). Subclusters are ordered by pseudotime (top, shading represents one standard deviation from the mean). Scaled average marker gene expression (Neuroepithelial markers = Otx2, Sox2; Radial glia markers = Pax6, Vim; Neuronal markers = Mapt, Rbfox3) for each subcluster are indicated (middle). e Heatmap of top 10 (sorted by FDR adjusted p value) gene ontology (GO, biological processes) terms for differentially expressed genes (FDR < 0.05) resulting from microexon deletions. GO terms are grouped according to similarity and colored according to parent GO terms. Outline indicates Benjamini-Hochberg corrected P < 0.05, two-sided hypergeometric test. Gray indicates ‘NA’. f Protein-protein interaction network for genes containing neuronal specific microexons enriched in neuron projection gene ontology term (GO:0043005, FDR = 8.27 × 10−15). Microexons targeted in the CHyMErA-seq screen (light orange) and detected as screen hits (i.e., with pronounced transcriptomic phenotypes) (red) are indicated. Edge weights represent STRING interaction score. 
g Empirical cumulative distribution functions (eCDF) for differentially expressed genes (FDR < 0.05) associated with microexon deletions in Bin1 (n = 156), Camta1 (n = 10), Clasp1 (microexon A; n = 109), Gfra1 (n = 115), Ptk2 (microexons A-C; n = 90), Ptprf (microexon A; n = 45), Med23 (n = 31), and Ralgapb (n = 101) according to their associated neuronal lineage correlation ranking compared to non-differentially expressed gene (dashed line) (Methods). Benjamini-Hochberg corrected p values calculated by two-sided Wilcoxon rank sum test. Source data are provided as a Source Data file.

To identify transcriptomic signatures associated with microexon deletion, we analyzed differential gene expression (DGE) profiles of cells representing subclusters within the neuronal lineage (Supplementary Fig. 4a). Specifically, we used a pseudobulk approach34 to calculate pairwise DGE values between subclusters of cells expressing gRNAs directing microexon deletion and non-targeting control gRNAs. Since we expected to only observe transcriptomic signatures in sub-populations of cells with effective gRNA editing, we developed an analytical pipeline to discriminate profiles that likely represent edited (‘knockout’) from unedited (‘non-perturbed’) cells based on these signatures (Methods) (Supplementary Fig. 4b). From this analysis, ten of the analyzed microexons are associated with transcriptomic phenotypes, represented by at least three different pairs of gRNAs targeting the same microexon (Methods) (Supplementary Fig. 4c). Supporting the reliability of these results, benchmarking of our analysis pipeline against a published single-cell CRISPR screen dataset35 reveals that 12/15 (80%) of perturbations called by our method have significant levels of reduction in expression of the target gene (P < 0.05, Bonferroni-corrected Wilcoxon rank-sum test).
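The aggregation step behind a pseudobulk comparison can be sketched as below. This is only a minimal illustration of summing counts per group and comparing depth-normalized profiles; it does not reproduce the cited pseudobulk method or the statistical testing, and the function names, the pseudocount, and the example data are assumptions.

```python
import math
from collections import defaultdict


def pseudobulk(cells):
    """Sum raw counts over all cells that share a group label.

    cells: iterable of (group_label, {gene: count}) pairs.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for group, counts in cells:
        for gene, n in counts.items():
            totals[group][gene] += n
    return {group: dict(genes) for group, genes in totals.items()}


def log2_fold_change(test, control, pseudocount=1.0):
    """log2 ratio of counts-per-million between two pseudobulk profiles."""
    def cpm(profile):
        depth = sum(profile.values())
        return {gene: 1e6 * n / depth for gene, n in profile.items()}

    t, c = cpm(test), cpm(control)
    return {
        gene: math.log2((t.get(gene, 0.0) + pseudocount) / (c.get(gene, 0.0) + pseudocount))
        for gene in set(t) | set(c)
    }


# Hypothetical cells: two 'KO' and two non-targeting ('NT') nuclei.
cells = [
    ("KO", {"GeneA": 3, "GeneB": 1}), ("KO", {"GeneA": 5, "GeneB": 1}),
    ("NT", {"GeneA": 2, "GeneB": 2}), ("NT", {"GeneA": 2, "GeneB": 2}),
]
profiles = pseudobulk(cells)
print(log2_fold_change(profiles["KO"], profiles["NT"]))
```

Summing counts before testing treats each group of cells as one sample, which avoids inflating significance by counting every nucleus as an independent replicate.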

To further assess specificity of the effects of deleting targeted microexons, we quantified the degree of correlation between the transcriptomic signatures detected in cells expressing distinct gRNAs targeting the same microexon and compared this to correlations between these gRNAs and either intergenic negative controls, or non-targeting controls. Importantly, cells with guides targeting the same gene exhibited significantly higher gene expression correlation compared to both non-targeting (P = 7.50 × 10⁻¹², Benjamini-Hochberg corrected two-sided Wilcoxon rank sum test) and intergenic-targeting controls (P = 5.85 × 10⁻⁹, Benjamini-Hochberg corrected two-sided Wilcoxon rank sum test) (Supplementary Fig. 4d). This degree of correlation is to a large extent independent of the number of cells recovered for each guide (Supplementary Fig. 4e). We also observe that the expression levels of microexon-harboring genes are not significantly altered upon microexon deletion, indicating that the observed phenotypes are due to microexon deletion rather than loss of gene expression (Supplementary Fig. 4f). To assess the specificity of microexon-deletion phenotypes, we compared cells expressing microexon-targeting hgRNAs, and cells expressing guides targeting constitutive exons in the same gene, each against cells expressing non-targeting guides, and then compared the resulting transcriptomic signatures pairwise (Methods) (Supplementary Fig. 4g). This analysis reveals that transcriptomic signatures derived from microexon deletions are distinct from those detected in response to guide-directed perturbation of the corresponding genes. However, we do detect a similar rate of transcriptomic perturbations for gene (i.e., constitutive exon) knockout gRNAs, although these changes involve a different subset of genes from those showing microexon deletion-dependent perturbations (Supplementary Fig. 4h).
Moreover, gRNAs targeting Srrm3, and simultaneously Srrm3-Srrm4, also result in a transcriptomic phenotype, consistent with the role of these factors in the widespread regulation of the neuronal microexon program. Intergenic control guides did not generate a phenotype, and negative control intronic deletion gRNAs yielded only a single significant transcriptomic profile change, which may relate to the partial retention of the target intron (in the Fbxo25 gene) during neurogenesis. Overall, these results support the efficacy of CHyMErA-seq for the discovery of specific transcriptomic phenotypes as a consequence of deletion of neuronal microexons during neurogenesis.

Microexon deletions impact transcriptomic signatures associated with neurogenesis

To further investigate the functional consequences of microexon disruption, we performed higher-resolution cell clustering and observed enrichments of perturbed cells at different stages of neurogenesis (Figs. 2c and 2d). This revealed that deletion of a subset of microexons results in altered transcriptomic profiles enriched in subclusters of cells corresponding to late neuroepithelial stages (Bin1, Clasp1(A), Gfra1, Med23, Ptprf(A), Ralgapb), radial glia (Ptprf), and subsequent stages of neuronal differentiation (Bin1) (Fig. 2d). We observe that differentially expressed genes associated with these microexon deletions are enriched in gene sets involved in nervous system development and neuron projection guidance (Fig. 2e). Notably, several of the microexons with transcriptomic perturbations in this study (Bin1, Clasp1, Gfra1, Ptk2 and Ptprf) are located within a network of microexon-containing genes enriched in functions relating to neuron projection development (Fig. 2f). These microexon-containing genes include Ptk2, for which we observe an overall transcriptomic signature associated with deletion of its microexons. Ptk2 encodes focal adhesion kinase (FAK), a non-receptor protein tyrosine kinase that has been shown to mediate focal adhesion dynamics that regulate cell migration and survival in neurons and other cell types36,37,38. The Ptk2 microexons (denoted as ‘A’, ‘B’ and ‘C’), which display coordinated splicing patterns in vivo, are situated proximal to phosphorylated regions of FAK and have been shown previously to promote FAK phosphorylation39,40,41. These results imply that microexons have important roles in neuronal differentiation through modulating key proteins involved in morphological changes associated with neurogenesis.
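The subcluster enrichment test named in the legend of Fig. 2d (a two-sided Fisher's exact test with Benjamini-Hochberg correction) can be sketched in plain Python as follows. The functions are written from the textbook definitions of both procedures and are meant as an illustration only; the function names and the example tables are hypothetical and do not come from the screen data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at least as unlikely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts per subcluster: (KO in, KO out, NT in, NT out).
tables = [(30, 73, 40, 238), (10, 93, 30, 248), (5, 98, 12, 266)]
raw = [fisher_exact_two_sided(*t) for t in tables]
print(benjamini_hochberg(raw))
```

Testing each subcluster separately and then correcting across subclusters mirrors the per-subcluster outlines described for Fig. 2d, although the exact contingency tables used in the study are defined in the Methods.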

To enable the evaluation of transcriptomic signatures in the context of neurogenesis, we utilized a CellRank2-derived score for each gene according to the correlation of its expression with progression along the neuronal lineage33,42 (Methods). Gene Set Enrichment Analysis (GSEA) reveals that high-scoring genes are enriched in GO terms associated with synapse maturation and related neuronal processes, whereas low scoring genes are enriched in proliferation-related terms, such as chromosome segregation and catabolism (Supplementary Fig. 4i). To assess how microexon deletion affects neurogenesis, we compared the distributions of scores for up- and down-regulated genes for each perturbation, or in the case of Ptk2, pooled scores for genes affected across the three microexon perturbations, due to the relatively low cell numbers and sparse data recovered for each individual Ptk2 perturbation. Notably, compared to background genes (i.e., non-differentially expressed, FDR ≥ 0.05), we observe that upregulated genes associated with altered transcriptomic signatures are significantly enriched for higher CellRank2 scores for microexon deletions in the Bin1, Clasp1, Gfra1, Med23, Ptprf and Ralgapb genes (P < 0.05, Benjamini-Hochberg corrected two-sided Wilcoxon rank sum test) (Fig. 2g). This observation suggests that deletion of these microexons results in premature activation of gene expression programs associated with neurogenesis.
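The comparison of lineage-correlation scores between upregulated and background genes relies on a two-sided Wilcoxon rank sum test. A minimal Python sketch of that test is given below; it uses the normal approximation without tie or continuity corrections, which is a simplifying assumption and may differ from the implementation used in the study. The function names and the example scores are hypothetical.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (U statistic for x, two-sided P) using the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mean_u) / sd_u
    return u1, 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical lineage-correlation scores for upregulated and background genes.
upregulated = [0.90, 0.80, 0.70, 0.85, 0.95]
background = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
print(rank_sum_test(upregulated, background))
```

A U statistic close to its maximum (the product of the two group sizes) indicates that nearly every upregulated gene scores higher than every background gene, which is the pattern read from a right-shifted eCDF.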

The Gfra1 microexon regulates neuron projection development

We next further explored the functional consequences of deletion of specific microexons that result in the upregulation of neurogenesis gene expression patterns. We initially focused on a microexon in the gene encoding the Glial derived neurotrophic family receptor alpha-1 (GFRA1), which has a prominent role in controlling neuronal differentiation, maturation and signaling. GFRA1 is a glycosylphosphatidylinositol (GPI)-linked surface co-receptor for glial derived neurotrophic factor (GDNF) and known to mediate various signaling pathways through interactions with different surface-exposed co-receptors, including the ‘Rearranged during transfection’ (RET) receptor tyrosine kinase and Neural cell adhesion molecule (NCAM)43,44. GFRA1 contains a single microexon located within a loop region between its first and second GDNF-binding domains45. This region has been associated with reducing GDNF-mediated signaling46, an event that is critical for neurite formation43,44, thus highlighting the importance of characterizing the function of the Gfra1 microexon during neurogenesis.

Analysis of the Gfra1 microexon deletion-associated transcriptomic signature reveals a premature upregulation of programs of genes involved in neurogenesis, including those associated with neuronal projection development (Fig. 3a and Supplementary Fig. 5a). To validate and extend these findings, we generated independent mESC lines harboring homozygous deletions of the microexon (Supplementary Fig. 5b) and differentiated these cells into neurons. From an analysis of bulk RNA-seq from DIV7 neurons generated from these cell lines, we observe a significant positive correlation of log-fold changes in expression when comparing measurements derived from bulk RNA-seq and those derived from the CHyMErA-seq data (R = 0.14, P = 0.0057, Spearman correlation) (Supplementary Fig. 5c). Furthermore, we observe a significant overlap between the sets of Gene Ontology (GO) terms enriched for upregulated genes in response to Gfra1 microexon deletion in both bulk RNA-seq and CHyMErA-seq data (odds ratio = 20.4, P < 2.2 × 10⁻¹⁶, Fisher’s exact test) (Fig. 3b). We observe that most of the overlapping GO terms are related to processes involved in cellular morphogenesis and neuron differentiation. Additionally, when ranking differentially expressed genes in the bulk RNA-seq with the same CellRank2-derived neuron differentiation score as was used to analyze the CHyMErA-seq signatures, we observe a significant increase in score ranking for upregulated genes (P = 2.9 × 10⁻⁴, two-sided Wilcoxon rank sum test) (Fig. 3c). Therefore, to investigate whether the Gfra1 microexon deletion affects neuritogenesis, we visualized and measured neurites by immunofluorescence microscopy using anti-MAP2 antibody to stain neuronal microtubules at DIV3 and DIV7, which correspond to time points representing immature and maturing neurons, respectively (Fig. 3d and Supplementary Fig. 5d). Notably, we observe formation of longer neurites in DIV7 Gfra1∆MIC/∆MIC neurons, compared to wildtype neurons (Fig. 3e; P = 0.0056, two-tailed unpaired t-test). Collectively, these results support a role for the Gfra1 microexon in suppressing a transcriptomic program to prevent premature activation of neuritogenesis.
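The overlap between GO terms enriched in two datasets can be summarized as a 2x2 table and an odds ratio, as in the following Python sketch. The choice of background universe (here, all tested terms) is an assumption, and the function names and term identifiers are placeholders rather than values from this study.

```python
def overlap_table(terms_a, terms_b, universe):
    """2x2 counts for two sets of enriched terms drawn from one universe.

    Returns (in both, only in A, only in B, in neither).
    """
    both = len(terms_a & terms_b)
    only_a = len(terms_a - terms_b)
    only_b = len(terms_b - terms_a)
    neither = len(universe - terms_a - terms_b)
    return both, only_a, only_b, neither


def odds_ratio(both, only_a, only_b, neither):
    """Sample odds ratio; infinite when an off-diagonal cell is empty."""
    if only_a * only_b == 0:
        return float("inf")
    return (both * neither) / (only_a * only_b)


# Placeholder identifiers standing in for tested GO terms.
universe = {f"term_{i}" for i in range(20)}
bulk_terms = {f"term_{i}" for i in range(0, 6)}
screen_terms = {f"term_{i}" for i in range(3, 9)}

table = overlap_table(bulk_terms, screen_terms, universe)
print(table, round(odds_ratio(*table), 3))  # (3, 3, 3, 11) 3.667
```

An odds ratio well above 1 indicates that terms enriched in one dataset are disproportionately likely to be enriched in the other; the same table is the input to a Fisher's exact test.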

Fig. 3: A neuronal microexon in Gfra1 regulates neuron projection development.

a A generalized additive model (GAM) was fitted to scaled expression scores for genes significantly upregulated (logFC > 0, FDR < 0.05) in Gfra1 microexon deletion cells (KO, n = 103; red) and non-targeting controls (NT, n = 278; gray) across pseudotime (adjusted R² = 0.72; KO vs. NT estimate = 0.39; P = 6.7 × 10⁻⁹) (Methods). Colored lines represent the smoothed estimate of mean z-score across pseudotime. Shaded areas represent the 95% confidence interval of the fit. b Heatmap of the log(odds ratio) of the top ten (sorted by odds ratio) significantly enriched GO terms in bulk RNA-seq and CHyMErA-seq following Gfra1 microexon deletion. Benjamini-Hochberg corrected P < 0.05 shown (two-sided hypergeometric test). c Empirical cumulative distribution function (eCDF) of differentially expressed genes (FDR < 0.05; n = 1481) associated with Gfra1 microexon deletion compared to wild-type neurons, plotted according to their associated neuronal lineage correlation (Methods). P value calculated by two-sided Wilcoxon rank sum test. d Immunofluorescence microscopy detection of microtubule associated protein 2 (MAP2) in wild-type (‘WT’) and Gfra1 microexon-deleted differentiating neurons at day in vitro 7. Nuclei are stained with DAPI. Scale bar = 15 μm. e Quantification of neurite length in wild-type (‘WT’) (n = 123) and Gfra1 microexon deletion neurons (n = 212) (P = 0.0056, two-tailed unpaired t-test). The boxes mark the first and third quartiles and the lines inside the boxes mark the median; whiskers extend from the box to the farthest point lying within 1.5 times the interquartile range (outliers not shown). Source data are provided as a Source Data file.

Functions of a neuronal microexon in neurogenesis via modulation of FAK signaling

We next performed a focused follow-up investigation of a previously uncharacterized microexon in the Cytoplasmic linker associated protein 1 (Clasp1) gene, the deletion of which resulted in a pronounced transcriptomic signature in the CHyMErA-seq screen (Fig. 2e). CLASP1 is a microtubule-associated protein that stabilizes the plus ends of growing microtubules to facilitate directional migration47. In humans, CLASP1 has been genetically linked to ASD through the presence of missense mutations along its coding sequence48. To confirm and extend the results from the CHyMErA-seq screen, we compared altered transcriptomic profiles detected following deletion of the Clasp1 microexon with those detected following direct manipulation of the exon in transcripts using a recently described11 dCasRx-artificial splicing factor fusion protein, dCasRx-RBM25. We have demonstrated that direct gRNA-targeting of dCasRx-RBM25 to downstream intronic sequences proximal to alternative exons leads to their efficient inclusion, whereas targeting dCasRx-RBM25 to exons inhibits their splicing. mESCs stably expressing dCasRx-RBM25 were transduced with gRNAs directing activation or repression of the Clasp1 microexon in mESCs (DIV -8), and during differentiation (Fig. 4a). Bulk polyA+ RNA-seq data were generated and analyzed for the mESC and DIV7 samples (Supplementary Fig. 6a).

Fig. 4: A neuronal microexon in Clasp1 regulates neurogenesis by attenuating FAK signaling.

a Representative RT-PCR analysis of transcripts including or skipping Clasp1(A) microexon following tethering of dCasRx-RBM25 to promote exon inclusion or skipping at mESC (days in vitro) (DIV -8) and maturing neuronal timepoints (DIV3, DIV7). n = 2 independent experiments. Empirical cumulative distribution function (eCDF) of differentially expressed genes (DEGs, FDR < 0.05) associated with Clasp1(A) microexon exclusion compared to inclusion for mESCs (n = 2243) (b) and neurons (n = 235) (c) according to their associated neuronal lineage correlation ranking (Methods). P values calculated using two-sided Wilcoxon rank sum test. Scatterplot comparing log2(fold-change) in gene expression patterns in CHyMErA-seq from Clasp1(A) microexon deletion and bulk RNA-seq data from dCasRx-RBM25-directed skipping of Clasp1(A) microexon, in (d) mESCs and (e) DIV7 neurons. Pearson correlation values are calculated from differentially expressed genes (n = 104) (FDR < 0.05; red points). f Heatmap of the log(odds ratio) of the top five (sorted by odds ratio) significantly enriched GO terms in bulk RNA-seq and CHyMErA-seq following Clasp1(A) microexon exclusion compared to inclusion in mESCs and neurons. Benjamini-Hochberg corrected P < 0.05 shown (two-sided hypergeometric test). Western blot for (g) (pY397) FAK and (h) (pT202/pY204) ERK following dCasRx-RBM25 promoted Clasp1 microexon exclusion (‘-‘) or inclusion (‘+’) in mESCs (left) or neurons (right). Tubulin is blotted as a recovery and loading control. Quantification of relative phosphorylation (Rel. Phos.) compared to cells expressing non-targeting (‘NT’) guides is indicated below. Experiment performed once. i Proposed mechanism for Clasp1(A) microexon mediated differential gene expression. 
In the absence of Clasp1 microexon ‘A’, microtubule (MT) associated Clasp1 promotes increased focal adhesion turnover thereby inducing focal adhesion kinase (FAK) Y397 phosphorylation, which activates downstream ERK signaling and results in the dysregulation of neurogenesis genes. This effect is stronger in neurons, possibly due to the coordinated inclusion of FAK microexons during neurogenesis. This indicates that inclusion of this microexon in neurons attenuates FAK-ERK signaling to maintain appropriate timing of gene expression to facilitate morphological and developmental changes associated with neurogenesis. Source data are provided as a Source Data file.

Interestingly, we observe premature upregulation of neuronal genes upon forced skipping of the microexon in mESCs compared to when the microexon is subject to forced inclusion (P = 6.7 × 10−6, two-sided Wilcoxon rank sum test) (Fig. 4b). This DGE signature becomes more pronounced at DIV7 (P = 9.3 × 10−27, two-sided Wilcoxon rank sum test) (Fig. 4c), and significantly correlates with the DGE signature detected following deletion of the Clasp1 microexon in the CHyMErA-seq screen (mESCs: R = 0.55, P = 1.3 × 10−9, Fig. 4d; DIV7: R = 0.27, P = 0.0049, Fig. 4e; Pearson correlation). We further observe a significant overlap of GO terms enriched among genes that are upregulated in the CHyMErA-seq gene signature and in mESCs (odds ratio = 29.3, P < 2.2 × 10−16, Fisher’s exact test), and DIV7 neurons (odds ratio = 52.1, P < 2.2 × 10−16, Fisher’s exact test) (Fig. 4f). These GO terms are associated with functions related to the regulation of nervous system development, morphogenesis and cell-signaling.
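As an illustration of the comparison between deletion and dCasRx-RBM25 signatures, the following minimal sketch computes a Pearson correlation over log2(fold-change) values for genes that are differentially expressed and measured in both datasets. The gene names, values and function names are hypothetical and are not taken from the study's analysis code.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

def correlate_shared_degs(lfc_a, lfc_b, deg_genes):
    """Correlate log2 fold-changes over DEGs that are present in both datasets.

    lfc_a, lfc_b: dicts mapping gene name -> log2(fold-change).
    deg_genes: iterable of gene names passing the significance threshold.
    Returns (Pearson r, number of genes used).
    """
    genes = sorted(g for g in deg_genes if g in lfc_a and g in lfc_b)
    r = pearson_r([lfc_a[g] for g in genes], [lfc_b[g] for g in genes])
    return r, len(genes)

# Toy example with hypothetical genes and values.
deletion_lfc = {"GeneA": 1.0, "GeneB": 2.0, "GeneC": 3.0, "GeneD": -1.0}
skipping_lfc = {"GeneA": 2.0, "GeneB": 4.0, "GeneC": 6.0, "GeneE": 0.5}
r, n_genes = correlate_shared_degs(
    deletion_lfc, skipping_lfc, {"GeneA", "GeneB", "GeneC", "GeneE"}
)
```

Restricting the correlation to shared significant genes mirrors the description in Fig. 4d, e; the significance threshold itself is applied upstream of this sketch.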

These findings led us to hypothesize that the Clasp1 microexon may be impacting neurogenesis by modulating cytoskeletal remodeling and associated signaling pathways. CLASP proteins, through stabilizing peripheral microtubules with the extracellular matrix, facilitate focal adhesion (FA) turnover by severing cell-matrix connections49,50. Given the connectivity of microexon-containing genes to the FAK-Src signaling pathway (Fig. 2g), we investigated whether the Clasp1 microexon modulates FA dynamics via FAK. FAK mediated signaling is achieved via initial phosphorylation of FAK at Y397, which promotes intermolecular interactions with Src-homology (SH)2 domain containing proteins51. This in turn results in the activation of downstream signaling events, including ERK phosphorylation. Phosphorylation of ERK subsequently promotes molecular changes associated with cell motility, differentiation or cell survival, depending on cellular context51,52. Interestingly, we observe an increase in FAK phosphorylation when the Clasp1(A) microexon is completely skipped in ESCs and this increase becomes much more pronounced in differentiating neurons (Fig. 4g), resulting in increased ERK phosphorylation (Fig. 4h). In contrast, forced inclusion of the microexon has little effect on FAK phosphorylation in ESCs or neurons, although it does increase ERK phosphorylation in ESCs, and to a lesser extent in neurons (Fig. 4h), indicating a potential alternative signaling pathway activated by selective expression of the neuronal isoform. Taken together, these data reveal a previously unknown role for a Clasp1 microexon in modulating neurogenesis. In particular, the Clasp1(A) microexon-skipped isoform promotes FAK activation compared to the microexon-included isoform, which induces downstream ERK signaling, an event that is critical for the progression of neurogenesis53 (Fig. 4i). 
Similar to the loss of function of neurodevelopmental disorder genes such as SYNGAP154 and PCDH1255, we observe that perturbation of the Clasp1(A) microexon results in the dysregulation of cytoskeletal signaling, as reflected through a significantly altered transcriptome and accelerated neuronal maturation.

Beyond the Clasp1(A) microexon, we applied dCasRx-RBM25 to manipulate the splicing of microexons in Ptprf(A), Camta1, and Ptk2(A) (Supplementary Figs. 6b and 6c). To investigate whether differential gene expression signatures arising from these perturbations correspond to neuronal differentiation gene expression programs, we correlated detected changes in DGE profiles with gene ranking by CellRank2. We observe significant positive correlations for Ptprf(A), Clasp1(A) and Gfra1 microexon exclusion and deletions, and negative correlations for Camta1 microexon exclusion, for both the bulk RNA-seq and CHyMErA-seq data (Supplementary Fig. 6d), confirming roles of these microexons in regulating neuronal gene expression timing. For the Ptk2(A) microexon deletion, we observe a weak positive correlation (R = 0.05, p = 0.02; Pearson correlation) from the CHyMErA-seq data, possibly due to sparse data, and a significant negative correlation (R = -0.21, p = 1.82 × 10−124; Pearson correlation) from bulk RNA-seq data. This suggests that this microexon may serve to promote neuronal differentiation, as part of an overall ‘push-and-pull’ mechanism involving different microexons within the same network of proteins that, collectively, function to promote or attenuate timing of different stages of neurogenesis. Overall, these data confirm the dysregulation of gene programs associated with neuron morphology and development observed in the CHyMErA-seq screen and extend these observations by highlighting disruption of downstream signaling cascades as a result of microexon perturbation.

Neuronal microexons control developmental timing of gene expression signatures linked to autism

We next asked whether the transcriptomic signatures detected in the CHyMErA-seq screen data overlap with those in the brains of individuals diagnosed with ASD. To address this, we utilized ‘Single-Cell rEgulatory Network Inference and Clustering’ (SCENIC)56 to define and score ‘regulon’ activity (i.e., data-derived co-expression networks of genes attributed to co-expressed transcription factors with proximal binding motifs) for individual cells. We focused our analysis on regulons that in a recent study57 show dysregulation in excitatory neurons in ASD compared to non-ASD brains. We compared scores for these ASD-dysregulated regulons between microexon-deleted cells and non-targeting cells, and observed significant differences in activation of Sox6, Ets2, Egr1, Elk1, Foxp1 and Sin3a regulons (P < 0.05, Benjamini-Hochberg corrected two-tailed unpaired t-test) (Fig. 5a). Of these six dysregulated regulons associated with microexon deletion, three are downstream of MAPK-ERK signaling pathway (Ets2, Egr1 and Elk1), which has been implicated in core processes during neurogenesis58. Notably, in addition to the role of the Clasp1(A) microexon in augmenting MAPK-ERK signaling shown in this study, roles in MAPK signaling have been previously described for Bin159, Gfra143, Med2360,61,62, Ptprf63,64 and Ptk251 proteins (which also form a protein-protein interaction network, Fig. 2f) suggesting that the microexons in these proteins may convergently regulate this signaling pathway. Overall, we observe that 6 of 8 microexon deletion signatures show dysregulation of the Egr1 regulon, which is known to mediate the induction of p35, a neuron-specific activator of cyclin-dependent kinase 5 (Cdk5) downstream of ERK, an event that is essential for neurite outgrowth and neuronal differentiation65, thus highlighting a potential common pathway by which microexon-containing regulators of cytoskeletal processes regulate neurogenesis.
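The effect sizes summarized for these regulon comparisons are Cohen's d values (Fig. 5a). A minimal sketch of one common definition, the difference in group means divided by the pooled standard deviation, is shown below. Whether the study used exactly this pooled-variance form is an assumption, and the score values are hypothetical.

```python
import math
import statistics

def cohens_d(treated, control):
    """Cohen's d using the pooled sample standard deviation (assumed definition)."""
    n1, n2 = len(treated), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least 2 observations")
    var1 = statistics.variance(treated)   # sample variance (n - 1 denominator)
    var2 = statistics.variance(control)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    if pooled_sd == 0:
        raise ValueError("effect size is undefined when both groups are constant")
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd

# Hypothetical regulon scores for microexon-deleted and non-targeting cells.
deleted_scores = [2.0, 4.0, 6.0]
control_scores = [1.0, 3.0, 5.0]
d = cohens_d(deleted_scores, control_scores)
```

A positive value indicates higher average regulon activity in the microexon-deleted cells.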

Fig. 5: Neuronal microexons control developmental timing of gene expression signatures linked to autism.

a Heatmap of effect size (Cohen’s d) of the difference in average regulon scores for microexon deleted cells compared to non-targeting control cells for regulons shown to be dysregulated in excitatory neurons in ASD brains57. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001; Benjamini-Hochberg corrected two-tailed unpaired t-test. b Violin plot of SFARI gene module scores (Methods) for cells in the neuroepithelial cluster for Ralgapb (n = 103, P = 1.40 × 10−6), Ptprf(A) (n = 79, P = 7.70 × 10−4), Med23 (n = 85, P = 9.10 × 10−5), Gfra1 (n = 101, P = 9.00 × 10−12), Clasp1(A) (n = 129, P = 1.40 × 10−6), and Bin1 (n = 108, P = 2.70 × 10−10) microexon deletions or non-targeting control cells (n = 270). Benjamini-Hochberg corrected two-sided Wilcoxon rank sum test. The boxes mark the first and third quartiles and the lines inside the boxes mark the median; whiskers extend from the box to the farthest point lying within 1.5 times the inter-quartile range (outliers not shown). c Boxplot of SFARI gene module scores (Methods) for cells with microexon deletions that result in an upregulated neuronal gene expression signature (Bin1, Clasp1(A), Gfra1, Ptprf(A), Med23, Ralgapb) or non-targeting control cells grouped according to binned pseudotime (n = 132 cells per bin). Bin 3, P = 4.00 × 10−3; Bin 4, P = 4.60 × 10−2; Bin 5, P = 4.00 × 10−3; Bin 6, P = 2.10 × 10−2; Benjamini-Hochberg corrected two-sided Wilcoxon rank sum test. The boxes mark the first and third quartiles and the lines inside the boxes mark the median; whiskers extend from the box to the farthest point lying within 1.5 times the inter-quartile range, with outliers visualized as points. 
d Forest plot of odds ratios (p < 0.05, Fisher’s exact test) for enrichment of upregulated (FDR < 0.05, log2(fold-change) > 0, red) and downregulated (FDR < 0.05, log2(fold-change) <0, blue) differentially expressed genes in ASD-associated datasets from bulk RNA-seq for the indicated microexon exclusion via Cas9-mediated deletion (Gfra1) or dCasRx-RBM25 mediated exclusion (Camta1 upregulated, n = 4290, Camta1 downregulated, n = 3595, Clasp1(A) upregulated, n = 1371, Clasp1(A) downregulated, n = 1107, Gfra1 upregulated, n = 481, Gfra1 downregulated, n = 1000, Ptk2(A) upregulated, n = 664, Ptk2(A) downregulated, n = 75, Ptprf(A) upregulated, n = 1896, Ptprf(A) downregulated, n = 2766). Error bars represent 95% confidence intervals. P values calculated by two-sided Fisher’s exact test and corrected for multiple comparisons (Benjamini-Hochberg). Source data are provided as a Source Data file.

To further investigate whether genes implicated in ASD exhibit dysregulation in response to microexon deletion, we examined the transcriptional profiles of microexon deletions with premature differentiation phenotypes (i.e., Bin1, Clasp1(A), Gfra1, Ptprf(A), Med23, Ralgapb). A ‘module score’ representing expression of genes with genetic links to ASD (using ASD association evidence scores 1-3 from Simons Foundation Autism Research Initiative (SFARI)66) was derived for each single cell, and pair-wise comparisons of these scores were performed for microexon deleted cells and non-targeting cells in the neuroepithelial cell subcluster to investigate potential early activation of ASD-linked gene signatures. We observe a significant increase in the module scores for all deletions examined (P < 0.001, Benjamini-Hochberg corrected two-sided Wilcoxon rank sum test) (Fig. 5b), indicating that skipping of these microexons promotes increased expression of these genes at an early developmental stage. To determine how these module scores deviate from non-targeted cells during development, we binned cells according to their calculated pseudotime and compared ASD-gene expression profiles between microexon-targeted (Bin1, Clasp1(A), Gfra1, Ptprf(A), Med23, Ralgapb microexons) and non-targeted cells. We observe that over pseudotime these premature maturation signatures include an early increase in expression of ASD-linked genes (pseudotime bins 3-6, p < 0.05, Benjamini-Hochberg corrected two-sided Wilcoxon rank sum test) followed by a lagging increase in non-targeting cells (pseudotime bin 7, p < 0.05, Benjamini-Hochberg corrected two-sided Wilcoxon rank sum test), before reaching a plateau (Fig. 5c). This suggests that these microexons may tune expression of ASD-risk genes within neuronal transcriptomes during transient periods of neurogenesis. 
We also observe enrichment of ASD risk genes identified in three studies66,67,68 among the differentially expressed genes in response to perturbation of Clasp1(A), Ptprf(A), Ptk2(A), Camta1 and Gfra1 microexons in our bulk RNA-seq (Fig. 5d).

The convergence of the transcriptomic signatures derived from microexon deletion during in vitro neurogenesis on regulons that are disrupted in the brains of individuals with ASD, and the altered timing of ASD-linked gene expression, further suggests a more widespread role of neural microexon dysregulation in neurodevelopmental disorders. In this regard, from an analysis of BrainGVEX data from the PsychENCODE consortium69 we note that several of the microexons linked to transcriptomic perturbations in our study show dysregulated splicing in the brains of individuals with Bipolar Disorder (Bin1, Ptk2(A), Ptk2(C), Ptprf(A)) and Schizophrenia (Bin1, Ptk2(A), Ptk2(C), Ralgapb) (P < 0.05, Benjamini-Hochberg corrected two-sided Wilcoxon rank sum test) (Supplementary Fig. 7). Taken together with the observations described above indicating functions for these microexons in controlling neuronal development, our results highlight a potential role for their dysregulation in contributing to the etiology of neurodevelopmental disorders.
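Many of the P values reported in this section are Benjamini-Hochberg corrected. For readers unfamiliar with the procedure, a minimal sketch of the standard step-up adjustment follows; the input P values are hypothetical and the function name is illustrative rather than taken from the study's code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of the result.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P values from four comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

Each adjusted value is the smallest false discovery rate at which the corresponding comparison would be called significant.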

Discussion

The CHyMErA-seq platform described in this study enables the capture of single cell transcriptomic phenotypes associated with scalable, splice isoform-resolution perturbation. By applying this platform to the functional analysis of conserved, switch-like neuronal microexons, we observe extensive roles for this class of alternative splicing event in the coordinated and temporal control of gene expression signatures that underlie differentiation transitions during neurogenesis. A subset of genes containing microexons with deletion-associated transcriptomic phenotypes are functionally linked to neuron projection development. This ‘network’ of microexon-containing genes, which includes microexons in the FAK-encoding gene Ptk2, converges on the FAK-Src signaling pathway. These transcriptomic effects are also reflected in the dysregulation of neuritogenesis upon the deletion of a 15 nucleotide microexon in the GDNF co-receptor Gfra1, and perturbed FAK phosphorylation and activation of downstream ERK signaling upon skipping of a 24 nucleotide microexon in Clasp1. Collectively, we observe that multiple neuronal microexons appear to function by ensuring robust and accurate timing of distinct stages of neurogenesis. In particular, several of the analyzed microexons have convergent effects on transcriptomes that suggest they serve as molecular ‘brakes’ during neurogenesis. This unexpected observation supports the conclusion that specific microexons may have important roles in modulating protein functions to meet context-specific requirements during neurogenesis. For example, the Gfra1 microexon reduces ligand-binding affinity46, and its deletion in this study leads to the increased expression of genes involved in neuronal differentiation and morphogenesis, along with enhanced neuritogenesis. In contrast, deletion of Ptk2 microexons (which normally enhance phosphorylation activity39,40) leads to downregulation of neurogenesis-related genes (Fig. 2g). 
This is opposite to the effect seen with deletion of the Clasp1 microexon, which promotes FAK phosphorylation and accelerates neurogenesis. Moreover, the overlap of the microexon-dependent gene expression signatures with dysregulated orthologous gene expression patterns detected in the brains of ASD-affected individuals provides new insight into possible convergent pathogenic mechanisms associated with brain disorders.

The results from our CHyMErA-seq screen complement and extend those from previous studies focusing on the functions of individual microexons. For example, we previously demonstrated that deletion of activity-regulated neuronal microexons in the translation initiation factors eIF4G1 and eIF4G3 results in the selective upregulation of numerous synaptic proteins, including those that control neuronal activity and plasticity20. In another study, altered ratios of expression of neuronal microexon-containing and lacking isoforms of the cytoplasmic polyadenylation element binding protein 4 (Cpeb4) were shown to affect the deadenylation and expression of a program of transcripts enriched in neuronal functions and ASD risk variants21. Perturbation of the eIF4G and Cpeb4 microexons was also found to affect higher-order cognitive functioning, including social behavior, and both were shown to function by altering the propensity of their host proteins to form condensates that impact translation and deadenylation, respectively20,21,70. More recently, it was reported that deletion of a neuronal microexon in Disheveled associated activator of morphogenesis 1 (Daam1) increases neurite length, mediated in part by hyperactivation of RHOA/ROCK signaling cascades, and further results in learning and memory deficits, reduced long term potentiation, and abnormal neuronal morphology in a mouse model71. Additional studies in mice and zebrafish have implicated neuronal microexons in the control of neurite formation, including Unc13b16, Zfyve27/Protrudin72,73, and the GTPase activity-regulating proteins Evi5b, Itsn1 and Vav274. Transcriptomic analyses of zebrafish larvae harboring microexon deletions have revealed differential regulation of genes involved in core neuronal processes such as synapse and neuron projection, including genes implicated in ASD74. 
In contrast to these previous studies, through the development of the CHyMErA-seq system the results of the present study revealed new and unexpected functions of microexons in the control of single cell transcriptomes during neurogenesis. Moreover, consistent with previous studies demonstrating that heterozygous deletion of alternative exons can have a significant impact on transcriptomic regulation75,76, the results of our CHyMErA-seq screen and follow-up experiments using dCasRx-RBM25-directed exon manipulation further demonstrate that induced partial skipping of microexons can lead to significant phenotypes, an observation that may also have relevance for interpreting the significance of partial exon skipping levels detected in patient brain samples.

During the past several years, CRISPR-based screening approaches have been developed for the systematic interrogation of gene function in the context of different phenotypic read-outs, including cell fitness, gene expression signatures, cell organization, and other morphological features77. A subset of these approaches has enabled targeted or genome-wide CRISPR gene perturbations coupled to the analysis of single cell transcriptomes. Such ‘perturb-seq’ strategies have been applied in the contexts of in vitro78,79,80,81,82,83 and in vivo models84,85, providing important insights into the roles of specific genes in cell lineage specification and maturation during nervous system development. However, these systems simultaneously perturb all isoforms transcribed from targeted genes, and therefore miss detection of distinct phenotypes conferred by subgenic regions, including those regulated by alternative exons. For example, distinct phenotypes arising from differential splicing (and other forms of isoform regulation) may be due to altered protein localization, protein-protein and other ligand interactions, and in some cases dominant-negative or gain-of-function forms of proteins4,5,6.

The CHyMErA-seq system described in the present study enables, for the first time, the impact of splice isoform-level perturbations on single cell transcriptomic read-outs to be measured on a large scale. This work is complemented by a similar study applying exon-resolution single-cell CRISPR screens in cancer cell lines86. Applying this system to neuronal microexons uncovered alternative splicing-dependent gene expression signatures associated with the regulation of developmental timing during neurogenesis, and further illuminated an additional connection between microexon dysregulation and neurodevelopmental disorders. The application of CHyMErA-seq to the systematic functional interrogation of transcript level variation across a broader range of biological contexts, coupled to different phenotypic read-outs, is expected to further provide valuable insights into gene function at an isoform-level resolution.

Methods

Cell culture

CGR8 mouse embryonic stem cells (mESCs) were cultured on gelatin-coated plates in GMEM supplemented with 100 μM β-mercaptoethanol, 0.1 mM nonessential amino acids, 2 mM sodium pyruvate, 5000 units ml–1 of penicillin/streptomycin, 1000 units ml–1 of recombinant mouse LIF (all Life Technologies) and 10% ES fetal calf serum (Wisent). Cells were passaged using Trypsin (Life Technologies) and maintained at subconfluent conditions at 37 °C and 5% CO2. Cells were regularly monitored for mycoplasma infection.

In vitro neuron differentiation

In vitro differentiation of CGR8 mESCs into neurons was performed as previously described87. At day in vitro (DIV) -8, 5.1 × 10⁶ cells were seeded on 15 cm non-tissue-culture-treated plates containing 30 mL of serum-free differentiation media (GMEM containing 5% KnockOut Serum replacement (Thermo Fisher Scientific), 0.1 mM nonessential amino acids, 1 mM sodium pyruvate, 5000 units ml–1 of penicillin/streptomycin and 100 μM β-mercaptoethanol). Cells were incubated for 3 days undisturbed. 50% media changes were conducted every 48 h. At DIV -5 the media was supplemented with 6 μM retinoic acid (RA) with subsequent media changes also containing 6 μM RA. At DIV 0, cellular aggregates were dissociated with TrypLE Express (Gibco) for 5 minutes at 37 °C. Trypsinization was halted with soybean trypsin inhibitor (Thermo Fisher Scientific) and aggregates were homogenized by trituration with a 1 mL micropipette. Cells were pelleted for 5 minutes at 300 × g. Neural progenitors were washed in N2 medium (Neurobasal-A medium (Gibco) with 1x N2 supplement, 2 mM glutamine and 5000 units ml–1 of penicillin/streptomycin (Gibco)), counted using a hemocytometer, and plated at a density of 1.5 × 10⁵ cells/cm² onto poly-D-Lysine (Sigma-Aldrich) and laminin (mouse, Sigma-Aldrich) coated dishes. The plated neural progenitors were washed with N2 medium after 24 h to remove residual serum and non-adherent cells. At DIV2, the N2 medium was replaced with B27 medium (Neurobasal-A supplemented with antibiotics, 2 mM glutamine and 1x B27 supplement (Gibco)). Subsequently, differentiating neurons were provided with full medium changes with B27 on DIV4, and every 3 days onward.

Lentiviral transduction

HEK293T cells were seeded in DMEM (Wisent) supplemented with 10% FBS (Gibco) and 500 U mL−1 penicillin/streptomycin at a density of 5 × 10⁵ cells per 6-well plate. 24 h later the cells were transfected with packaging vectors psPAX2 (Addgene: 12260), pMD2.G (Addgene: 12259) and the target vector at a 1:1:1 molar ratio. The next day the media was changed to high-BSA harvest media (DMEM supplemented with 1.1% (w/v) BSA and 5000 U mL−1 penicillin/streptomycin). 24 h later, virus containing media was collected, spun down at 500 rpm to pellet any packaging cells, and either immediately used for transduction, or snap-frozen in liquid nitrogen and stored at −80 °C. To generate the hgRNA library, transfections were scaled up to a 10 cm plate. Transductions were carried out by adding virus containing media and 8 μg mL−1 polybrene to the intended cells.

Cas12a variant vectors

Neomycin resistance was subcloned from the plenti-Lb-Cas12a-2xNLS vector (Addgene: 155046) into pRDA_112 (Addgene: 136475) via restriction cloning to allow for multiple drug selection of cells transduced with Lenti-Cas9-2A-Blast (Addgene: 73310). Addition of c-myc nuclear localization signals (NLSs) to the C-terminus, or removal of the N-terminal SV40 NLS of each vector, was achieved by site directed mutagenesis (Q5 Site-Directed Mutagenesis Kit (NEB)).

Cell line generation

CHyMErA cell lines

CGR8 cells were transduced with lentivirus carrying the Cas9-2A-BlasticidinR-expressing cassette (Addgene, no. 73310) and selected with blasticidin (6 µg ml–1) for 10 d. Cas9-expressing cell lines were then transduced with lentivirus carrying enAs- or Lb-Cas12a-2A-NeoR expression cassettes and selected with G418 (500 µg ml–1) for 14 d.

Gfra1 microexon-deletion clones

Cas9 guides targeting regions flanking the Gfra1 microexon or intergenic regions were cloned into px459 (Addgene: 62988) and transfected into CGR8s. Transfected cells were selected with 1 μg mL−1 puromycin for 3 days and plated by limiting dilution into 96-well plates for single-cell isolation. Isolated clonal cell lines were genotyped by genomic PCR of the edited region and analyzed by RT-PCR to confirm expression of the microexon excluded isoform.

dCasRx-RBM25 lines

Guide RNAs were designed either overlapping the microexon or within the downstream intron, to promote skipping or inclusion of the exon, respectively, and were scored using the TIGER algorithm88. Guide sequences were cloned into pLentiRNAGuide (Addgene: 138150) and transduced into CGR8 lines stably expressing dCasRx-RBM2511 (Addgene: 221002) and selected with 1 μg mL−1 puromycin for 3 days.

CROP-seq vector construction

Single Guides

The tracrRNA region of the original CROPseq-Guide-Puro (Addgene: 86708) was removed by site directed mutagenesis (Q5 Site-Directed Mutagenesis Kit). Guides were then cloned into this modified vector as previously described7 with minor modifications. Briefly, complementary oligonucleotides containing Cas9 and Cas12a spacer sequences separated by a buffer sequence were ordered, annealed and ligated into Esp3I-digested modified CROP-seq vector. To introduce the intervening Cas9 tracrRNA and Cas12a direct repeat sequences, a golden gate assembly was performed with Esp3I and a TOPO vector carrying Cas9 tracrRNA and either the As- or Lb- Cas12a direct repeat over 12 cycles ((1) 37 °C, 30 min; (2) 16 °C, 30 min; (3) 24 °C, 60 min; (4) 37 °C, 15 min; (5) 65 °C, 10 min; steps 1–3 were repeated for 11 cycles) using a vector/insert ratio of 1:25. Ligations were transformed into Stbl3 chemically competent cells (Invitrogen) and successful transformants were confirmed by Sanger sequencing.

Screen library

For construction of hgRNA libraries, Cas9 and Cas12a gRNA sequences were cloned into a lentiviral vector via two rounds of cloning (as in ref. 7) with slight modifications. Oligo pools (Twist Biosciences) were designed carrying 20-nt Cas9 and 23-nt Cas12a guide sequences separated by a 32-nt buffer sequence flanked by Esp3I restriction sites, all flanked by short sequences containing BfuAI restriction sites. Oligonucleotides were amplified by PCR over ten cycles using NEB Q5 Ultra polymerase (NEB) ((1) 98 °C, 30 s; (2) 98 °C, 10 s; (3) 53 °C, 30 s; (4) 72 °C, 10 s; (5) 72 °C, 2 min; steps 2–4 repeated for nine cycles). Amplified oligos were purified on a PCR purification column (GeneJET PCR purification kit; Thermo Fisher Scientific), and an aliquot was run on a 2% agarose gel to check purity. The modified CROP-seq vector backbone was digested with Esp3I (NEB) overnight at 37 °C. The digested backbone was dephosphorylated with rSAP (NEB) for 1 h at 37 °C and gel purified (GeneJET gel extraction kit; Thermo Fisher Scientific). Amplified oligos were digested with BfuAI (NEB) and ligated into the digested modified CROP-seq backbone using T4 ligase (NEB) in a combined reaction overnight at 16 °C using a 3:1 insert to vector ratio. The purified ligation reaction was used to transform Endura competent cells (Lucigen) by electroporation (1-mm cuvette, 25 uF, 200 Ω, 1600 V), and a sufficient number of cells were plated on 15-cm ampicillin Luria–Bertani (LB) agar plates to reach a library coverage > 1000-fold. Bacterial colonies were scraped from the plates, pooled and pellets were collected. The Ligation 1 library plasmid was extracted and purified (GeneJET Maxi-Prep Kit; Thermo Fisher Scientific).

In the second step, Cas9 tracrRNA and the Cas12a direct repeat were inserted into the pooled library. The Ligation 1 plasmid library was digested overnight using Esp3I (NEB) and dephosphorylated using rSAP (1 h, 37 °C) and purified on a PCR purification column. A TOPO vector carrying Cas9 tracrRNA and the Cas12a direct repeat was digested using Esp3I and subsequently ligated into the digested pLCHKO-Ligation 1 vector overnight over 12 cycles ((1) 37 °C, 30 min; (2) 16 °C, 30 min; (3) 24 °C, 60 min; (4) 37 °C, 15 min; (5) 65 °C, 10 min; steps 1–3 were repeated for 11 cycles) using a vector/insert ratio of 1:25. The purified ligation reaction was used to transform Endura competent cells by electroporation (1-mm cuvette, 25 uF, 200 Ω, 1600 V), and a sufficient number of cells were plated on 15-cm ampicillin LB agar plates to reach a library coverage of > 1000-fold. Bacterial colonies were scraped from the plates, pooled and pellets were collected. The Ligation 2 library plasmid was extracted and purified (GeneJET Maxi-Prep Kit; Thermo Fisher Scientific).
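The coverage targets stated for both ligation steps (> 1000-fold) translate directly into a minimum number of transformant colonies. The following sketch shows the arithmetic; the library size and colonies-per-plate values are hypothetical placeholders, as neither is specified in this section.

```python
import math

def colonies_needed(n_constructs, fold_coverage=1000):
    """Minimum number of colonies for the requested fold coverage of a library."""
    if n_constructs <= 0 or fold_coverage <= 0:
        raise ValueError("library size and coverage must be positive")
    return n_constructs * fold_coverage

def plates_needed(n_constructs, colonies_per_plate, fold_coverage=1000):
    """Number of plates required, given an assumed colony yield per plate."""
    if colonies_per_plate <= 0:
        raise ValueError("colonies per plate must be positive")
    return math.ceil(colonies_needed(n_constructs, fold_coverage) / colonies_per_plate)

def achieved_coverage(n_colonies, n_constructs):
    """Fold coverage actually obtained from a counted number of colonies."""
    if n_constructs <= 0:
        raise ValueError("library size must be positive")
    return n_colonies / n_constructs

# Hypothetical library of 2000 hgRNA constructs, 500,000 colonies per 15-cm plate.
min_colonies = colonies_needed(2000)
n_plates = plates_needed(2000, 500_000)
coverage = achieved_coverage(3_000_000, 2000)
```

Because the stated target is a lower bound, colony counts estimated from dilution plates should exceed the value returned by `colonies_needed`.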

CHyMErA efficiency testing

CGR8 mESCs were transduced with lentivirus carrying an hgRNA-expressing modified CROP-seq vector targeting microexons or a non-targeting control. The day after transduction, cells were subjected to 1 μg ml−1 puromycin selection for 3 days to ensure complete selection. Genomic DNA (gDNA) from cells was purified (PureLink Genomic DNA Mini Kit; Thermo Fisher Scientific) and the targeted region was amplified by PCR. PCR products were resolved on agarose gels and band intensity was quantified with ImageJ. Percent editing was calculated as the quotient of the size normalized band intensities:

$$\%\,\mathrm{Deletion}=100\times\frac{\dfrac{\mathrm{DeletionBand}_{\mathrm{intensity}}}{\mathrm{DeletionBand}_{\mathrm{length}}}}{\dfrac{\mathrm{UncutBand}_{\mathrm{intensity}}}{\mathrm{UncutBand}_{\mathrm{length}}}+\dfrac{\mathrm{DeletionBand}_{\mathrm{intensity}}}{\mathrm{DeletionBand}_{\mathrm{length}}}}$$
(1)
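
As a worked example of Eq. (1), the short sketch below computes the percent deletion from two band intensities and their amplicon lengths. The function name and the numerical values are hypothetical; intensities are assumed to be background-subtracted values in arbitrary units, as would be exported from ImageJ.

```python
def percent_deletion(deletion_intensity, deletion_length, uncut_intensity, uncut_length):
    """Percent deletion from size-normalized gel band intensities (Eq. 1)."""
    if deletion_length <= 0 or uncut_length <= 0:
        raise ValueError("band lengths must be positive")
    # Normalize each band by amplicon length, since longer products
    # bind more dye per molecule.
    deletion_norm = deletion_intensity / deletion_length
    uncut_norm = uncut_intensity / uncut_length
    total = deletion_norm + uncut_norm
    if total == 0:
        raise ValueError("both bands have zero intensity")
    return 100.0 * deletion_norm / total

# Hypothetical example: a 300 bp deletion band and a 500 bp uncut band.
pct = percent_deletion(
    deletion_intensity=300.0, deletion_length=300,
    uncut_intensity=1000.0, uncut_length=500,
)
```

Here the size-normalized intensities are 1 and 2, so one third of the amplified molecules carry the deletion.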

To independently determine approximate editing efficiencies of the eCHyMErA method, amplified PCR products shown in Fig. 1c were analyzed by Oxford Nanopore long-read sequencing (performed by Plasmidsaurus Inc.). Raw reads were aligned to sequences corresponding to the amplified regions. Editing efficiency was quantified by comparing the relative proportions of reads mapping to Cas9/Cas12a deleted regions versus adjacent unedited sites. Editing efficiencies estimated by quantification of gel band intensities and long-read sequencing were within 10% of each other.

Library validation

gDNA from transduced CGR8s was harvested and integrated hgRNA sequences were amplified using Q5 Ultra polymerase (NEB) ((1) 98 °C, 30 s; (2) 98 °C, 10 s; (3) 65 °C, 30 s; (4) 72 °C, 20 s; (5) 72 °C, 2 min; steps 2–4 were repeated 25 times) with the following primers:

CROP-Cas9-F_s[N]: ACACGACGCTCTTCCGATCT[N]CTTGTGGAAAGGACGAAACACC

CROP-Cas12a-R_s[N]: CTGGAGTTCAGACGTGTGCTCTTCCGATCT[N]GTGTCTCAAGATCTAGTTACGCCAAGC

Where [N] indicates intervening nucleotides to generate staggered amplicons to increase library diversity.
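
To illustrate how a set of staggered primers could be assembled around the [N] position, the sketch below inserts spacers of increasing length between the fixed 5′ and 3′ portions of the forward primer given above. The stagger lengths (0 to 7 nt), the use of random spacer bases and the fixed seed are assumptions made for illustration; the actual spacer sequences used in the study are not specified here.

```python
import random

# Fixed portions of the CROP-Cas9-F primer on either side of [N].
CAS9_F_5P = "ACACGACGCTCTTCCGATCT"
CAS9_F_3P = "CTTGTGGAAAGGACGAAACACC"

def staggered_primers(five_prime, three_prime, max_stagger=7, seed=0):
    """Build primers with 0..max_stagger spacer nucleotides at the [N] position.

    Spacer bases are drawn at random (hypothetical choice); a fixed seed keeps
    the output reproducible.
    """
    if max_stagger < 0:
        raise ValueError("max_stagger must be non-negative")
    rng = random.Random(seed)
    primers = []
    for n in range(max_stagger + 1):
        spacer = "".join(rng.choice("ACGT") for _ in range(n))
        primers.append(five_prime + spacer + three_prime)
    return primers

forward_primers = staggered_primers(CAS9_F_5P, CAS9_F_3P)
```

Pooling primers of different lengths shifts the constant vector sequence across sequencing cycles, which increases base diversity at each cycle of the read.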

These amplicons were then subjected to a second round of PCR ((1) 98 °C, 30 s; (2) 98 °C, 10 s; (3) 55 °C, 30 s; (4) 72 °C, 20 s; (5) 72 °C, 2 min; steps 2–4 were repeated 14 times) to add sequencing adapters. The resulting amplicons were gel purified and paired-end sequenced on an Illumina MiSeq instrument using the following cycling parameters: 22 dark cycles, read 1: 27 cycles, index 1: 8 cycles, index 2: 8 cycles, 35 dark cycles, read 2: 27 cycles. Raw fastq files were aligned to an index of hgRNA sequences using Bowtie89 to determine library representation.

Single nuclear RNA-seq library preparation

DIV7 neurons were harvested with Hank’s Balanced Salt Solution (Gibco) supplemented with 20 U ml−1 Papain (Worthington), 5 mM MgCl2 and 5 μg mL−1 DNase I (Worthington), counted, pelleted, immediately snap-frozen in liquid nitrogen and then stored at −80 °C.

Nuclei extraction and fixation were performed as previously described26, except for the use of a modified CST lysis buffer90 plus 1% of SUPERase In RNase Inhibitor (AM2696). Nuclei quality was checked with DAPI and Wheat Germ Agglutinin (WGA) staining. Sci-RNA-Seq3 libraries were generated as previously described91 using three-level combinatorial indexing. The final libraries were sequenced on an Illumina NovaSeq with read cycles as follows: read 1: 34 bp, read 2: 69 bp, index 1: 10 bp, index 2: 10 bp.

Hemi-nested hgRNA enrichment PCR

Enrichment PCR was adapted from ref. 92 for capture of Cas9 and Cas12a spacer sequences. hgRNA sequences were sub-amplified from the barcoded library using three rounds of PCR. In the first round, primers annealing upstream of the hgRNA sequence and to the P5 barcode were used to amplify the region of the guide-barcode transcript. In the second round, primers annealing upstream of the Cas9 and Cas12a sequences were added to sub-amplify the spacer sequences and provide handles for Illumina adaptors. Illumina adaptors were added in a final round of PCR. After each PCR reaction, libraries were purified with Ampure XP beads (Beckman Coulter), and product sizes were confirmed by agarose gel electrophoresis. Final libraries were sequenced on an Illumina NovaSeq with the read cycles: read 1: 34 bp, read 2: 69 bp, index 1: 10 bp, index 2: 10 bp.

Immunofluorescence and neurite length analysis

Gfra1 microexon deletion clones or wild-type CGR8 mESCs were differentiated into neurons (see In vitro Neuron Differentiation). At DIV0, neural progenitors were plated on Laminin (Sigma-Aldrich) and Poly-D-Lysine (Sigma-Aldrich) coated coverslips in 12-well plates. At DIV3 and DIV7, cells were washed with PBS and fixed in 4% PFA for 15 min. Cells were washed with PBS and permeabilized in 0.25% Triton X-100 for 10 min. Cells were then washed 3 times in PBS, blocked in PBS with 2.5% BSA for 1 h, then incubated overnight with 1:2000 chicken anti-MAP2 antibody (Abcam; 5392) in PBS with 2.5% BSA. The next day, cells were washed 3 times with PBS and incubated with goat anti-chicken IgG Alexa-Fluor 647 conjugated secondary antibodies and 1:1000 DAPI in PBS with 2.5% BSA for 1.5 h. Coverslips were washed, mounted on glass slides and imaged on an Axiovert 200M inverted fluorescence microscope (Zeiss). Image IDs were randomized to blind the researcher to the cell type analysed, and neurites from MAP2 stained neurons were quantified with the NeuronJ plugin93 for ImageJ.

Immunoblotting

For analysis of FAK and ERK phosphorylation, CGR8 mESCs or DIV7 neurons were washed in ice-cold PBS and lysed in ice-cold RIPA buffer (10 mM Tris-HCl, 150 mM NaCl, 0.1% SDS, 0.5% sodium deoxycholate, 1% Triton X-100, 1 mM EDTA) supplemented with cOmplete protease inhibitor cocktail (Roche) and PhosSTOP phosphatase inhibitor (Roche). Lysates were collected in 1.5 mL Eppendorf tubes and sonicated, and protein content was determined using the Bradford assay (BioRad). Samples were mixed with Laemmli buffer and 25 μg of protein was resolved on a polyacrylamide SDS-PAGE gel before overnight transfer to a PVDF membrane. Membranes were blocked for 1 h in either 5% milk or 5% BSA in TBS-T for endogenous and phospho-antibodies, respectively. Membranes were then washed 3 times in TBS-T and incubated with 1:1000 anti-FAK (Cell Signaling, 3285) or 1:1000 anti-p44/42 MAPK (Erk1/2) (Cell Signaling, 9102) in 5% milk TBS-T, or 1:1000 anti-Phospho-FAK (Y397) (Cell Signaling, 3283) or 1:1000 anti-Phospho-p44/42 MAPK (Erk1/2) (Thr202/Tyr204) (Cell Signaling, 9101) in 5% BSA TBS-T overnight at 4 °C. Membranes were then blotted for tubulin (Sigma-Aldrich, T-6074) as a recovery and loading control. Detection was achieved using anti-rabbit/mouse IgG HRP-linked secondary antibodies (Invitrogen) and chemiluminescence (Perkin Elmer, NEL120001EA).

RT-PCR and RNA-seq

Total RNA was extracted using the RNeasy Mini Kit (QIAGEN) as recommended by the manufacturer. For analysis of microexon splicing, RT-PCR was performed with the OneStep RT-PCR kit (QIAGEN) and products were resolved on 3.5-4% agarose gels stained with ethidium bromide. Exon inclusion and exclusion were quantified with ImageJ. For RNA-seq, RNA was similarly isolated and treated with on-column DNase (QIAGEN). Libraries were then generated for polyA+ stranded RNA-seq with the NEBNext Ultra II Directional kit and 150 nucleotide paired-end reads were sequenced on an Illumina NovaSeq 6000 at an average depth of 25 million reads.

Computational procedures

Microexon-targeting library design

For the microexon targeting hgRNA library, mouse microexons were selected according to the following criteria: (1) they correspond to an orthologous human exon, (2) they exhibit strong differential inclusion during in vitro neuronal differentiation30, with ≥ 50% increase in inclusion level between undifferentiated cells (DIV -8) and neurons (DIV 7-28), (3) they are spliced in transcripts with expression ≥ 5 corrected Reads Per Kilobase-pair and Million mapped reads (cRPKM), and (4) they exhibit differential inclusion in an in vivo analysis of mouse brain development31. Microexons from this set were further selected on the basis of their location in genes with defined roles in neurogenesis and/or that have been implicated in ASD, using ASD association evidence scores 1-3 from the Simons Foundation Autism Research Initiative (SFARI).

From this set of microexons we extracted and scored Cas9 (CRISPOR32) and Cas12a (CHyMErA-NET7) guides in the flanking intronic regions, ensuring a minimum distance of 150 bp from annotated upstream and downstream exon splice sites and of 100 bp from the microexon splice sites. The highest scoring pairs of guides were used to direct hgRNA library design. These selection steps resulted in 37 microexons with introns that could accommodate multiple pairs of upstream and downstream targeting guides. Intronic guides were designed in a similar fashion, except that both Cas9 and Cas12a guides were restricted to either the upstream or downstream intron, leaving a distance of at least 300 bp from any splice site to ensure that the splicing of the microexon or neighboring exons would not be affected. Guides targeting constitutive exons were designed using CRISPICK94,95, and non-targeting and intergenic control guides were derived from a previously published library7.
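The distance constraints for microexon-flanking guides can be illustrated with a short Python sketch. The function, its arguments and the coordinates are hypothetical, and the sketch assumes that distances are measured from the predicted cut site, which is not specified above.

```python
def guide_position_allowed(cut_site, flanking_exon_splice_site,
                           microexon_splice_site,
                           min_flank_dist=150, min_microexon_dist=100):
    """Check one intronic guide against both splice-site distance limits (in bp)."""
    far_from_flank = abs(cut_site - flanking_exon_splice_site) >= min_flank_dist
    far_from_microexon = abs(cut_site - microexon_splice_site) >= min_microexon_dist
    return far_from_flank and far_from_microexon


# Hypothetical upstream intron: neighboring exon ends at 1000, microexon starts at 2000.
candidates = [1100, 1200, 1950]
allowed = [c for c in candidates if guide_position_allowed(c, 1000, 2000)]
print(allowed)
```

For the intronic control guides, the same check could be reused with both limits set to 300 bp.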

Single nuclear RNA-seq data analysis

Raw data processing

Raw sequencing reads were demultiplexed based on i5/i7 PCR barcodes and processed using a pipeline modified from ref. 26 and implemented in a Snakemake framework. First, barcodes and unique molecular identifiers (UMIs) were extracted from read 1 of the FASTQ files, followed by trimming of read 2. Reads were then aligned to the mouse genome (mm10) with the STAR short-read aligner (v2.6.1d) using Gencode vM12 gene annotations and quality filtered. Reads were deduplicated based on UMIs and summarized into a count matrix of M genes x N nuclei, which was converted into an AnnData object in Scanpy (v1.9.3)96.

Quality filtering and normalization

Doublets were removed with Scrublet97 and cells with a detected number of genes, total RNA counts, or mapping rate greater than five absolute deviations from the median were discarded, as were cells with > 1% mitochondrial RNA. Total counts per cell were normalized and log transformed.
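A minimal sketch of the deviation-based outlier filter is given below. It assumes that "absolute deviations from the median" refers to the unscaled median absolute deviation (MAD) and that the filter is applied symmetrically around the median; the function name and the example values are hypothetical.

```python
from statistics import median


def mad_outlier_flags(values, n_mads=5.0):
    """Flag values lying more than n_mads MADs away from the median."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    return [abs(v - center) > n_mads * mad for v in values]


# Hypothetical total RNA counts for six nuclei; the last one is an outlier.
total_counts = [1000, 1100, 950, 1050, 1020, 9000]
flags = mad_outlier_flags(total_counts)
print(flags)
```

The same function could be applied in turn to the number of detected genes, total counts and mapping rate, discarding any cell flagged for at least one metric.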

Dimensionality reduction and clustering

The top 2000 most variable genes were extracted and used in principal component analysis (PCA) to reduce the dimensionality of the data. Neighborhood graphs were constructed using 30 principal components and data were visualized with Uniform Manifold Approximation and Projection (UMAP). Clustering was performed using the Leiden algorithm (leidenalg v 0.10.2) with a resolution of 0.1 to identify broad cell clusters and 0.8 to identify subclusters. A specific subcluster (‘15’) was noted to have aberrantly high RNA counts, and expression signatures consistent with doublets, and therefore was excluded from downstream analysis. Clusters were manually annotated based on expression of known cell type marker genes and label transfer using an embryonic mouse dataset and named accordingly (see Cell-type label transfer).

Diffusion map and pseudotime analysis

A diffusion map was computed using sc.tl.diffmap, and a root cell was identified based on the minimum value in the third diffusion component corresponding to cells with transcriptomes indicative of early stages of differentiation. Pseudotime analysis was performed using the diffusion pseudotime (DPT) method (sc.tl.dpt)98.

Cell-type label transfer

Preprocessed data were loaded into a Seurat (v 4.3.0)99 object. Reference data from Cao et al. (ref. 26) or Pijuan-Sala et al. (ref. 100) underwent similar preprocessing, including normalization, scaling, and dimensionality reduction via PCA. To transfer labels from reference data, integration of the query and reference datasets was performed using Seurat’s anchor-based label transfer approach. Anchors were identified between the datasets using the first 30 principal components, and cell-type annotations from the reference dataset were transferred to the query dataset.

CellRank2 analysis

To explore neurodevelopmental trajectories and cellular dynamics, pseudotime analysis coupled to CellRank2 was employed to infer the progression of cells along a continuous developmental path. Transition matrices were computed using the PseudotimeKernel from CellRank2 (v2.0.4)33,42, followed by embedding projections. The macrostate analysis and terminal state probabilities were calculated with the ‘Generalized Perron Cluster Cluster Analysis’ (GPCCA) estimator (pygpcca v1.0.4)101, and coarse-grained transitions were visualized. Lineage driver genes were identified and analyzed by computing their correlation with specific lineage states. Upregulation of neuronal lineage ‘driver genes’ was calculated by two-sided Wilcoxon rank sum test comparing up- or down-regulated gene ranking (FDR < 0.05) with background genes that were not differentially expressed, for either single-nuclei or bulk RNA-seq.

Guide assignment

Sequencing data from hgRNA libraries were processed using a Snakemake pipeline. Raw FASTQ files were processed to attach UMIs using custom scripts adapted from ref. 26, incorporating ligation and reverse transcription barcodes. Reads were aligned to a non-redundant hgRNA index using Bowtie102. Aligned reads were filtered to retain high-quality mappings. hgRNA-UMI pairings were considered chimeric if they represented less than 80% of the reads for a given hgRNA-UMI pairing within a given cell, and were removed using a custom Python script, and the resulting non-chimeric BAM files were indexed. Deduplication was performed using UMI-tools103 to retain unique molecules.
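The chimera filter can be sketched in Python as follows. This is an illustration rather than the custom script used here: it assumes that the 80% threshold is evaluated against all reads sharing the same cell and UMI, and the data structure and names are hypothetical.

```python
from collections import defaultdict


def drop_chimeric(read_counts, min_fraction=0.8):
    """Keep (cell, umi, guide) entries holding >= min_fraction of the cell-UMI reads.

    read_counts maps (cell, umi, guide) -> number of supporting reads.
    """
    totals = defaultdict(int)
    for (cell, umi, _guide), n in read_counts.items():
        totals[(cell, umi)] += n
    return {key: n for key, n in read_counts.items()
            if n / totals[(key[0], key[1])] >= min_fraction}


# Hypothetical counts: UMI "AAA" is dominated by g1, UMI "CCC" is split evenly.
counts = {("c1", "AAA", "g1"): 9, ("c1", "AAA", "g2"): 1,
          ("c1", "CCC", "g3"): 5, ("c1", "CCC", "g4"): 5}
kept = drop_chimeric(counts)
print(kept)
```

Under this assumption, a UMI whose reads are split between guides without a dominant species is discarded entirely, since no pairing reaches the threshold.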

Aligned reads with UMIs were processed to assign guides to individual cells. Reads were grouped by cell and guide, and UMIs were counted to determine the number of unique reads per cell and guide. Guides were assigned to cells by prioritizing the guide pair with the highest combined UMI counts or, for non-redundant guides, where possible, the single guide with the highest UMI count. Perturbation targets were annotated based on guide metadata, and final assignments were compiled.
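A simplified sketch of guide assignment by UMI count is shown below. It treats each hgRNA as a single unit and does not model the distinction between guide pairs and non-redundant single guides; the function name, the input format and the tie-breaking rule (first guide encountered) are assumptions made for illustration.

```python
from collections import defaultdict


def assign_guides(records):
    """Assign each cell the guide with the most unique UMIs.

    records is an iterable of (cell, guide, umi) tuples, one per aligned read.
    Returns a dict mapping cell -> (guide, unique UMI count).
    """
    umis = defaultdict(set)
    for cell, guide, umi in records:
        umis[(cell, guide)].add(umi)
    best = {}
    for (cell, guide), seen in umis.items():
        count = len(seen)
        if cell not in best or count > best[cell][1]:
            best[cell] = (guide, count)
    return best


# Hypothetical reads; the duplicated UMI in cell c2 is counted once.
reads = [("c1", "hgA", "U1"), ("c1", "hgA", "U2"), ("c1", "hgB", "U3"),
         ("c2", "hgB", "U1"), ("c2", "hgB", "U1")]
assignments = assign_guides(reads)
print(assignments)
```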

Perturbation calling

Normalization to non-targeting cells

To normalize individual perturbation signatures to cells at a similar stage in development, gene expression data were normalized to the average expression in the \(k=30\) nearest non-targeting cells (using a strategy similar to that described in ref. 35). Principal component analysis (PCA) was performed, and the 30 nearest non-targeting neighbors of each cell were identified based on Euclidean distance in the space of the top 40 PCs. We calculated a normalized gene expression vector for each cell \(\left({\widetilde{x}}_{i}\right)\) by subtracting the average expression of the neighboring cells \(\left({\bar{x}}_{{N}_{i}}\right)\) for each gene:

$$\begin{array}{c}{\widetilde{x}}_{i}={x}_{i}-{\bar{x}}_{{N}_{i}}\end{array}$$
(2)

where:

$$\begin{array}{c}{\bar{x}}_{{N}_{i}}=\frac{{\sum }_{j\in {N}_{i}}{x}_{j}}{k}\end{array}$$
(3)
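Equations 2 and 3 can be sketched in plain Python as follows, where \({N}_{i}\) denotes the set of nearest non-targeting neighbors of cell \(i\). The function name and toy data are hypothetical; the sketch assumes that a non-targeting cell is not counted among its own neighbors and that at least one non-targeting neighbor exists for every cell, neither of which is specified above.

```python
import math


def normalize_to_non_targeting(expr, pcs, is_nt, k=30):
    """Subtract the mean expression of the k nearest non-targeting cells (Eqs. 2-3).

    expr:  per-cell expression vectors (list of lists, genes in a fixed order)
    pcs:   per-cell coordinates in PC space
    is_nt: per-cell booleans, True for non-targeting cells
    """
    nt_idx = [j for j, flag in enumerate(is_nt) if flag]
    normalized = []
    for i, x_i in enumerate(expr):
        # Rank non-targeting cells by Euclidean distance, skipping the cell itself.
        nearest = sorted((j for j in nt_idx if j != i),
                         key=lambda j: math.dist(pcs[i], pcs[j]))[:k]
        mean = [sum(expr[j][g] for j in nearest) / len(nearest)
                for g in range(len(x_i))]
        normalized.append([x - m for x, m in zip(x_i, mean)])
    return normalized


# Toy data: four cells, two genes, one PC; cell 0 carries a targeting guide.
expr = [[5.0, 1.0], [1.0, 1.0], [3.0, 1.0], [10.0, 0.0]]
pcs = [[0.0], [0.1], [0.2], [5.0]]
is_nt = [False, True, True, True]
normalized = normalize_to_non_targeting(expr, pcs, is_nt, k=2)
print(normalized[0])
```

Because neighbors are chosen in PC space, each cell is compared with non-targeting cells at a similar differentiation stage rather than with the global non-targeting average.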

Perturbation assignment

Raw count data for each perturbation were aggregated into pseudo-bulk profiles by cluster and guide RNA assignment. Differentially expressed genes were identified by pairwise comparison of pseudobulk expression profiles for cells expressing guides targeting the same microexon, and non-targeting cells. For each comparison, we used the likelihood ratio test (LRT) from edgeR (v3.38.4) to calculate log2(fold change) and FDR-adjusted p values. Only pseudobulks comprising at least 5 aggregated cell expression profiles, and perturbations with at least 25 total cells were considered. For each perturbation, the normalized expression values calculated above (Normalization to non-targeting cells) corresponding to the DEGs (minimum of 5 differentially expressed genes; FDR < 0.05) were used as input for classification.

To classify cells as perturbed or non-targeting we applied quadratic discriminant analysis (QDA), implemented in scikit-learn (v1.2.2), to calculate classification assignments based on multivariate Gaussians fit to normalized expression values \({\widetilde{x}}_{i}\). QDA models each class as a multivariate Gaussian with its own class-specific covariance matrix \({\Sigma }_{k}\) and mean \({{{\rm{\mu }}}}_{k}\) and assigns classifications using Bayes’ rule:

$$\begin{array}{c}P\left({y}_{i}=k\mid {\widetilde{x}}_{i}\right)=\frac{P\left({\widetilde{x}}_{i}\mid {y}_{i}=k\right)\cdot P\left({y}_{i}=k\right)}{P\left({\widetilde{x}}_{i}\right)}\end{array}$$
(4)

for which we calculate the class conditional likelihood as:

$$\begin{array}{c}P\left({\widetilde{x}}_{i}\mid {y}_{i}=k\right)=\frac{1}{{\left(2\pi \right)}^{d/2}{\left|{\Sigma }_{k}\right|}^{1/2}}\exp \left(-\frac{1}{2}{\left({\widetilde{x}}_{i}-{\mu }_{k}\right)}^{\top }{\Sigma }_{k}^{-1}\left({\widetilde{x}}_{i}-{\mu }_{k}\right)\right)\end{array}$$
(5)

Priors were inferred from class proportions in the data. QDA was performed across 1000 bootstrapped replicates. During each bootstrap iteration, cells were randomly resampled with replacement from the perturbed and non-targeting groups to fit the model and predictions were made for all cells. The bootstrapped classifications were aggregated to compute an overall classification accuracy for each cell.
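To make Eqs. 4 and 5 concrete, the sketch below evaluates the QDA posterior by hand for the two-dimensional case (d = 2), using the closed-form inverse of a 2 x 2 covariance matrix. It is an illustration only and is not the scikit-learn implementation used for the analysis; the function names, class parameters and priors are hypothetical.

```python
import math


def gaussian_log_density(x, mean, cov):
    """Log of Eq. 5 for d = 2, with cov given as [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu) via the 2x2 inverse.
    quad = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return -0.5 * quad - math.log(2 * math.pi) - 0.5 * math.log(det)


def qda_posteriors(x, classes):
    """Posterior P(y = k | x) for each class (Eq. 4).

    classes maps label -> (prior, mean, covariance).
    """
    log_joint = {k: math.log(prior) + gaussian_log_density(x, mean, cov)
                 for k, (prior, mean, cov) in classes.items()}
    top = max(log_joint.values())  # subtract the maximum for numerical stability
    weights = {k: math.exp(v - top) for k, v in log_joint.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical classes with equal priors and identity covariance.
identity = [[1.0, 0.0], [0.0, 1.0]]
classes = {"non-targeting": (0.5, [0.0, 0.0], identity),
           "perturbed": (0.5, [2.0, 0.0], identity)}
midpoint = qda_posteriors([1.0, 0.0], classes)
shifted = qda_posteriors([2.0, 0.0], classes)
print(midpoint, shifted)
```

In the bootstrap setting described above, the class means, covariances and priors would be re-estimated from each resampled set of cells before predictions are made for all cells.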

Statistical significance of these classifications was assessed using a permutation-based approach. Labels for perturbations were shuffled to create a null distribution of classification accuracies. For each cell, an empirical p value was calculated by comparing its observed classification accuracy to the null distribution:

$$\begin{array}{c}{p}_{{\rm{value}},i}=\frac{1+{\sum }_{n=1}^{N}I\left[{\hat{p}}_{i}\le {p}_{i}^{{\rm{null}},n}\right]}{N+1}\end{array}$$
(6)

where \({\hat{p}}_{i}\) is the observed classification accuracy of cell \(i\), \({p}_{i}^{{\rm{null}},n}\) is the accuracy in the nth permutation, \(N\) is the number of permutations, and \(I[\cdot ]\) is the indicator function.
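Equation 6 translates directly into a few lines of Python. The function name and the example accuracies below are hypothetical.

```python
def empirical_p_value(observed, null_accuracies):
    """Empirical p value of Eq. 6: (1 + #{null >= observed}) / (N + 1)."""
    exceed = sum(1 for null in null_accuracies if observed <= null)
    return (1 + exceed) / (len(null_accuracies) + 1)


# Hypothetical example: one of four permutations matches or exceeds the observed accuracy.
p = empirical_p_value(0.9, [0.5, 0.6, 0.95, 0.4])
print(p)
```

Adding one to both numerator and denominator prevents an empirical p value of exactly zero, so the smallest attainable value is 1/(N + 1).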

These p values were corrected for multiple comparisons (Benjamini-Hochberg). Cells with a corrected p value < 0.05 were classified as ‘knockout’, whereas cells with p > 0.05, and cells from perturbations with fewer than 25 cells, were classified as ‘non-perturbed’.
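For reference, a minimal standard-library sketch of the Benjamini-Hochberg adjustment and the subsequent labeling step is given below. The function name and the example p values are hypothetical, and in practice an established statistics package would normally be used.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-cell empirical p values.
raw = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(raw)
labels = ["knockout" if q < 0.05 else "non-perturbed" for q in adjusted]
print(adjusted, labels)
```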

Guide correlation analysis

To investigate how transcriptional signatures between guides correlate compared to non-targeting cell populations, we compared pseudobulks of DEGs (FDR < 0.05) for each microexon deletion. We computed a Pearson correlation between log-normalized expression values from cells expressing guides targeting the same microexon, and those expressing non-targeting, or intergenic targeting guides. Significance was assessed by Benjamini-Hochberg corrected two-sided Wilcoxon rank sum test.

Comparison of microexon-deletion and gene knockout cells

For each microexon deletion that resulted in a detectable transcriptomic phenotype, we identified differentially expressed genes (FDR < 0.1) and performed principal component analysis (PCA) on their normalized expression values across single cells (see ‘Normalization to non-targeting cells’ in the Methods). Cells were embedded in PCA space using the top 40 principal components, or fewer if limited by sample size. To assess transcriptomic differences, we performed pairwise comparisons using Hotelling’s T² test between microexon-targeting cells and those expressing microexon deletion, gene knockout guides, or intergenic negative control guides, relative to non-targeting controls. This approach enabled statistical testing of whether transcriptomic signatures induced by microexon deletion were distinct from those induced by gene knockout or negative control perturbations.

Temporal analysis of differential gene expression

Gene signature scores were computed per cell by summing normalized log-expression values (log1p-transformed) of genes significantly upregulated in Gfra1 microexon deletion (KO) versus non-targeting (NT) cells (logFC > 0, FDR < 0.05). Scores were z-normalized across all cells. A generalized additive model (GAM) was then fitted to the scaled scores as a function of pseudotime using a smooth interaction with guide type. Model fits and 95% confidence intervals were computed using the mgcv (v 1.9_3) package in R104.
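The per-cell signature score can be sketched as follows. The function name and toy values are hypothetical, and the use of the population standard deviation for z-normalization is an assumption; the subsequent GAM fit with mgcv is not reproduced here.

```python
from statistics import mean, pstdev


def signature_scores(expr_by_cell, signature_genes):
    """Sum log-normalized expression over signature genes, then z-normalize across cells.

    expr_by_cell is a list of dicts mapping gene -> log1p-normalized expression.
    Assumes the raw scores are not all identical (non-zero standard deviation).
    """
    raw = [sum(cell[g] for g in signature_genes) for cell in expr_by_cell]
    mu, sd = mean(raw), pstdev(raw)
    return [(r - mu) / sd for r in raw]


# Toy data: three cells and two hypothetical signature genes.
cells = [{"GeneA": 1.0, "GeneB": 1.0},
         {"GeneA": 2.0, "GeneB": 2.0},
         {"GeneA": 3.0, "GeneB": 3.0}]
scores = signature_scores(cells, ["GeneA", "GeneB"])
print(scores)
```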

Gene ontology enrichment and overlap analysis

GO enrichment was performed using g:Profiler (v 0.2.3) implemented in R105. Differentially expressed genes were used as input with a custom background set of genes comprising all genes involved in DEG testing for each pair-wise comparison.

To assess concordance between GO enrichments for CHyMErA-seq and bulk RNA-seq gene expression signatures, we identified significantly enriched GO terms (adjusted P < 0.05) and constructed contingency tables of term overlap. Fisher’s exact test was applied to determine the statistical significance of overlap, and odds ratios from both platforms were used to compute combined enrichment scores. GO terms were reduced on the basis of semantic similarity matrices using the rrvgo106 (v 1.18.0) package in R, with grouping based on parent terms at a threshold of 0.8.

Gene set enrichment analysis (GSEA)

GSEA was performed using the fgsea package in R107. For analysing single cell expression signatures, genes were ranked by their log2(fold-change) values from edgeR. Gene sets were retrieved from MSigDB108,109, filtered for GO terms comprising between 15 and 150 genes. For analysing the CellRank driver genes, genes with a significant correlation (p < 0.05) were ranked according to their correlation with the neuronal trajectory, and gene sets were filtered for GO terms comprising between 15 and 500 genes.

SCENIC analysis

Gene regulatory network (GRN) inference and downstream analysis were performed with the pySCENIC pipeline (v 0.12.1)56,110 using preprocessed data (see Single nuclear RNA-seq data analysis). For regulatory network analysis, the gene expression data were converted into Loom file format, which was then used as input for GRNBoost2 (ref. 111) to infer gene regulatory networks. After generating the adjacency matrix of inferred gene regulatory interactions, regulon pruning was performed to reduce redundancy and focus on co-expression of genes with transcription factors with proximal binding motifs. AUCell110 was then used to calculate regulon activity for each single cell.

Regulons previously identified as being dysregulated in the brains of individuals with ASD57 were analysed for differences in average scores between microexon perturbed cells and non-targeting control cells. Regulon scores were z-normalized and p values were calculated for each pairwise comparison by two-tailed unpaired t-test followed by Benjamini-Hochberg correction.

ASD gene-expression timing analysis

Expression modules for individual cells in the neuroepithelial cluster were calculated using Seurat’s AddModuleScore function with ASD-linked genes in the SFARI gene dataset that were differentially expressed in at least one microexon deletion (FDR < 0.05) as the input genes. These scores were then compared with a population of ‘non-targeting’ cells in the same cluster and significance was assessed by Wilcoxon rank-sum test. To test the developmental timing of ASD-linked gene dysregulation, cells from microexon deletions with premature differentiation phenotypes (Bin1, Clasp1(A), Gfra1, Ptprf(A), Med23, Ralgapb microexon deletions) or non-targeting cells were binned according to pseudotime, and pairwise comparisons of SFARI gene module score for each pseudotime bin were calculated by Wilcoxon rank-sum test. P values were adjusted by Benjamini-Hochberg correction.

ASD gene enrichment analysis

Enrichment was calculated for genes from ASD datasets in up- and downregulated genes (FDR < 0.05) for each microexon exclusion directed by dCasRx-RBM25, or for clonal line deletion of the Gfra1 microexon. Odds ratios were calculated by Fisher’s exact test and p values were adjusted by Benjamini-Hochberg correction.

Bulk RNA-seq differential gene expression analysis

Clonal cell lines harboring Gfra1 microexon deletions, or simultaneously prepared clonal cell lines without genetic perturbation, were differentiated into neurons in vitro (as above), RNA was extracted at DIV7, and RNA-seq was performed (see RT-PCR and RNA-seq methods above). For dCasRx-RBM25 lines, cell lines expressing microexon inclusion or exclusion gRNAs were differentiated into neurons in vitro and RNA from two biological replicates for mESCs or DIV7 neurons was collected and RNA-seq performed. Gene-level quantification was performed with Salmon (v1.10.3)112. Differentially expressed genes were calculated with the quasi-likelihood F-test in edgeR (v3.38.4) comparing Gfra1ΔMIC/ΔMIC lines with wild type clones or samples expressing microexon exclusion guides compared to microexon inclusion guides for dCasRx-RBM25 lines.

PPI network analyses

We used the STRING database (v12.0) to identify potential physical interactions among 145 mouse proteins that contain neuronal microexons exhibiting switch-like inclusion both in vivo and in vitro during neurogenesis (see Microexon-targeting library design). To construct the interaction network, we included all detected protein-protein interactions with at least medium confidence (score ≥ 0.4). We then clustered the network using the Markov Cluster Algorithm (MCL) implemented in Cytoscape, applying a granularity parameter of 2 to balance cluster specificity and connectivity. We performed functional enrichment analysis on each cluster using STRING.

Microexon quantification from PsychENCODE

We quantified inclusion levels of microexons from RNA-seq data collected from human brain samples available through PsychENCODE69. Microexon quantification of BrainGVEX bulk RNA-seq sample data was performed using MicroExonator31 with the following parameters: filter_mode = “unbiased”, use_corrected_PSI = True, min_PSI = 0.1, and min_reads_PSI = 5. Novel microexon discovery was skipped (skip_discovery = True), and quantification was limited to microexons annotated in VastDB3. Microexon inclusion levels were aggregated by primary diagnosis, and statistical comparisons were performed using the two-sided Wilcoxon rank sum test.