Main

The splicing of pre-mRNA into mature mRNA in eukaryotic cells consists of excising introns and ligating exons through two transesterification reactions catalyzed by the spliceosome1,2. This large ribonucleoprotein complex is composed of five uridyl-rich small nuclear RNAs (snRNAs) that are essential for spliceosome assembly and function and differ according to the type of excised intron. The major spliceosome processes the majority (>99%) of introns containing GU–AG splice sites (U2 type) and is composed of snRNAs U1, U2, U4, U5 and U6 (ref. 3). Each snRNA has unique sequence motifs and secondary structures that allow it to interact precisely with its target sites. U1 and U2, respectively, bind to the 5′ splice sites (5′SS) and branch points, while U4, U5 and U6 form the tri-small nuclear ribonucleoprotein particle (snRNP) complex that is recruited to assemble a precatalytic spliceosome complex. U6, initially paired with U4 in an inactive conformation, activates upon dissociation to interact with U2 and form the catalytic site4. U5 aligns exons by binding the 5′ and 3′ splice sites, ensuring precise ligation5.

Spliceosomal snRNAs are encoded by distinct single-exon genes and ubiquitously transcribed by RNA polymerase II or III6. Human genomes contain multiple gene copies of snRNAs U1, U2, U4, U5 and U6, some of which are functional and others are pseudogenes7,8. After transcription, snRNAs undergo essential processing steps, including 5′-capping, 3′-end processing, nuclear export, Sm protein binding, nuclear re-import and nucleotide modifications (2′-O-methylation, pseudouridylation) guided by small Cajal body-specific RNAs9,10,11,12.

A recent landmark discovery has implicated de novo variants in RNU4-2, one of two functional genes encoding U4, as the cause of ReNU syndrome (OMIM 620851), an unexpectedly frequent neurodevelopmental disorder (NDD)13,14. This discovery was facilitated by the recurrence of a single base insertion (n.64_65insT) representing 78% of pathogenic variants, enriched in the Genomics England (GEL) NDD cohort15 but absent from gnomAD16 and highly depleted in UK Biobank17. Genome sequencing is necessary to detect these variants, as they are typically not yet captured by exome sequencing. No similar enrichment was found in 28 other brain-expressed snRNA genes, although 14 regions of 13 genes appear more evolutionarily constrained13. This raises questions of whether variants in other snRNA genes may underlie other rare diseases and how to accurately classify variants in these genes.

In this study, we investigated 50 Human Genome Organisation (HUGO) Gene Nomenclature Committee (HGNC)-approved snRNA genes in a French cohort of 23,649 patients with rare disorders and collected data for additional patients via international collaborations. Using these data, we implicate two further snRNAs in NDDs and more comprehensively define ReNU syndrome. We also identify ReNU syndrome-associated transcriptional and epigenetic signatures through RNA-sequencing (RNA-seq) and DNA methylation studies.

Results

RNU4-2 variants in cohorts of patients with rare diseases

We investigated de novo variants in RNU4-2 (NR_003137.2) and/or rare variants (<10 alleles in gnomAD v4.1.0) located in the 18-bp critical region defined in ref. 13 in the Plan France Médecine Génomique 2025 (PFMG2025) cohort comprising 23,649 patients with rare disorders (15,073 with NDD)18. This analysis revealed 75 patients with de novo RNU4-2 variants. Among the patients for whom parental analysis was not possible, four had variants previously reported as de novo in another unrelated individual, and one patient had a single-nucleotide deletion (n.76del) within the critical region.

In parallel, we collected data for 70 previously unreported patients with RNU4-2 variants identified through genome sequencing data reanalysis (30 patients) or targeted sequencing (40 patients, including one monozygotic twin pair). Variants occurred de novo in 55/56 cases for whom both parents were available. One patient had a variant (n.72_73del) inherited from an affected father, which had occurred de novo in another unrelated patient.

Altogether, 150 individuals (73 males and 77 females, including the twin) had 22 distinct RNU4-2 variants, of which 106 patients (71%) had the recurrent n.64_65insT insertion (Fig. 1 and Supplementary Table 1). Seven other variants were recurrent—n.76C>T (n = 10), n.66A>G (n = 5), n.67A>G (n = 5), n.65A>G (n = 3), n.77_78insT (n = 3), n.70T>C and n.72_73del (n = 2 patients each). Fourteen de novo variants were identified in a single patient. All but three variants clustered within the highly conserved 18-bp critical region spanning nucleotides 62–79 (chromosome 12 (hg38) (chr12(hg38)): 120,291,825–120,291,842)13 (Extended Data Fig. 1a). This region overlaps four distinct domains in the U4/U6 structure4: stem I (U4 n.62), T-loop (also known as quasi-pseudoknot; n.63–67), RBM42 interaction region (n.68–70) and stem III (n.72–79). We classified 18 variants (in 146 individuals; 145 probands) as pathogenic (P) or likely pathogenic (LP) and four as variants of uncertain significance (VUS) using American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) criteria19,20 (Methods; Supplementary Note). We observed no difference in CADD PHRED scores or nucleotide conservation (verPhyloP) between LP/P variants and RNU4-2 variants present in gnomAD v4.1.0. LP/P variants had a greater predicted effect on U4/U6 interaction compared to variants observed ≥10 times in gnomAD (Supplementary Fig. 1 and Supplementary Table 2), but due to the large overlap, these cannot be used to predict variant pathogenicity.
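
The domain boundaries listed above lend themselves to a simple lookup. The following is a minimal sketch, not part of the study's pipeline: the names `U4_DOMAINS`, `u4_domain` and `in_critical_region` are hypothetical, and representing an insertion such as n.64_65insT by its first flanking position (64) is an assumption made only for illustration. Position n.71 is not assigned to a named domain in the text, so the lookup returns `None` for it.

```python
# U4 domain boundaries (n. positions) as listed in the text above.
# n.71 is not assigned to a named domain there, so it maps to None.
U4_DOMAINS = [
    ("stem I", 62, 62),
    ("T-loop (quasi-pseudoknot)", 63, 67),
    ("RBM42 interaction region", 68, 70),
    ("stem III", 72, 79),
]
CRITICAL_REGION = (62, 79)  # 18-bp critical region, n.62 to n.79


def u4_domain(n_pos):
    """Return the named U4 domain containing n_pos, or None."""
    for name, start, end in U4_DOMAINS:
        if start <= n_pos <= end:
            return name
    return None


def in_critical_region(n_pos):
    """True if n_pos lies within the 18-bp critical region."""
    start, end = CRITICAL_REGION
    return start <= n_pos <= end


# Illustrative use; insertions are keyed by their first flanking
# position, which is an assumption of this sketch.
for label, pos in [("n.64_65insT", 64), ("n.76C>T", 76), ("n.92C>G", 92)]:
    print(label, in_critical_region(pos), u4_domain(pos))
```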

Fig. 1: Overview of RNU4-2 variants identified in this study.

a, Two-dimensional predicted structure of the interaction between U4 (red) and U6 (orange) snRNAs showing distinct domains. Arrowheads indicate variants identified in this study; P and LP in black, and VUS in gray. The numbers in black within the zoom-in box represent the count of patients with each variant for nucleotide changes that occur more than once. Red and orange numbers refer to the numbering of nucleotides from each snRNA. Red-shaded region, 18-bp critical region; gray-shaded regions, Sm sites. b, Organization of the U4–U6 duplex at the tri-snRNP stage (PDB ID: 6QW6) and close-up views of stem III, RBM42 binding and quasi-pseudoknot regions. Interactions stabilizing these structures, as well as LP/P variants potentially affecting their stability, are represented. Ψ, pseudouridine; m, 2′-O-methyl residues; m6, N6-methyladenosine; 2,2,7m3Gppp, 2,2,7-trimethylguanosine cap; mpppG, 5′ guanosine triphosphate cap with γ-monomethyl phosphate.

De novo analysis reveals variants in brain-expressed U5 genes

We next analyzed de novo variants in 49 additional HGNC-approved genes encoding snRNAs (Supplementary Table 3) in the PFMG cohort. This analysis revealed rare de novo variants in 17 genes in 36 unrelated patients (Supplementary Table 4). Six patients already had P/LP variants. All other patients remained unsolved after genome analysis (Supplementary Note). Fifteen variants (in 18 patients) were located in genes encoding U5 (six in RNU5A-1/NR_002756.2, seven in RNU5B-1/NR_002757.3, two in RNU5E-1/NR_002754.2 and three in RNU5F-1/NR_002753.5). As RNU5A-1 and RNU5B-1 are the main genes encoding U5 in the brain (Extended Data Fig. 2), we focused our analysis on these two genes.

In the 100,000 Genomes Project (GEL) and National Health Service (NHS) Genomic Medicine Service (GMS) cohorts available within GEL, six NDD probands had five de novo variants in RNU5B-1, compared to a single de novo variant in a non-NDD individual (6/12,724 versus 1/30,058; Fisher’s P = 0.0036; Supplementary Table 5). Furthermore, five patients with de novo RNU5B-1 variants were identified across additional cohorts, including the Broad Center for Mendelian Genomics (two patients), the BCH Epilepsy Genetics Program, the Australian Undiagnosed Diseases Network (UDN-Aus) and Care4Rare Canada (one patient each).

Altogether, 18 NDD probands had de novo variants in RNU5B-1. Seven of these variants (14 patients) clustered within a small region spanning chr15(hg38): 65,304,713–65,304,720, corresponding to the highly conserved U5 5′ loop I (Fig. 2 and Supplementary Table 6), which is depleted of variants in gnomAD v4.1.0 (Extended Data Fig. 1b) and UK Biobank13. Extending the analysis of this critical region to patients analyzed as singletons or duos, we identified three additional patients with variants in GEL/NHS, bringing the total to eight patients with NDD with variants in this region, compared to none in the non-NDD cohort (8/12,724 undiagnosed NDD versus 0/30,058 non-NDD in GEL/NHS-GMS; Fisher’s P = 6.1 × 10⁻⁵). In total, 17 NDD individuals had variants in the critical region of RNU5B-1. Three variants (n.39C>G, n.42_43insA and n.44A>G) were recurrent, identified in six, three and four patients, respectively. These variants were absent from all databases and classified as LP.
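
The two cohort comparisons reported for RNU5B-1 (6/12,724 versus 1/30,058, and 8/12,724 versus 0/30,058) can be checked with a two-sided Fisher's exact test. The sketch below is a plain-Python illustration rather than the authors' code; the function name `fisher_exact_two_sided` is hypothetical, and it assumes the usual two-sided definition in which all tables with the same margins and a probability no greater than the observed table are summed. Under that assumption the results should land close to the reported P values.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, row1)

    def prob(k):
        # Probability that k of the col1 carriers fall in row 1.
        return comb(col1, k) * comb(total - col1, row1 - k) / denom

    k_min = max(0, row1 - (total - col1))
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# De novo RNU5B-1 variants: NDD probands versus non-NDD individuals.
p_de_novo = fisher_exact_two_sided(6, 12724 - 6, 1, 30058 - 1)
# Critical-region variants, including singletons and duos.
p_region = fisher_exact_two_sided(8, 12724 - 8, 0, 30058)
print(f"{p_de_novo:.4f} {p_region:.2e}")
```

Exact integer binomial coefficients are used instead of log-gamma approximations so that the only rounding occurs in the final division.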

Fig. 2: Overview of RNU5A-1, RNU5B-1, RNU5E-1 and RNU5F-1 variants identified in this study.

a, Two-dimensional predicted structure of U5 (light blue) snRNA showing distinct domains. Arrowheads indicate variants identified in this study—pink, RNU5A-1; dark blue, RNU5B-1; green, RNU5E-1 and yellow, RNU5F-1. P and LP variants are indicated with a filled color, while VUS are marked with a white dot inside the arrowheads. Numbers near the arrowheads represent the count of patients with each variant for nucleotide changes that occur more than once. Nucleotide differences between RNU5A-1, RNU5B-1, RNU5E-1 and RNU5F-1 are shown using International Union of Pure and Applied Chemistry (IUPAC) codes, except for the highly variable 3′ stem loop II, for which separate loops are displayed. Light blue numbers refer to the numbering of nucleotides from each snRNA. The N at position 79 corresponds to a gap in RNU5F-1. Blue-shaded region, critical region. Gray-shaded region, Sm site. b, The 5′ exon recognition by the U5 stem loop I at the B-complex stage (PDB ID: 8Q7N). Interactions stabilizing these structures, as well as LP/P variants potentially affecting their stability, are represented.

Pathogenic RNU variants mainly occur on the maternal allele

We investigated the parental origin of RNU4-2 variants in available genome data by phasing de novo variants and informative SNPs in the flanking regions (Methods; Supplementary Fig. 2). The parental origin of mutations was reliably determined in 50 trios and one mother–patient duo. Variants were assigned to the maternal allele in 47 cases and to the paternal allele in four instances. Notably, all n.64_65insT insertions (n = 38) were phased to the maternal allele, consistent with previous observations13. Of the four variants assigned to the paternal allele, two (n.62T>C and n.68A>C) were classified LP/P and two (n.76del and n.92C>G) were VUS.

Among the phaseable RNU5B-1 variants located in the U5 5′ loop I, five (n.39C>G, n.42_43insA and three n.44A>G) were phased to the maternal allele, while two (n.39C>G and n.37G>C) were on the paternal allele. The two RNU5B-1 de novo variants located outside of the conserved 5′ loop I (n.24G>C and n.74T>C) were also phased to the paternal allele. Both n.40_41insA variants in RNU5A-1 occurred de novo on the maternal allele.

RNU4-2 variants in the T-loop and stem III differ in severity

Clinical data were available for 143 patients with P/LP RNU4-2 variants (69 males and 74 females, excluding the monozygotic twin with an identical phenotype to her sister; Supplementary Tables 7 and 8). The median age at study entry was 9 years (range = 4 months to 45 years). All patients had NDD with variable degrees of intellectual disability (ID), ranging from mild (7.1%), moderate (27.7%) to severe/profound (65.2%).

We investigated genotype–phenotype correlations in RNU4-2-related disorders. Unsupervised clustering of clinical features revealed two separate clusters differing in severity (Extended Data Fig. 3a). Most RNU4-2 variants in stem III (63%, 12/19) were in the mild phenotype cluster, whereas most variants in the T-loop and RBM42 interacting region were in the high severity cluster (98%, 121/123). Principal component analysis (PCA) confirmed this result, with variants in stem III and in the T-loop separating on the first principal component (PC) axis, accounting for 12.3% of the variance (Fig. 3a,b and Extended Data Fig. 3b–d). These results suggest that phenotypic variability largely depends on the location of RNU4-2 variants within U4 functional domains.

Fig. 3: RNU4-2 variants in the T-loop and stem III associate with different phenotype severity.

a, PCA of 44 phenotypic features in 143 patients showing the separation of variants with respect to their location within distinct U4:U6 domains. Labels with the nucleotide change appear for variants other than n.64_65insT. RNU4-2 variants are colored according to their location within the distinct U4:U6 domains; stem I (n = 1) in light blue, quasi-pseudoknot (n = 119) in orange, RBM42 interaction region (n = 4) in blue and stem III (n = 19) in green. Triangles, P (n = 128) variants; squares, LP (n = 15) variants. b, Contributions of the clinical features to the PCA. c, Comparative analysis of 14 phenotypes related to RNU4-2 n.64_65insT and n.76C>T variants. The P values were calculated using Fisher’s exact tests (two-sided; 2 × 2, 2 × 3 or 2 × 4 contingency tables) to compare 41 phenotypes between patients with n.64_65insT variants and those in the other three variant groups. Multiple comparisons were adjusted for using Bonferroni correction. The percentage of patients with the feature, followed by the numerator (number of affected patients) and denominator (total assessed), is shown directly in the bars. Full details of all tests and patient numbers can be found in Supplementary Table 8.

Among the 103 patients with RNU4-2 n.64_65insT and clinical data available (Table 1 and Extended Data Fig. 4), prenatal findings were observed in 55 of 92 cases (60%) and predominantly consisted of intrauterine growth restriction (IUGR; 30%) and/or cerebral abnormalities (33%; ventriculomegaly, 19.5%); 38% of fetuses showed isolated abnormalities, while 62% had two or more signs. Neonatal findings (91%), mainly hypotonia (71%) and feeding difficulties (57%), were frequent. Congenital microcephaly was present in 28% of individuals, while microcephaly at the time of last examination was present in 74% of individuals (Supplementary Fig. 3). In total, 60% of individuals had short stature. All 85 patients older than 3 years exhibited developmental delay. Most could walk, with a median walking age of 30 months (range = 13 months to 12 years), but 13% did not reach this milestone. Most patients were nonverbal (61%) or could only speak a few words (34%). The majority had severe/profound ID (78%), with 21% having moderate ID and one patient having mild ID. Behavioral disturbances were common, with autistic features and/or midline stereotypies reminiscent of Rett syndrome in 84% of patients. Epilepsy affected 56%, with an additional 8% experiencing a single seizure. Seizure onset ranged from the neonatal period to 13 years (median = 32 months), and seizures were usually generalized, rare, fever-sensitive and responsive to antiepileptic medications. However, 5 patients were diagnosed with developmental and epileptic encephalopathy, 14 experienced status epilepticus and 7 had drug-resistant epilepsy. Brain magnetic resonance imaging (MRI) abnormalities were prevalent (91%), with the most common findings being enlarged ventricles (84%) and corpus callosum abnormalities (85%). Less common findings included heterotopia (n = 7), delayed myelination or hypomyelination (n = 11) and abnormal gyration (n = 5). In total, 38% of cases had skeletal abnormalities, including osteopenia or fractures (n = 20) and hip dysplasia (n = 10). Dysmorphic features suggested Pitt–Hopkins syndrome (Fig. 4). Strabismus and drooling were common. Feeding difficulties affected 69%, failure to thrive 55% and constipation 57%. Acrocyanosis or vasomotor disorders (Extended Data Fig. 5) were present in 16 patients, blood count anomalies in 13 patients and hypothyroidism in 8 patients.

Table 1 Clinical features of individuals with RNU4-2 variants according to the location of the variants in the different U4 functional domains
Fig. 4: Facial photographs from 22 patients with the recurrent RNU4-2 n.64_65insT variant.

a–v, The main facial features include a large mouth, a short philtrum, downturned corners of the mouth, thick lips, deep-set eyes, sparse eyebrows and strabismus. Older individuals also showed facial asymmetry. Panels a–v correspond to unrelated patients, except p and q, who are monozygotic twins. Consent forms have been obtained for the publication of the facial photographs.

The phenotype of the patients with other variants in the T-loop and RBM42 interaction region was indistinguishable from that of patients with n.64_65insT. Individuals with the recurrent n.66A>G (n = 5) and n.67A>G (n = 5) variants had a similar phenotype, characterized by neonatal hypotonia (5/5 and 3/5), microcephaly (5/5 for both), epilepsy in about half (3/5 and 2/4) and similar dysmorphic features. All patients had severe developmental delay and severe ID, except for one case with moderate ID. Notably, all patients were nonverbal.

Patients with the RNU4-2 n.76C>T variant (n = 10) exhibited a distinct clinical profile from patients with n.64_65insT (Table 1 and Fig. 3c). They had fewer neonatal findings (Fisher’s P = 2.41 × 10⁻⁴), especially hypotonia (P = 1.67 × 10⁻³), presented less severe ID (P = 8.57 × 10⁻⁵) and developmental delay (P = 2.05 × 10⁻⁵), were more proficient in their language abilities (P = 1.32 × 10⁻⁸) and rarely showed brain MRI abnormalities (P = 2.47 × 10⁻³). All patients could walk, with four of them achieving this milestone at a normal age (median walking age = 19 months (12–33 months)), and all could speak, with simple sentences (n = 6) or normal language skills (n = 4). Microcephaly was noted in three of ten patients, and short stature was noted in only one of ten patients. Two of five patients had autistic features. Six patients had fever-sensitive generalized epilepsy, well-controlled with antiseizure medication, while four others had a single febrile seizure. None had nystagmus, and only one had ataxia. Brain MRI was normal in six of eight cases. Dysmorphic features were distinct from those seen in patients with the recurrent variant (Fig. 5a).

Fig. 5: Facial photographs from 13 patients with other variants in RNU4-2 and two patients with variants in RNU5B-1.

a, Individuals with other variants in RNU4-2. (i), n.65A>G; (ii)–(iv), n.66A>G; (v) and (vi), n.67A>G; (vii), n.68A>C; (viii)–(x), n.76C>T; (xi), n.77_78insG; (xii) and (xiii), n.77_78insT. b, Individuals with the RNU5B-1 n.39C>G variant (i and ii). Consent forms have been obtained for the publication of the facial photographs.

Similarly, patients with other variants in stem III (n = 9) exhibited a mild/moderate phenotype compared to patients with the n.64_65insT variant, showing less severe developmental delay (P = 1.90 × 10⁻²) and ID (P = 7.79 × 10⁻⁵) and with improved language abilities (P = 3.20 × 10⁻⁶). All patients could walk and speak, with varying degrees of language development (normal language, two; simple sentences, six; few words, one). ID was mild in three and moderate in six, with autistic features in two of five cases. Fever-sensitive epilepsy was common (7/9) but well-controlled with antiseizure medication. Brain MRI was normal in five of seven patients.

RNU5 variants lead to NDD with variable malformations

Detailed clinical data were available for nine of 15 patients with NDD with RNU5B-1 LP variants (Fig. 5b, Supplementary Tables 9 and 10 and Supplementary Fig. 4). Six had severe developmental delay, one had moderate developmental delay and one had normal cognition but attention difficulties. All nine patients showed brain MRI abnormalities, but only one had epilepsy. Three had pectus excavatum, two of whom also had marfanoid habitus. Three had ocular abnormalities, such as congenital glaucoma (n = 1), small papillae with retinal vascular tortuosity (n = 1) and severe myopia (<−12.25 D). Other malformations included pulmonary issues (n = 2), sacrococcygeal abnormalities (n = 2), tooth agenesis or fusion (n = 2) and cardiac malformation (n = 2). Acquired microcephaly was noted in three individuals with n.44A>G, whereas two individuals with n.39C>G had macrocephaly. Human phenotype ontology terms enriched in RNU5B-1 cases from GEL include seizures, macrocephaly and eye anomalies (Supplementary Table 11).

The three patients with RNU5A-1 variants for whom clinical data were available also had NDD with variable congenital malformations. One had postaxial polydactyly, dental agenesis and talus feet due to oligohydramnios. Another had anal malposition, sacrococcygeal dimple and caudal appendix, thin and incomplete corpus callosum and septal agenesis. The third had cardiac malformations and marfanoid habitus. The two patients with n.40_41insA had seizures. Head circumference (HC) was normal in all.

Pathogenic variants lead to specific splicing defects

We previously reported specific alternative 5′ splice site (5′SS) abnormalities in the blood of individuals with RNU4-2 variants13. To confirm and extend this observation, we conducted RNA-seq on lymphocyte cultures from 19 individuals with RNU4-2 variants and 21 controls with other NDDs (Supplementary Fig. 5a). Using rMATS-turbo21, we identified significant aberrant splicing events (Supplementary Tables 12–16). We extracted percent spliced in (PSI) values of significantly altered exons for each splicing category and performed PCA using matrices with samples as columns and PSI values as rows. PCA revealed that the most pronounced effect was for the signal originating from 111 altered 5′SS, with distinct clustering patterns of affected individuals (Fig. 6a and Extended Data Fig. 6a–d). Severe phenotypes (associated with variants n.64_65insT, n.67A>G, n.68A>C and n.70T>C) formed a distinct cluster, while mild phenotypes (n.72_73del, n.75C>G and n.76C>T) appeared intermediate between severe cases and controls. This suggests a common 5′SS usage signature associated with RNU4-2 pathogenic variants, with distinct profiles correlating with disease severity.
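
To make the PSI matrix described above concrete, the sketch below computes PSI from inclusion and skipping read counts and arranges the values with one row per splicing event and one column per sample. It is an illustration under stated assumptions, not the rMATS-turbo implementation: the function names `psi` and `psi_matrix`, the optional effective-length arguments and the toy counts are all hypothetical.

```python
def psi(inc_reads, skip_reads, inc_len=1.0, skip_len=1.0):
    """Percent spliced in from inclusion and skipping read counts.

    Counts are divided by an effective length for each isoform form
    before taking the ratio; with equal lengths this reduces to
    inc / (inc + skip). Returns None when there is no coverage.
    """
    inc = inc_reads / inc_len
    skip = skip_reads / skip_len
    if inc + skip == 0:
        return None
    return inc / (inc + skip)


def psi_matrix(events):
    """Build a matrix with one row per event and one column per sample.

    `events` is a list of events, each a list of
    (inclusion_reads, skipping_reads) tuples in sample order.
    """
    return [[psi(i, s) for (i, s) in per_sample] for per_sample in events]


# Toy counts for two events across three samples (made-up numbers).
toy_events = [
    [(30, 10), (10, 30), (20, 20)],
    [(5, 15), (0, 0), (40, 0)],
]
for row in psi_matrix(toy_events):
    print(row)
```

Events with missing values (here, a sample with no supporting reads) would need to be dropped or imputed before PCA; how the study handled such cases is not stated in this section.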

Fig. 6: RNA-seq identifies an alternative 5′SS signature that differentiates severe from mild RNU4-2-related phenotypes.

a, PCA based on PSI values from 111 significant 5′SS events detected using rMATS, comparing 19 patients with RNU4-2 variants (6 mildly affected in teal and 13 severely affected in red) to 21 controls (purple). Triangles, n.64_65insT; circles, other variants. Yellow symbols correspond to three test samples—one variant of uncertain significance (VUS; n.45_46insT), one VUS that could be reclassified as LP and the recurrent variant n.64_65insT from a patient with a milder phenotype. b, Box plot showing raw spliceAI scores of the decreased 5′SS site and the increased 5′SS for the 50 events shared between mild (n = 6) and severe individuals (n = 13) and the 19 events only detected in severe individuals. SpliceAI scores for severe and shared 5′SS were not statistically different for decreased sites (P = 0.476) but were significant for increased sites (P = 0.014) using the two-sided Mann–Whitney U test. Box plot elements are defined as follows: centerline, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range; points, outliers. c,d, Sashimi plots showing isoform shifts in MAP4K4 (c) and AKNA (d). Aggregated coverage and splicing-supporting reads from patients with RNU4-2 variants with mild (n.75C>G, n.76C>T and n.72_73del; n = 6) or severe (n.64_65insT, n.67A>G, n.68A>C and n.70T>C; n = 13) phenotypes and controls (n = 21) are shown. The MAP4K4 event is present in both patient groups, while the AKNA abnormality appears only in severe cases. e, Consensus nucleotide sequence of decreased and increased 5′SS for 50 shared events (left) and for the 19 severe-only events (right), in comparison to the consensus sequence of all 5′SS from MANE transcripts (top).

We then applied the 5′SS signature to three individuals with a VUS or atypical clinical presentation. The mildly affected proband with n.64_65insT clustered with more severely affected carriers of the same variant (Fig. 6a). The n.62T>C variant was reclassified as LP due to clustering with mild phenotypes, whereas n.45_46insT showed an intermediate profile between patients and controls and remained a VUS.

We characterized and visually inspected 69 5′SS events using Integrative Genomics Viewer (IGV), including 50 shared by patients with mild and severe phenotypes and 19 unique to severe phenotypes (Fig. 6b–d, Supplementary Fig. 5b and Supplementary Table 17). Decreased 5′SSs consistently show high SpliceAI scores (shared sites, median = 0.93; severe-only sites, median = 0.92), indicating that alternative 5′SS usage is not restricted to weak sites (Fig. 6b). In contrast, increased 5′SS events associated with severe phenotypes have significantly lower SpliceAI scores (median = 0.52; Mann–Whitney U test, P = 0.014) compared to shared sites (median = 0.80). Additionally, only four 5′SS events were absent from controls (mean supporting reads in controls < 3; 4/69, 5.8%), suggesting that the main effect of U4 variants is a shift in existing alternative isoforms rather than the use of new cryptic splice sites.

Analysis of the 5′SS usage patterns revealed a consistent trend—A/A/G nucleotides at positions +3/+4/+5 were frequently replaced by C/T (Fig. 6e). This change was accompanied by an increased reliance on A/G nucleotides at positions −2/−1, particularly evident in 5′SS events that were reduced in severe phenotypes. Indeed, 2 of 19 severe variants had AG at these positions, compared to 26 of 50 shared variants (two-tailed Fisher’s test, P = 0.0008). These findings indicate that 5′SSs used exclusively in patients with severe phenotypes tend to depend more on the exonic sequence (end of the exon) and less on the intronic sequence.

Of the 69 5′SS events, 21 (30%) were out-of-frame, indicating that these transcripts were not degraded by nonsense-mediated mRNA decay (NMD). To test whether NMD masked additional out-of-frame 5′SS events, we repeated the splicing signature analysis on puromycin-treated samples. This revealed only nine significant 5′SS defects not present in controls, including one in KDM6A (Supplementary Table 18). These results suggest that reduced mRNA expression is not a key factor in RNU4-2 variant pathogenicity.

To explore the underlying pathophysiology, we examined 5′SS events affecting known NDD-associated genes. We identified 14 such genes: DPM1, KIF2A, P4HTM, HNRNPH1, KMT2A, KMT2C, KMT2D, POMT1, MADD, TRIO, EIF4A2, SYNCRIP, THOC2 and PPP1CB. Notably, three genes belonging to the KMT2 family of H3K4 methyltransferases, KMT2A (Wiedemann–Steiner syndrome), KMT2C (Kleefstra syndrome) and KMT2D (Kabuki syndrome), are related to syndromic NDDs.

Finally, we analyzed samples from six patients with either RNU5B-1 (two n.39C>G and two n.44A>G) or RNU5A-1 (n.40_41insA and n.39del) variants for splicing effects. These variants did not share the RNU4-2 5′SS signature (Extended Data Fig. 6e), and unlike RNU4-2 variants, RNU5 variants lacked a shared 5′SS or 3′SS signature. An individual analysis against 21 controls, supplemented with 20 additional controls and 19 RNU4-2 patients, revealed possible variant-specific effects: RNU5B-1 n.39C>G mainly affected 5′SS, while RNU5B-1 n.44A>G primarily impacted 3′SS (Extended Data Fig. 7 and Supplementary Tables 19 and 20). RNU5A-1 n.39del may also alter 3′SS, while RNU5A-1 n.40_41insA seems to affect both 5′SS and 3′SS (Extended Data Fig. 8 and Supplementary Tables 21 and 22). These results, although obtained from a limited number of patients, may underlie the clinical variability observed for patients with RNU5 variants.

Identification of a specific RNU4-2 episignature

KMT2A and KMT2D variants are associated with specific DNA methylation profiles (episignatures). To determine whether an episignature could also be identified for ReNU syndrome, we compared genome-wide methylation profiles of 35 patients with P/LP RNU4-2 variants with those of 45 healthy age-matched controls. Adjusting for age, sex and blood cell composition, we identified 147 differentially methylated positions (P < 10⁻⁷ and |Δβ| > 5%; Supplementary Fig. 6). PCA and heatmap representations clearly separated patients from controls (Fig. 7). The strength of the episignature correlated with disease severity and variant localization. Variants associated with mild phenotypes showed similar levels of hypermethylation as moderate-to-severe phenotypes but exhibited intermediate hypomethylated signals in the lower heatmap cluster (Supplementary Table 23). The first PC axis, separating patients from controls, captured 52% of the methylation dispersion, comparable to ATRX, KMT2D or KMT2A episignatures22. After fivefold cross-validation, the overall sensitivity was 0.91 (31/35, 95% binomial confidence interval (CI; 0.77–0.98)), reaching 100% (24/24, 95% binomial CI (0.86–1.00)) for n.64_65insT carriers, with a specificity of 0.98 (44/45, 95% binomial CI (0.88–0.999); Fig. 7b). Two patients with RNU5A-1 n.40_41insA did not share the RNU4-2 signature but clustered together at the boundary of the control group, suggesting these variants may have their own distinct episignature (Extended Data Fig. 9). The RNU4-2 episignature is entirely distinct from the KMT2A and KMT2D episignatures (Extended Data Fig. 10).
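
The binomial confidence intervals quoted for sensitivity and specificity can be illustrated with an exact (Clopper–Pearson) interval. The text does not state which binomial interval was used, so treating it as Clopper–Pearson is an assumption of this sketch; the function names are hypothetical, and the bounds are found by simple bisection on the binomial distribution rather than by a library call. For 24/24 and 44/45 the output should be close to the intervals reported above.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, target, increasing):
    """Find p in [0, 1] with f(p) close to target for a monotone f."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial CI for k successes out of n trials."""
    # Lower bound: P(X >= k | p) = alpha / 2 (increasing in p).
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, True)
    # Upper bound: P(X <= k | p) = alpha / 2 (decreasing in p).
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper


# Sensitivity among n.64_65insT carriers and specificity in controls.
for k, n in [(24, 24), (44, 45)]:
    lo, hi = clopper_pearson(k, n)
    print(f"{k}/{n}: {k / n:.2f} ({lo:.3f}, {hi:.4f})")
```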

Fig. 7: Identification of an episignature that discriminates patients with ReNU syndrome from controls and correlates with phenotypic severity.

a, PCA of adjusted methylation levels at differentially methylated positions (n = 147), after correcting for expected methylation based on age, sex and estimated blood cell counts (n = 80 individuals in total). Different variants are represented by different shapes, while phenotype severity is indicated by color—purple for controls (n = 45), teal for mild phenotypes (n = 5) and red for moderate-to-severe phenotypes (n = 30). The percentage of variance explained is provided for each axis. b, Pathogenicity scores for each variant were obtained by fivefold cross-validation using a support vector machine predictor. Purple, controls; teal, RNU4-2 variants, mild phenotypes; red, RNU4-2 variants, moderate-to-severe phenotypes; yellow, RNU5A-1 variants. c, Heatmap of adjusted methylation levels displays hierarchical clustering of controls and patients with RNU4-2 LP/P variants. Blue indicates hypomethylated positions (n = 89), while red indicates hypermethylated positions (n = 58) with respect to expected methylation levels at equivalent age, sex and blood cell composition. Variants are colored according to the location within the distinct U4:U6 domains (stem I, light blue; quasi-pseudoknot, orange; RBM42 interaction region, blue; stem III, green). Phenotype severity is indicated by color (purple for controls, teal for mild phenotypes and red for moderate-to-severe phenotypes).

Discussion

Despite extensive genetic testing, 40–60% of NDD cases with suspected genetic origins remain unsolved. The recent discovery of RNU4-2 variants as a major cause of NDDs, overlooked until 2024, underscores the role of noncoding genes in undiagnosed cases. Here we analyzed 50 HGNC-approved snRNA genes in a large French cohort of patients who underwent genome sequencing as part of routine diagnosis. This led to identifying 76 (likely) pathogenic variants in RNU4-2 (0.5% of NDD participants) and eight (0.05%) with variants in RNU5A-1 or RNU5B-1. Combining these data with 80 additional patients from other cohorts, we observed that pathogenic variants typically cluster in evolutionarily conserved regions of U4 and U5 critical for splicing. RNU4-2 variants cluster in the T-loop/quasi-pseudoknot and stem III, while RNU5B-1 and RNU5A-1 variants cluster in the conserved 5′ loop I, which pairs with the exon adjacent to the 5′SS5. De novo variants in other domains or snRNA genes were also identified, but their clinical relevance remains unclear so far.

This study provides a comprehensive overview of RNU4-2-related phenotypes, revealing distinct clinical outcomes based on variant location. Variants in the T-loop, including n.64_65insT, are associated with severe phenotypes, while variants in stem I/stem III, including n.62T>C, n.76C>T and n.72_73del, lead to milder forms. This supports a continuum of RNU4-2-related phenotypes with inherited variants also possibly contributing to NDD etiology. At the severe end, prenatal manifestations, mainly cerebral abnormalities (corpus callosum anomalies and enlarged ventricles) and/or IUGR, were observed in 60% of cases, highlighting the importance of genome sequencing or targeted RNU4-2 analysis in prenatal genetic testing.

A striking observation in line with previous findings13 is the predominant maternal origin of RNU4-2 variants, possibly explained by the negative selection of variants severely affecting splicing in the male germline. However, paternal transmission of less severe variants is possible, as evidenced by four cases. The mechanism underlying the high recurrence of RNU4-2 variants and its potential link to maternal origin remain unclear. Interestingly, the recurrent insertions RNU4-2 n.64_65insT and RNU5A-1 n.40_41insA occur at 2′-O-methylation sites11,12, although any connection to maternal inheritance or recurrence is yet to be established.

We provide definitive evidence that RNU4-2 pathogenic variants lead to specific alternative 5′SS anomalies in the blood cells of affected patients, with detected events correlating with phenotype severity. Variants in the T-loop and stem I/stem III indeed show distinct, partially overlapping transcriptional signatures, which could aid in the interpretation of VUS. Furthermore, DNA methylation exhibited a similar pattern, revealing a shared global episignature, albeit with more pronounced and distinct alterations associated with severe NDD phenotypes linked to T-loop variants. Given the widespread use of exome sequencing in routine diagnostics, these transcriptional and epigenetic signatures could help diagnose additional ReNU syndrome cases worldwide. This analysis also revealed that pathogenic variants in snRNAs lead to widespread but mild splicing abnormalities, mainly characterized by a shift in existing isoforms. Although this analysis was performed in lymphocytes rather than neuronal cells, the data suggest that ubiquitously expressed genes, such as KMT2A, KMT2C, KMT2D and KDM6A, which encode lysine methyltransferases and a lysine demethylase involved in chromatin remodeling, may also be altered in the brain of affected individuals. Interestingly, a recent study demonstrated that USP39 deficiency disrupts the assembly of the U4/U6.U5 tri-snRNP, resulting in 5′SS abnormalities similar to those observed in patients with RNU4-2 pathogenic variants. This disruption leads to the accumulation of misfolded proteins in proteotoxic aggregates, triggering endoplasmic reticulum stress and subsequent cell death23.

Although U4 variants could disrupt spliceosome function at various stages (U4 snRNP biogenesis, U4/U6 di-snRNP, U4/U6.U5 tri-snRNP assembly and spliceosome activation), our results strongly suggest disruption of the U4/U6 duplex organization at the tri-snRNP stage, affecting the 5′SS introduction into the spliceosome’s active site. The 5′SS, initially paired with U1 in the prespliceosome, is transferred to the U6 ACAGAGA box and U5 stem loop 1 in the U4/U6.U5 tri-snRNP. These interactions maintain the 5′SS in the active site during catalysis, marking the start of each intron1,24. When the 5′SS is transferred, it pairs with the U6 ACAGAGA box to ensure its correct identification25, which triggers molecular events that lead to the formation of the active site by Brr2. Before this transfer, the U6 ACAGAGA box is held as a flexible loop between the stem III and quasi-pseudoknot, and this organization is further stabilized by Snu66, SNRP27K and RBM42 (refs. 4,26,27,28). Pathogenic RNU4-2 variants in—or close to—the quasi-pseudoknot possibly weaken or disrupt its structure and compromise its ability to maintain the ACAGAGA box at the right position for 5′SS recognition. Pathogenic variants in the stem III possibly alter Watson–Crick base pairs formed by U4 A78, C76 and C75 with U6 G34, G33 and U31 and affect U4/U6 stem III’s stability. The stem III likely enhances 5′SS recognition fidelity by creating an energy barrier to extending the U6/5′SS helix after initial pairing with the ACAGAGA sequence4,5. Stem III disruption, necessary for Brr2 loading and active site formation, would weaken this barrier, allowing suboptimal 5′SS to extend the helix more easily and activate the spliceosome. In severe cases, increased 5′SS usage suggests a loss of specificity for intronic 5′SS motifs, potentially compensated by greater reliance on exonic 5′SS motifs via U5.

U5 loop I is crucial for 5′SS transfer. By interacting with the exonic sequences adjacent to 5′SS, it helps align the 5′ exon with the branch site and the 3′ exon during both steps of splicing catalysis29,30. Mutations in yeast U5 loop I result in aberrant 5′SS splicing31. However, our findings suggest variant-specific impacts, with RNU5B-1 n.39C>G affecting 5′SS, RNU5B-1 n.44A>G affecting the 3′SS and RNU5A-1 n.40_41insA possibly altering both. These results align with the clinical variability observed in patients with RNU5B-1 variants, including opposing head growth phenotypes associated with n.39C>G and n.44A>G. However, further studies on larger patient series are needed to confirm these observations and elucidate the underlying mechanisms. Overall, our findings underline the critical role of U4 and U5 RNA structures in maintaining splicing fidelity by preventing weak SS activation, with their destabilization reducing splicing accuracy.

Finally, this study focused on 50 HGNC-approved snRNA genes, while the hg38 reference genome includes 1,901 snRNA genes, most annotated as pseudogenes. A recent study suggests that RNU2-2P, annotated as a pseudogene, may be functional and linked to a new NDD32. While confirming the functionality of snRNA pseudogenes requires experimental validation8, this discovery suggests that more snRNAs could contribute to genetic disorders.

In conclusion, this work emphasizes the critical role of de novo variants in snRNAs, particularly RNU4-2, in unsolved NDD. Moreover, we identify RNU5B-1 and RNU5A-1 as new NDD genes and provide valuable insights into fundamental aspects of spliceosome function.

Methods

Inclusion and ethics statement

This study complies with the ethical standards of each of the participating countries. Informed consent was obtained for all patients included in this study from their parents or legal guardians. A specific consent form was obtained from the families who consented to the publication of photographs. Patients/participants/samples were pseudonymized for the genetic study at each participating center. We collected information on the sex (but not gender) of the patients from the patients’ clinical file. Grenoble-Alpes University Hospital (CHU Grenoble-Alpes, research 19814188) is the promoter of this research for the hospitals associated with the Auragen laboratory. Assistance Publique-Hôpitaux de Paris (AP-HP) is the promoter of this research for the hospitals associated with the SeqOIA laboratory (project ID: APHP241333). The study has received approval from the Ethics Committee of University Hospital Essen (reference 24-12010-BO) and approval from the Comité Éthique et Scientifique pour les Recherches, les Études et les Évaluations dans le domaine de la Santé (CESREES; reference 21082803 Bis/2038764). AP-HP has obtained authorization from the Commission Nationale de l’Informatique et des Libertés (reference HGTHGT/MFIMFI/AR2426865; request 924924336666) for the data processing activities related to this project. Part of the study has been approved by the CHU de Nantes Ethics Committee (number CCTIRS 14.556). Part of this research was ethically approved by CPP Ouest V (File 06/15) on 4 August 2015 (Ref MESR DC 2017 2987). For methylation analysis, DNA from all individuals (patients and controls) had been collected previously in the context of genetic analysis in a medical setting, following signature of a written, informed consent that includes a query on the use of leftovers in a research setting. 
Healthy controls consisted of individuals without NDD who underwent presymptomatic testing for other conditions and were found to be noncarriers or unaffected relatives of patients with a genetic disease among noncarriers of pathogenic variants. Samples used for the methylation study were stored within the genetics biological collection of the CRBi, Rouen, France, declared as DC 2008-711 (access authorization MCRBi/2024/02). The analysis of methylation profiles based on previously stored DNA in these conditions was approved by the CERDE ethics committee (notification E2023-13) from the Rouen University Hospital. Researchers and clinicians from participating centers contributing either data or intellectual input were involved at all stages of the study, from design and implementation to drafting and revising the manuscript, and are coauthors of the article.

List of snRNA genes and variant nomenclature

A list of 50 official gene symbols encoding functional snRNAs (Supplementary Table 3) was established from the HUGO gene nomenclature committee (https://www.genenames.org/). Information was retrieved from HGNC in December 2023 by applying the advanced filtering ‘gd_locus_type = “RNA, small nuclear”’ and restricting to genes with approved symbols. The coordinates (start and end positions) of genes and transcripts were in parallel retrieved from the National Institutes of Health Reference Sequence (curated subset downloaded from the University of California, Santa Cruz (UCSC) genome browser) and Ensembl. Of note, the start positions of transcripts from both entities differ for certain snRNA genes (for example, RNU5F-1), implying that variant nomenclature may vary depending on the transcript used to report them.

Patient cohorts

We initially identified the n.64_65insT variant in a single patient with developmental epileptic encephalopathy. This variant was prioritized because it was the strict de novo variant with the highest CADD score and was submitted to GeneMatcher33. Following the publication of the preprint by ref. 34 on 8 April 2024, we investigated variants in RNU4-2 and 49 other snRNA genes in several diagnostic and research cohorts. Our inclusion criteria were as follows: (1) a de novo variant in any of the 50 snRNA-encoding genes with fewer than ten heterozygotes in gnomAD v4.1.0, or (2) a heterozygous variant in RNU4-2 located within the critical 18-nucleotide region as defined in ref. 13. We then narrowed our search to RNU5A-1 and RNU5B-1 and investigated (3) de novo variants with fewer than ten heterozygotes in gnomAD v4.1.0 and/or (4) heterozygous RNU5B-1 variants located in the U5 5′ loop I.

The main cohort is composed of 23,649 patients with rare disorders, including 15,073 patients with NDD and their parents, when available, who underwent genome sequencing as part of the diagnostic process in France (PFMG2025)18 on one of the two national clinical sequencing laboratories, SeqOIA (https://laboratoire-seqoia.fr/) and Auragen (https://www.auragen.fr/) between 2019 and 2024. All de novo variants were visualized on IGV. The analysis of RNU4-2 variants in this main cohort identified 80 patients. Furthermore, we collected data of 70 additional patients with de novo and/or pathogenic RNU4-2 variants identified in either diagnostic or research contexts through national networks, established collaborations or GeneMatcher33. These additional cohorts included 42 patients from France, 20 from Germany, 5 from Canada, 1 from the Netherlands, 1 from Spain and 1 from the US. Thirty patients had genome sequencing, whereas in 40 patients, the variant was identified or confirmed by a targeted method—Sanger sequencing (n = 35) or next-generation sequencing of amplicons (n = 5). Among the patients diagnosed by Sanger sequencing, two had previously inconclusive exome analyses and were included in SOLVE-RD. Reads supporting the presence of n.64_65insT were identified in the exome data. None of the patients included in this study had been previously published, and we also checked that there were no duplicates for individuals with the same variant based on the individual’s year of birth and initials.

The analysis of de novo variants in the other 49 snRNA genes in the PFMG cohort identified 36 patients with de novo variants in 17 genes (Supplementary Table 4 and Supplementary Note). A targeted search for variants in RNU5B-1 and RNU5A-1 in the GEL dataset (including both the 100,000 Genomes cohort (v18) and NHS-GMS (v3) cohort) identified five additional individuals with rare (<10 occurrences in gnomAD) de novo variants, five of 8,841 undiagnosed NDD probands and one of 21,816 non-NDD probands. In addition, three probands analyzed in duo had a rare variant located in the U5 5′ loop I absent from the single parent analyzed. In addition, five de novo variants in RNU5B-1 were collected from the Broad Centre for Mendelian Genomics, UDN-Aus, the BCH Epilepsy Genetics Program and Care4Rare Canada.

Variants were reviewed using Alamut Visual Plus v1.11 (Sophia Genetics) and MobiDetails35 (https://mobidetails.iurc.montp.inserm.fr/MD/).

Sanger sequencing

Sanger targeted sequencing was performed to screen for variants in RNU4-2 and/or to perform segregation analysis. PCR amplification of RNU4-2 was performed using the HotStarTaq Master Mix Kit (Qiagen, 203445) with the following primers: forward, 5′-AAATACGGCTGGTGGAGTGG-3′; reverse, 5′-TCACAGTACCCGCACAGAAC-3′, according to the manufacturer’s instructions. Forward and reverse sequencing reactions were performed using the BrilliantDye Terminator v1.1 Cycle Sequencing Kit (Nimagen, BRD1-1000) or the BigDye Terminator v3.1 Sequencing Kit (Life Technologies, 4337457). ExoSAP-Purified sequencing products (ExoSAP-IT; Applied Biosystems, 78205) were run on Pop-7 polymer (Life Technologies, 4335615) using an ABI 3730 or 3730XL automated sequencer (Applied Biosystems). Sequences were analyzed using Geneious Prime 2019 (Biomatters) or Seqscape v2.6 software (Applied Biosystems).

Variant classification

We classified variants according to the ACMG/AMP criteria19 using recommendations from ref. 20. The PS2 (or PM6 for patients who underwent targeted sequencing) criteria were applied for cases with de novo inheritance. ‘PM2 supporting’ was applied for variants absent or very rare in gnomAD v4.1.0, and PM1 for variants located in mutational hotspots: chr12(hg38): 120,291,825–120,291,842 for RNU4-2 and chr15(hg38): 65,304,713–65,304,720 for RNU5B-1. We applied ‘PS4 supporting’ for variants identified in at least three patients, and ‘PS4 moderate’ for those found in at least six patients, either in this study or in ref. 13. PS4 was only applied for n.64_65insT. Finally, PS3 was applied when RNA-seq and/or methylation analyses supported pathogenicity.
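The positional (PM1) and recurrence (PS4) rules above can be illustrated with a short check. This is a minimal sketch rather than the classification procedure actually used: the function names are hypothetical, positions are treated as 1-based hg38 coordinates, the RNU5B-1 interval end is taken as 65,304,720 so that the interval is well-formed, and full-strength PS4 (applied only to n.64_65insT) is not modeled because no patient-count threshold is stated for it.

```python
# Hypothetical helpers illustrating the PM1 and PS4 rules described in the text.
# Hotspot intervals are the hg38 coordinates quoted in this section.
PM1_HOTSPOTS = {
    "RNU4-2": ("chr12", 120_291_825, 120_291_842),
    "RNU5B-1": ("chr15", 65_304_713, 65_304_720),  # end coordinate is an assumption
}


def in_pm1_hotspot(gene, chrom, pos):
    """Return True if a 1-based position lies inside the gene's hotspot interval."""
    if gene not in PM1_HOTSPOTS:
        return False
    hotspot_chrom, start, end = PM1_HOTSPOTS[gene]
    return chrom == hotspot_chrom and start <= pos <= end


def ps4_strength(n_patients):
    """Map the number of patients carrying a variant to a PS4 strength label."""
    if n_patients >= 6:
        return "PS4_moderate"
    if n_patients >= 3:
        return "PS4_supporting"
    return None  # criterion not met


if __name__ == "__main__":
    print(in_pm1_hotspot("RNU4-2", "chr12", 120_291_830))
    print(ps4_strength(4))
```

Keeping the intervals in a single dictionary makes it easy to extend the check to other genes if further hotspots are defined.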

Clinical data analyses

Clinical data were retrospectively collected from the referring physician using an anonymized Excel sheet. For patients aged 0–3 years, sitting and walking items were noted as ‘too young’ unless the clinician specifically noted their achievement. ID was noted as ‘too young’ unless the clinician assessed it as severe. Autism spectrum disorders were also noted as ‘too young’ unless the clinician could confirm or rule out the diagnosis. Categorical data for 44 selected clinical features from 143 patients with P and LP RNU4-2 variants and 12 patients with RNU5A-1 or RNU5B-1 variants were converted to a 0–1 scale, with 0 representing a more favorable phenotype presentation and 1 representing a more severe phenotype. Hierarchical clustering was performed using the pheatmap R package, performing z score scaling for each row (across different patients), and ward.D2 clustering method keeping missing values. PCA was generated after replacing missing data with 0 and performing variable scaling. Microcephaly was defined as HC measurements less than the third percentile. We used charts established in ref. 36 to calculate the HC percentile at birth and define congenital microcephaly. Corresponding plots were generated with the ‘Plotter: Preterm growth charts, 22–50 weeks’ from the Canadian Pediatric Endocrine Group (https://cpeg-gcep.shinyapps.io/prem2013/). For additional HC measurements, reference chart data points were obtained from ref. 37. Male patients older than 21 years were plotted at age 21, and female patients older than 20 years were plotted at age 20, corresponding to the maxima for each sex. Fisher’s tests (two-sided; 2 × 2, 2 × 3 or 2 × 4 contingency tables) adjusted for multiple comparisons using Bonferroni correction were used to compare clinical features in different U4 domains (n.64_65insT versus n.76C>T and n.64_65insT versus the other variant groups) for 41 clinical features.
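For readers unfamiliar with the comparison of clinical features between variant groups, the sketch below shows a two-sided Fisher's exact test on a 2 × 2 table followed by Bonferroni adjustment. It is an illustration only: it uses the Python standard library, covers the 2 × 2 case but not the 2 × 3 or 2 × 4 tables, and the counts in the example are invented rather than taken from the cohort.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Sum probabilities of all tables no more likely than the observed one
    total = sum(
        prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


def bonferroni(p, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p * n_tests)


if __name__ == "__main__":
    # Invented counts: feature present/absent in two variant groups
    p = fisher_exact_2x2(18, 2, 3, 9)
    print(p, bonferroni(p, 41))  # 41 features were compared in the text
```

The small tolerance factor in the comparison guards against floating-point ties between equally likely tables.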

Conservation and in silico predictions

The closest homologs of human RNU4-2 and RNU5B-1 were obtained for Ciona intestinalis, Ciona savignyi, Drosophila melanogaster, Caenorhabditis elegans, Danio rerio and Mus musculus by using BLAT on each of these genomes in Ensembl Release 112 (ref. 38). RNA sequences from RNU4-2 and RNU5B-1 were aligned to (1) their respective sequence homologs and (2) the sequence(s) of other U4- and U5-encoding genes expressed in the brain using Geneious Prime 2019 (Biomatters). The threshold for consensus was set to 100% identity, highlighting positions with 100% agreement between all sequences.
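The 100% identity criterion amounts to keeping only the alignment columns at which every sequence carries the same nucleotide. The sketch below illustrates that idea on a toy alignment. It is not the Geneious implementation, the function name is hypothetical, and treating gap-only columns as non-conserved is an assumption.

```python
def fully_conserved_positions(aligned):
    """Return 1-based alignment columns where all sequences share the same base."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    conserved = []
    for i in range(length):
        column = {seq[i].upper() for seq in aligned}
        # A column counts as conserved only if it holds one base and no gap
        if len(column) == 1 and "-" not in column:
            conserved.append(i + 1)
    return conserved


if __name__ == "__main__":
    toy_alignment = ["ACGU-A", "ACGA-A", "ACCU-A"]  # invented sequences
    print(fully_conserved_positions(toy_alignment))
```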

CADD PHRED scores and conservation in vertebrates (verphyloP) were calculated for P and LP patient variants and gnomAD v4.1.0 variants with CADD (v1.7)39. For each variant, in silico-mutated U4 RNA sequences were generated with seqkit mutate40. Bifold41 was used to generate the multiple U4:U6 interactions and calculate the minimum free energy. Comparisons were performed by applying the Mann–Whitney U test, two-sided.
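Generating an in silico-mutated sequence from an n. variant description reduces to simple string operations on 1-based coordinates. The sketch below is an illustrative stand-in for that step and does not reproduce seqkit mutate. The function names are hypothetical and the sequence is a toy example, not the U4 sequence.

```python
def apply_substitution(seq, pos, ref, alt):
    """Apply n.{pos}{ref}>{alt} (1-based) after checking the reference base."""
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[: pos - 1] + alt + seq[pos:]


def apply_insertion(seq, pos, inserted):
    """Apply n.{pos}_{pos+1}ins{inserted}: insert between positions pos and pos+1."""
    return seq[:pos] + inserted + seq[pos:]


def apply_deletion(seq, start, end):
    """Apply n.{start}_{end}del: remove positions start..end inclusive (1-based)."""
    return seq[: start - 1] + seq[end:]


if __name__ == "__main__":
    toy = "ACGTACGT"  # invented sequence for illustration
    print(apply_substitution(toy, 3, "G", "A"))
    print(apply_insertion(toy, 4, "T"))
    print(apply_deletion(toy, 2, 3))
```

Checking the reference base before substituting catches the coordinate shifts that arise when transcript start positions differ between annotation sources.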

Expression of snRNAs in brain tissues

We used small RNA data for different human embryonic brain regions to inspect the expression level of selected snRNAs. These data were generated by the ENCODE Consortium42—diencephalon (GSE78292), temporal lobe (GSE78303), occipital lobe (GSE78298), frontal cortex (GSE78293), parietal lobe (GSE78299) and cerebellum (GSE78291). Tracks show unique read signals for plus and minus strands from the default anisogenic replicate. Expression of these genes in the brain using BrainVar was previously investigated13.

RNA-seq

Peripheral blood mononuclear cells were isolated from 2 to 4 ml of EDTA-anticoagulated blood within 48 h of collection using UNISEP+ tubes (Eurobio Scientific, U-04). Cells were cultured in six-well plates (5.0 × 10⁵ to 2.0 × 10⁶ cells per well) in lymphocyte-stimulating medium (chromosome medium P; Euroclone, EKAMTB100) for 48–72 h at 37 °C (5% CO2). After incubation, one well per sample was treated for 4–5 h with 1 mg ml⁻¹ puromycin (Invivogen, ant-pr-5), an indirect NMD inhibitor. RNA was extracted using the NucleoSpin RNA Plus extraction kit (Macherey-Nagel, 740984.50) according to the manufacturer’s instructions.

Stranded RNA-seq libraries were prepared from 100 ng of total RNA on the Magnis NGS Prep System (Agilent) using the SureSelect XT-HS2 kit (Human All Exon V8 capture probes, G9774C) with 12 and 10 PCR cycles for precapture and postcapture amplifications, respectively. Libraries were sequenced on an Illumina NextSeq 550 (16 samples on HighOutput 2 × 75 bp) to obtain 25–30 million paired-end reads per sample.

FASTQ files were aligned to the GRCh38 reference genome with STAR (v.2.7.11a) in two-pass mode using Ensembl transcripts (v.106). Quality control was performed with fastqc (v.0.11.3) and fastp (v.0.23.4). CIBERSORTx (v1.0) was used to estimate the relative abundance of blood cells using the LM22 signature matrix file43. One RNU4-2 sample was removed because of a low proportion of activated T CD4+ cells (1/38; Supplementary Fig. 5a). To generate the RNU4-2 splicing signature, we compared lymphocytes from 19 RNU4-2 samples and 21 controls not treated with puromycin using rMATS-turbo (v.4.3.0)21 with the following parameters: -t paired --anchorLength 1 --libType fr-firststrand --novelSS --variable-read-length --allow-clipping. Controls were matched on the following criteria: library preparation kit, sequencing flow cell and culture time. Python scripts were used to filter rMATS output files with the following filters: mean coverage >7, false discovery rate (FDR) < 0.1, deltaPSI > 0.05. PCA was performed using the sklearn Python library using PSI values from significant alternatively spliced exons, keeping only (1) the most significant call when several were called impacting the same exon and (2) events affecting genes with approved HGNC symbols. The significant calls from the 5′SS signature were used to perform an additional PCA with three testing samples (n.45_46insT, n.62T>C and n.64_65insT). The study of NMD impact was performed by comparing nine patients with RNU4-2 P/LP variants (five n.64_65insT, n.67A>G, n.70T>C, two n.76C>T) against 22 controls, all treated with puromycin. The splicing study of the RNU5B-1 (two n.39C>G and two n.44A>G) and RNU5A-1 (n.40_41insA and n.39del) variants was performed by comparing each variant to the same 21 controls as for RNU4-2 using the same rMATS-turbo parameters, except for the singleton variants n.40_41insA and n.39del, for which an FDR threshold of 0.01 was used. 
To ensure the specificity of the signal, we included an additional 20 controls, 19 RNU4-2 and other U5 variants, not involved in the statistical analysis, for PCA visualization. Raw spliceAI scores were obtained from MobiDetails35,44. Sashimi plots were made using rmats2sashimiplot and boxplots with seaborn (v0.13.2). Consensus nucleotide sequences were generated using Logomaker (v0.8). Scripts used for RNA-seq analysis are available on Zenodo using the following link: https://doi.org/10.5281/zenodo.13868501 (ref. 45).
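The three event-level filters (mean coverage, FDR and deltaPSI) can be sketched as a row filter on rMATS output. This is an illustration and not the deposited scripts. The column names (IJC_SAMPLE_1, SJC_SAMPLE_1, IJC_SAMPLE_2, SJC_SAMPLE_2, FDR and IncLevelDifference) follow the standard rMATS output tables. Defining mean coverage as the average of inclusion plus skipping junction counts over all samples, and applying the deltaPSI cutoff to the absolute value, are assumptions, as are the function names.

```python
def mean_coverage(row):
    """Mean of inclusion + skipping junction counts over all samples (assumed definition)."""
    totals = []
    for group in ("1", "2"):
        inc = [int(x) for x in row[f"IJC_SAMPLE_{group}"].split(",")]
        skp = [int(x) for x in row[f"SJC_SAMPLE_{group}"].split(",")]
        totals.extend(i + s for i, s in zip(inc, skp))
    return sum(totals) / len(totals)


def passes_filters(row, min_cov=7, max_fdr=0.1, min_dpsi=0.05):
    """Apply the coverage, FDR and deltaPSI thresholds to one rMATS event."""
    return (
        mean_coverage(row) > min_cov
        and float(row["FDR"]) < max_fdr
        and abs(float(row["IncLevelDifference"])) > min_dpsi  # absolute value assumed
    )


if __name__ == "__main__":
    # Invented event with two replicates per group
    event = {
        "IJC_SAMPLE_1": "10,12", "SJC_SAMPLE_1": "3,5",
        "IJC_SAMPLE_2": "20,18", "SJC_SAMPLE_2": "1,2",
        "FDR": "0.01", "IncLevelDifference": "-0.12",
    }
    print(mean_coverage(event), passes_filters(event))
```

Rows can be read from the tab-separated rMATS tables with `csv.DictReader(handle, delimiter="\t")` and passed to `passes_filters` one at a time. For the singleton U5 variants, `max_fdr=0.01` corresponds to the stricter threshold mentioned in the text.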

Epigenome-wide analysis and DNA methylation signature

Genomic DNA was extracted from whole blood and subjected to bisulfite conversion. DNA methylation profile was then derived using Infinium MethylationEPIC v2.0 BeadChips (Illumina, 20087708), in accordance with the manufacturer’s protocol. Patients and negative controls were balanced across 24 arrays and within each array row to reduce technical biases. DNA methylation arrays were generated at the ASGARD-Rouen genomic platform (University of Rouen and Rouen University Hospital) on an Illumina NextSeq 550 scanner. Raw IDAT data were processed and normalized using the default Meffil R package protocol along with all other samples included in the 24 arrays to better estimate the variability of methylation signals within and across arrays46. One RNU4-2 sample failed default quality controls and was excluded from further steps. The remaining samples were functionally normalized together as advocated in the Meffil documentation, with random effect adjustment on array and sentrix row as well as fixed effect adjustment on the first two PCs, before computing β values.

Several predictions were obtained from methylation values to apply additional quality control and normalization steps. Sex predictions were extracted from the standard Meffil normalized object. No inconsistencies between reported and predicted sex were noted (Supplementary Fig. 6a). Blood cell counts were estimated with the meffil.cell.count.estimates function. PCA of predicted blood cell counts showed a good overlap of positive and negative controls in terms of overall blood cell composition (Supplementary Fig. 6b). DNA methylation age was predicted with the DNAmAge function from the methylclock R package47. The Horvath and skinHorvath clocks both displayed a very strong correlation with actual age at blood sample on our dataset (Pearson correlation r = 0.97; Supplementary Fig. 6c,d).

The set of differentially methylated probes was identified with the meffil.ewas function on the subset of controls and P or LP RNU4-2 variant carriers. To correct for well-known confounders, the differential analysis accounted for skinHorvath age, sex and predicted blood cell composition. Manhattan and volcano plots are given in Extended Data Fig. 9. After filtering the P value at 10⁻⁷ (Bonferroni-corrected threshold for an effective number of approximately 500,000 independent tests) and the average methylation difference between positive and negative controls at 0.05, adjusted methylation levels were visualized through PCA and heatmap representations (pheatmap package with the Euclidean distance and Ward aggregation method). Specifically, a baseline methylation level model adjusting for skinHorvath age, sex and blood cell composition was fitted on negative control samples for each probe. Adjusted methylation levels were computed for each sample by correcting each β value for the expected baseline level according to this model. Phenotype classification into mild/moderate and severe subtypes was derived independently from the episignature discovery and a posteriori added to these graphical representations.
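The two selection thresholds can be written out as a short filter. This is a sketch under stated assumptions: a family-wise α of 0.05 is assumed (divided by about 500,000 effective tests, it gives the 10⁻⁷ cutoff), the probe identifiers and values are invented, and `select_dmps` is a hypothetical name rather than a Meffil function.

```python
ALPHA = 0.05                 # assumed family-wise error rate
N_EFFECTIVE_TESTS = 500_000  # effective number of independent tests (from the text)
P_THRESHOLD = ALPHA / N_EFFECTIVE_TESTS  # about 1e-7


def select_dmps(results, p_max=P_THRESHOLD, min_abs_delta=0.05):
    """Keep probes with P below the threshold and |delta beta| above the cutoff.

    `results` maps a probe ID to a (p_value, delta_beta) tuple.
    """
    return [
        probe
        for probe, (p_value, delta_beta) in results.items()
        if p_value < p_max and abs(delta_beta) > min_abs_delta
    ]


if __name__ == "__main__":
    # Invented probe-level results for illustration
    toy_results = {
        "probe_a": (1e-9, 0.08),   # passes both filters (hypermethylated)
        "probe_b": (1e-9, 0.02),   # effect size too small
        "probe_c": (1e-5, -0.20),  # P value too large
        "probe_d": (5e-8, -0.06),  # passes both filters (hypomethylated)
    }
    print(select_dmps(toy_results))
```

Using the absolute difference keeps both hypermethylated and hypomethylated positions, in line with the two groups of positions shown in the heatmap.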

Finally, the robustness of the signature was challenged through fivefold cross-validation. The dataset was split into five random and equal-sized blocks. Each block was used in turn as a validation set, while the remaining four blocks were used as a training set to run a new differential analysis based on controls and moderate-to-severe phenotypes. An SVM model was trained on each cross-validation training set and applied to the test set to derive unbiased sensitivity and specificity estimations overall, by phenotype class and variant type, along with 95% binomial CIs. For RNU5A-1 variants, the pathogenicity score was derived from a prediction model based on the complete training set.
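The 95% binomial CIs attached to the sensitivity and specificity estimates can be illustrated with an exact (Clopper–Pearson) interval. The text does not name the method, so the exact interval is an assumption here, although it yields 0.86–1.00 for 24 of 24 and 0.88–0.999 for 44 of 45. The sketch uses only the standard library and finds each bound by bisection on the binomial distribution function. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(f, lo=0.0, hi=1.0, iters=100):
    """Root of a function that increases from negative to positive on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for x successes out of n."""
    if x == 0:
        lower = 0.0
    else:
        # Lower bound p solves P(X >= x | p) = alpha / 2
        lower = _bisect_increasing(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    if x == n:
        upper = 1.0
    else:
        # Upper bound p solves P(X <= x | p) = alpha / 2
        upper = _bisect_increasing(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


if __name__ == "__main__":
    for successes, total in [(24, 24), (44, 45)]:
        low, high = clopper_pearson(successes, total)
        print(f"{successes}/{total}: {low:.3f}-{high:.3f}")
```

The exact interval is conservative for small samples, which suits validation sets of a few dozen individuals.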

Combined analysis of KMT2D, KMT2A and RNU4-2 signatures was done similarly to the main RNU4-2 analysis. EPIC v1 and EPIC v2 samples were imported and normalized separately with standard Meffil functions. Baseline methylation models were fitted separately on EPIC v1 and EPIC v2 positive and negative control samples. Adjusted methylation profiles of CpG positions belonging to the union of published KMT2D and KMT2A signatures, as well as the RNU4-2 signature, were then combined and represented on a heatmap using the pheatmap package with Euclidean distance and Ward aggregation method for columns.

Variant impact

Structural analysis of variants and corresponding figures was performed using the PyMOL v3.0.0 visualization software48 on published coordinates of the human tri-snRNP structure, Protein Data Bank (PDB) IDs 6QW6 (ref. 4) and 8Q7N (ref. 49).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.