Main

While approximately 2 million individuals worldwide are affected by retinitis pigmentosa (RP), it is estimated that 30% to 50% remain without a conclusive genetic diagnosis, even after exome or genome sequencing is performed1,2,3,4. This reflects high genetic heterogeneity, limited testing access and as-yet-unidentified disease genes, which in general carry pathogenic variants that are exceedingly rare in the control population5,6,7.

Noncoding RNAs are essential to many cellular processes, including pre-messenger RNA (pre-mRNA) splicing, which is ensured by the spliceosome, a macromolecular complex that in its major form is composed of five small nuclear RNAs (snRNAs), U1, U2, U4, U5 and U6, and ~300 proteins8. Each snRNA associates with a specific set of proteins to form a small nuclear ribonucleoprotein (snRNP), the functional unit of the spliceosome. Variants in RNU4-2, one of the two paralogs encoding U4, have been linked to a common neurodevelopmental disorder (NDD) known as ReNU syndrome (OMIM: 620851). These variants account for up to 0.4% of all NDD cases and lead to systematic misrecognition of donor splice sites by the spliceosome9,10,11. Likewise, RNU2-2 and RNU5B-1 have been recently associated with NDDs11,12,13.

Several spliceosomal proteins are also known to be involved in a wide range of hereditary diseases, including RP, as first noted by McKie and colleagues14. Specifically, of the ~100 genes that are currently associated with nonsyndromic RP5, the tri-snRNP splicing factor genes PRPF3, PRPF4, PRPF8, PRPF31 and SNRNP200 underlie the autosomal dominant form of the condition (adRP), with variants in PRPF31 accounting for 10–20% of all adRP cases3,15.

Here, we identify both inherited and de novo variants in RNU4-2 and four paralogs of RNU6, encoding the U6 snRNA, as the molecular cause of adRP in 153 individuals across 67 families. We demonstrate that all identified variants cluster within the U4/U6 duplex, in a region that binds directly to PRPF31 and PRPF3 and indirectly to PRPF6 and PRPF8 (refs. 16,17). Furthermore, we show that such variants increase the association of U4 and U6 snRNAs with the splicing factors SART3 and PRPF31, suggesting impaired snRNP biogenesis.

Results

RNU4-2 variants underlie adRP

We initially examined a nonconsanguineous family with adRP (Family M1-A; Supplementary Fig. 1), in which seven of eight siblings (II:1–II:7) and their father (I:1) displayed classical RP features (Supplementary Fig. 2 and Supplementary Data 1). Genome sequencing was negative for pathogenic variants in known retinal disease-associated genes, but selective DNA variant filtering and shared haplotype analysis revealed a total of 55 variants that were absent from gnomAD v.4.1 (refs. 7,18) and co-segregated with RP. Of these, none was predicted to impact splicing (SpliceAI > 0.2)19 and only one was evolutionarily conserved (GERP = 4.03 and phyloP-vertebrate = 3.18)20,21, a single-nucleotide insertion in the gene RNU4-2 (NR_003137.2:n.18_19insA; Fig. 1a, Supplementary Fig. 1 and Supplementary Tables 1 and 2). This DNA change was present in one individual from the All of Us database22.
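The filtering cascade described in this paragraph can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the record fields, the toy variants, the SpliceAI score assigned to n.18_19insA and the conservation cut-offs (gerp_min, phylop_min) are hypothetical placeholders. Only the SpliceAI threshold of 0.2 and the GERP and phyloP scores of n.18_19insA are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    name: str
    gnomad_af: Optional[float]  # None means "absent from gnomAD"
    cosegregates: bool          # shared by all affected relatives
    spliceai: float
    gerp: float
    phylop: float


def prioritize(variants, spliceai_min=0.2, gerp_min=2.0, phylop_min=2.0):
    """Return (candidates, splice_hits, conserved).

    gerp_min and phylop_min are hypothetical cut-offs chosen for illustration.
    """
    # Step 1: keep variants absent from controls that co-segregate with disease.
    candidates = [v for v in variants if v.gnomad_af is None and v.cosegregates]
    # Step 2: among those, flag predicted splice-altering variants.
    splice_hits = [v for v in candidates if v.spliceai > spliceai_min]
    # Step 3: among those, flag evolutionarily conserved positions.
    conserved = [
        v for v in candidates if v.gerp >= gerp_min and v.phylop >= phylop_min
    ]
    return candidates, splice_hits, conserved


# Toy input: only the first record reflects scores reported in the text
# (its SpliceAI value is a placeholder below the 0.2 threshold).
toy = [
    Variant("n.18_19insA", None, True, 0.0, 4.03, 3.18),
    Variant("toy_intergenic_1", None, True, 0.01, 0.4, 0.1),
    Variant("toy_common_1", 0.02, True, 0.0, 3.5, 2.9),
    Variant("toy_nonsegregating_1", None, False, 0.5, 4.0, 3.0),
]
candidates, splice_hits, conserved = prioritize(toy)
```

Keeping the three steps separate mirrors the order in which the counts are reported (variants retained, then splice predictions, then conservation), so each intermediate list can be inspected on its own.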

Fig. 1: Structure of the U4/U6 duplex and rare variants found in RP cases and controls (gnomAD).

a, Two-dimensional structure of the U4/U6 duplex, with recurrent variants identified in RP cases (in red for U4 and in green for U6), all clustering within the three-way junction. Nucleotides affected by variants previously observed in NDD cases are underlined. b, Rare variants affecting RNU4-1, defined as AF < 0.1% in gnomAD v.4.1, identified in RP cases and in controls. c, Same as in b for RNU4-2, with recurrent pathogenic variants displayed in red. d, Same as in b for all five RNU6 paralogs combined, with recurrent causative variants displayed in green. Significant P values for variants enriched in RP cases versus controls from gnomAD are indicated (two-sided Fisher’s test with Bonferroni correction).

To find additional families, we first screened by Sanger sequencing a cohort of 1,891 individuals from the European Retinal Disease Consortium (www.erdc.info) with RP or Leber congenital amaurosis who remained undiagnosed after a large high-throughput screening using single molecule Molecular Inversion Probes23. This analysis led to the identification of three additional families comprising 15 affected individuals segregating the same pathogenic variant (Supplementary Fig. 1 and Supplementary Tables 1 and 2). The n.18_19insA allele was significantly enriched in the RP cohort compared with both the gnomAD and the All of Us databases (analyzed control genomes: 76,215 and 414,000, respectively; Bonferroni-corrected P values = 2.6 × 10^−3 and 6.9 × 10^−5, respectively, by two-sided Fisher’s test; Supplementary Table 3). Additional screening of the RNU4-2 sequence in the same cohort led to the identification of 28 other variants, one of which (n.56T>C) recurred in eight individuals from four families (Fig. 1a, Supplementary Fig. 1 and Supplementary Tables 1 and 2), was absent in controls and was significantly enriched in patients versus controls (Bonferroni-corrected P values = 6.4 × 10^−5 (gnomAD) and 7.9 × 10^−8 (All of Us); Supplementary Table 3).
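For readers unfamiliar with the enrichment test used here, the sketch below implements a two-sided Fisher's exact test on a 2 × 2 table (carriers and non-carriers among cases and controls) followed by a Bonferroni correction. It is a generic, standard-library illustration: the carrier counts and the number of tests passed to the correction are hypothetical, the exact tables used in the study are in Supplementary Table 3, and the code is not expected to reproduce the reported P values.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table (with the same
    margins) that is no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def log_p(x):
        return log_comb(row1, x) + log_comb(row2, col1 - x) - log_comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = log_p(a)
    total = sum(
        exp(log_p(x))
        for x in range(lowest, highest + 1)
        if log_p(x) <= observed + 1e-7  # tolerance for floating-point ties
    )
    return min(1.0, total)


def bonferroni(p, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical counts: 4 carriers among 1,891 cases, 0 among 76,215 controls.
p_raw = fisher_two_sided(4, 1891 - 4, 0, 76215)
# Hypothetical number of variants tested in the same gene.
p_adjusted = bonferroni(p_raw, n_tests=29)
```

Working in log space avoids overflow when the control sets contain hundreds of thousands of genomes.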

Additional screening of 2,830 RP cases without previous genetic diagnosis from our respective institutions’ cohorts and from the UK National Genomic Research Library (hosting data from the Genomics England 100,000 Genomes Project24 and from the NHS Genomic Medicine Service) uncovered an additional patient harboring n.18_19insA (for whom the variant was de novo) and six families (nine affected individuals) carrying the n.56T>C variant (Supplementary Fig. 1 and Supplementary Tables 1 and 2). Altogether, recurrent variants in RNU4-2 were identified in 41 affected individuals from 15 families (Supplementary Fig. 3 and Supplementary Tables 1 and 2). Of note, incomplete penetrance was observed for nine obligate carriers without visual symptoms (Supplementary Fig. 1). One carrier of n.56T>C was asymptomatic, with subnormal electroretinogram, diffuse atrophic changes in the periphery and attenuated vessels. Another individual with the same variant showed no clinical signs of disease upon examination, and seven (among whom four were deceased) were not clinically evaluated to determine their disease status. Our combined screening of RNU4-2 also revealed 24 other unique rare DNA changes in 27 families, which were classified as variants of uncertain significance (VUS), as well as three benign changes (Supplementary Table 3).

Because U4 snRNA can also be transcribed from its paralog RNU4-1, which differs from RNU4-2 at only four positions (n.37, n.88, n.99 and n.113; Supplementary Table 4), we next examined its sequence in our initial cohort and identified 63 variants, none of which were significantly enriched in cases compared with controls; also, these changes did not include variants at sites corresponding to n.18_19 and n.56 of RNU4-2 (Fig. 1b and Supplementary Table 3). Notably, RNU4-1 appears to be more tolerant to variation than RNU4-2, as evidenced by the numerous and frequent variants that are present in genomes from the general population (cumulative allele frequency of 20.4% in RNU4-1 versus 1.2% in RNU4-2; gnomAD v.4.1) (Fig. 1b,c and Supplementary Fig. 4), as noted previously9.

Variants in U6 paralogues also cause RP

In the di-snRNP and the tri-snRNP complexes of the major spliceosome, U4 binds to U6 to form the U4/U6 RNA duplex. We therefore hypothesized that variants in U6 could also underlie adRP and extended our analysis to all five identical paralogous genes producing the U6 snRNA, scattered across the genome (RNU6-1, RNU6-2, RNU6-7, RNU6-8 and RNU6-9; Supplementary Table 4). A screening of these genes by Sanger sequencing in our initial cohort of 1,891 RP families revealed 94 DNA changes in total. The n.55_56insG insertion recurred at the exact relative position in RNU6-2, RNU6-8 and RNU6-9 (four families per gene, 34 cases in total; Supplementary Fig. 1 and Supplementary Tables 1 and 2) and was significantly enriched in cases versus controls, who were all negative for this change (Bonferroni-corrected P value = 2.6 × 10^−18 (gnomAD) and 5.1 × 10^−27 (All of Us); Supplementary Table 3). Since this variant was identical in three U6 genes, we reasoned that the specific DNA change, rather than any particular paralog, was relevant to the etiology of the disease. We therefore repeated our analysis by collapsing the five RNU6 genes and detected 66 unique variants. Another insertion, n.56_57insG, was identified in two unrelated families (once in RNU6-2 and once in RNU6-9, four cases in total; Supplementary Fig. 1 and Supplementary Table 2) and found to be significantly enriched in cases versus controls (Bonferroni-corrected P value = 1.8 × 10^−3 (gnomAD, a single RNU6-2 positive individual of unknown status) and 2.1 × 10^−5 (All of Us, no positive individuals); Supplementary Table 3). We then extended our analysis to the same international cohorts of patients that were previously analyzed (n = 2,830) and identified 74 additional cases from 38 families who were positive for either n.55_56insG or n.56_57insG (Supplementary Table 2).
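Collapsing the paralogs amounts to keying each observation on the sequence change alone and ignoring which gene copy carries it. A minimal sketch is given below; the family counts are those stated above for the initial cohort, while the data layout and the function name are illustrative assumptions.

```python
from collections import Counter

# (paralog, variant, families carrying it) in the initial cohort, as
# stated in the text for the two recurrent insertions.
observations = [
    ("RNU6-2", "n.55_56insG", 4),
    ("RNU6-8", "n.55_56insG", 4),
    ("RNU6-9", "n.55_56insG", 4),
    ("RNU6-2", "n.56_57insG", 1),
    ("RNU6-9", "n.56_57insG", 1),
]


def collapse_paralogs(obs):
    """Sum family counts per variant, regardless of the paralog involved."""
    per_variant = Counter()
    for _paralog, variant, families in obs:
        per_variant[variant] += families
    return per_variant


collapsed = collapse_paralogs(observations)
```

This is only meaningful because the paralogs share the same sequence, so a given n. position refers to the same nucleotide of the mature U6 snRNA in every copy.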

In total, these two variants were detected in 112 affected individuals from 52 families, involving all RNU6 paralogs except RNU6-7. The n.55_56insG insertion was present in most cases (102 individuals from 47 families), occurring in four of the five RNU6 paralogs: RNU6-1, RNU6-2, RNU6-8 and RNU6-9, while n.56_57insG was present in ten individuals from five families, in RNU6-1, RNU6-2 and RNU6-9 (Supplementary Tables 1 and 2 and Supplementary Figs. 1 and 3). Notably, n.55_56insG was confirmed to be a de novo event in eight individuals, clinically identified as sporadic cases. In 14 additional pedigrees, it was also observed in individuals born to unaffected parents, for which de novo inheritance was suspected but could not be confirmed, due to the lack of parental DNA. In contrast, no de novo events could be detected for n.56_57insG, which was identified exclusively in families with adRP (Supplementary Fig. 1). Similar to the screening of the RNU4 paralogs, our analysis of RNU6 paralogs revealed 66 VUSs and 23 benign variants, validated by Sanger sequencing (Supplementary Table 3).

In summary, we identified variants in RNU4-2 or RNU6 paralogs that underlie de novo or inherited dominant RP in 67 families. The overall phenotype across all cases was consistent with classical RP, based on clinical examination and electrophysiological testing, with symptomatic onset predominantly in adolescence (Supplementary Table 5). In addition, other concurrent ocular disease features were noted across individuals in the cohort: cystoid macular edema (55.9%), non-age-related lens opacities (23.6%) and various vitreomacular complications (30.6%) (Supplementary Table 5). Based on our data from these 4,722 RP cases, mostly of European descent and lacking a genetic diagnosis, we estimate that RNU4- and RNU6-associated RP could be responsible for ~1.4% of all molecularly undiagnosed individuals with this disease. Furthermore, considering that approximately 30% of RP diagnoses correspond to adRP25,26 and that our positive families include 24 isolated individuals, we can further infer that these variants may account for approximately 3.0% of undiagnosed adRP families.
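The two prevalence figures can be reproduced with simple arithmetic. The sketch below shows one plausible reading of the calculation, which is an assumption rather than the authors' stated formula: the overall estimate divides positive families by all screened cases, and the adRP estimate removes the 24 isolated individuals from the numerator and takes 30% of the screened cases as the denominator.

```python
total_cases = 4722          # screened RP cases without a genetic diagnosis
positive_families = 67      # families with RNU4-2 or RNU6 variants
isolated_individuals = 24   # positive families consisting of a single case
adrp_fraction = 0.30        # approximate share of RP that is adRP

# Share of all molecularly undiagnosed cases explained by these variants.
overall_share = positive_families / total_cases

# Assumed reading of the adRP estimate: familial (non-isolated) positives
# over the expected number of adRP cases in the screened cohort.
expected_adrp_cases = adrp_fraction * total_cases
adrp_share = (positive_families - isolated_individuals) / expected_adrp_cases

print(f"overall: {overall_share:.1%}, adRP: {adrp_share:.1%}")
```

Under these assumptions the two ratios round to the ~1.4% and ~3.0% quoted in the text.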

Predicted effects of variants on the U4/U6 duplex

All RP variants are predicted to map in spatial proximity with each other, within the three-way junction delimited by stem-I and stem-II of the U4/U6 duplex and the 5′ stem-loop of U4 (Figs. 1a and 2a). In particular, they are located in a different region compared with those underlying NDD (Fig. 1a). In silico two-dimensional modeling of RNA secondary structure predicted as well that the RNU4-2 variant n.18_19insA inserts a nucleotide between stem-II and the U4 5′ stem-loop (Supplementary Fig. 5a,b), while n.56T>C disrupts the first base-pairing of the U4/U6 duplex within stem-I (Supplementary Fig. 5a,c). Both changes lead to the extension of the internal loop, an event that is predicted to impact the overall stability of the duplex. In addition, n.18_19insA slightly modifies the orientation of the 5′ stem-loop relative to stem-I and stem-II (Supplementary Fig. 5a,b).

In contrast, both n.55_56insG and n.56_57insG in RNU6 paralogs are predicted to extend the length of stem-I by three additional base pairs, reduce the size of the internal loop and drastically change the orientation of the 5′ stem-loop (Supplementary Fig. 5a,d,e). Interestingly, we observed that a benign insertion at the same position, n.55_56insT, was present in gnomAD v.4.1 in all five RNU6 paralogs with a cumulative frequency of 0.12% (n = 181) (Supplementary Fig. 5f). While these models provide a coherent structural rationale for the observed clustering, the precise effects of the variants on U4/U6 architecture remain to be experimentally verified.

Analysis of cryo-electron microscopy data (PDB 6QW6)27 confirmed that all RP variants identified reside in a region critical for binding of the U4/U6 duplex to the splicing factors PRPF31, PRPF3 and PRPF8, all previously associated with adRP16,17 (Fig. 2b). Specifically, this region first engages PRPF31 or the PRPF3/PRPF4 complex, initiating the assembly interface, and is subsequently stabilized in its native orientation upon the coordinated binding of additional tri-snRNP components, including PRPF6 and PRPF8 (ref. 28). The mutated and neighboring U4 and U6 nucleotides detected in RP cases directly participate in the binding of PRPF31 and PRPF3 (Fig. 2c,d), via hydrogen bonds with eight and three residues of these proteins, respectively. Notably, by querying the ClinVar database29, we detected a missense variant affecting one of these residues, p.(Arg449Gly) of PRPF3, identified in a three-generation family with seven affected individuals having clinical features similar to those observed in most cases from our study30.

Fig. 2: Three-dimensional structure, from cryo-electron microscopy (PDB 6QW6), of the U4/U6 duplex and its interactions with neighboring splicing factors.

a, Naked U4/U6 pairing, showing the proximity of the causative variants identified (red and green). b, Same as in a with interacting PRPF proteins. c, Direct interactions of nucleotides of the U4/U6 duplex with PRPF31, via hydrogen bonds. d, Same as in c but for PRPF3.

Expression of RNU4 and RNU6 genes

Since the human genome contains several RNU4 and RNU6 pseudogenes31, we investigated whether any of these might be incorrectly annotated and could instead produce functional RNA, potentially contributing to the disease. In addition, we sought to understand why the various U4 and U6 paralogs appear to be differentially mutated, with RNU4-1 and RNU6-7 displaying none of the recurrent pathogenic variants. We used RNA sequencing (RNA-seq) data from human neurosensory retina (NSR), retinal pigment epithelium (RPE) and choroid that were enriched for small RNAs, applying stringent and paralog-aware bioinformatics analyses designed to mitigate the complexities associated with reads aligning against multiple paralogs and/or pseudogenes (Methods). RNU4-2 was more highly expressed than RNU4-1 in all tissues (average ratio: 1.63; Fig. 3a). Conversely, individual expression of RNU6 genes and pseudogenes in the retina could not be reliably quantified by RNA-seq, since their sequences are identical, except for the last nucleotide. Therefore, we compared the total expression of RNU4 and RNU6, regardless of their respective paralogs and pseudogenes. RNU6 expression was on average 3.39× higher across the three tissues, compared with RNU4 (Fig. 3b). Of note, NSR and RPE had higher expression of RNU4 (2.51×) and RNU6 (6.09×) with respect to the choroid, an ocular tissue not directly involved in vision, used as a control (Fig. 3b). This observation is in agreement with previous data showing that snRNA expression in the retina is approximately sixfold higher compared with muscle, testis, heart and brain32, indicating a high demand for snRNAs in these two retinal layers.

Fig. 3: Expression and markers of transcriptional activity of RNU4 and RNU6 genes.

a, Expression of RNU4-1 and RNU4-2 from RNA-seq of human donor choroid (n = 13), NSR (n = 4) and RPE (n = 16). For these boxplots, the thick line within boxes indicates the median (also expressed numerically), boxes represent the first and the third quartiles and whiskers indicate the smallest observation greater than or equal to the first quartile − 1.5 × IQR and the largest observation smaller than or equal to the third quartile + 1.5 × IQR. b, Same as in a for RNU4 genes (RNU4-1, RNU4-2 and pseudogenes) and for all RNU6 genes (RNU6-1, RNU6-2, RNU6-7, RNU6-8, RNU6-9 and pseudogenes). c, ATAC-seq and H3K27ac signals for RNU4-1, RNU4-2, RNU4ATAC (red) and 105 RNU4 pseudogenes (black). d, Same as in c for five RNU6 genes and RNU6ATAC (red), as well as for 1,312 RNU6 pseudogenes (black). IQR, interquartile range.
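As an aid to reading the boxplots, the sketch below computes whisker ends under the conventional Tukey rule (the most extreme observations that still lie within 1.5 × IQR of the quartiles). The input values are a toy example, and the quartile interpolation method is an assumption, since plotting libraries differ slightly in how they compute quartiles.

```python
from statistics import quantiles


def tukey_whiskers(values):
    """Return (lower, upper) whisker ends under the 1.5 x IQR rule."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme data points inside the fences;
    # anything beyond them would be drawn as an individual outlier.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return min(inside), max(inside)


toy_expression = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # toy values, not study data
whiskers = tukey_whiskers(toy_expression)
```

In this toy series the value 100 falls outside the upper fence, so the upper whisker ends at the next largest observation.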

In addition, we analyzed the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) and H3K27ac chromatin immunoprecipitation followed by sequencing (ChIP–seq) data from retinal tissues33 in genomic regions spanning all RNU4 and RNU6 sequences. ATAC-seq assesses chromatin accessibility across the genome, while H3K27ac ChIP–seq reveals the presence of active enhancers. These data, combined, indicate potential active transcription at promoter regions. Our analysis showed clear transcription marks in all paralogous RNU4 and RNU6 genes in the retina (Fig. 3c,d). Conversely, these signatures were absent from the 105 U4 pseudogenes and the 1,312 U6 pseudogenes, except for RNU4-8P, which displayed strong signals, but probably by virtue of its close proximity to the ACTR1B promoter. Of note, RNU6-92P and RNU6-656P had high ATAC-seq signals but very low H3K27ac signals at their respective promoters (Fig. 3d).

We performed the same analysis for other snRNA genes present in the human genome, which revealed a similar trend: all RNU genes, with the exception of RNU5F-1, had marks of active transcription and only a few among the thousands of RNU pseudogenes displayed signals compatible with potential expression, therefore representing plausible candidate genes for retinal disease (Supplementary Fig. 6). In addition, RNU2-2 was recently implicated in NDD, yet without evidence of ocular involvement13. Interestingly, the same type of analysis, based on conservation and expression data from GTEx, was recently performed by others, showing similar results34.

For RNU6-7, both ATAC-seq and H3K27ac signals were within the same range as those observed for other RNU6 genes (Fig. 3d), and, therefore, the absence of pathogenic variants could not be explained by a potential differential expression. We thus analyzed the genetic landscape of variations in healthy individuals in all five U6 paralogs and observed that RNU6-7 displayed a lower number of variants, compared with the others (Supplementary Fig. 7). We also identified the recurrent variant n.55_56insG in RNU6-7 in six control individuals of African or African American ancestry in gnomAD v.4.1 (allele frequency (AF) = 0.014%) and in 14 individuals of African origin in the All of Us database (AF = 0.013%). These seemingly contradictory observations merit further investigations in future studies.

Transcriptome analysis in patients

We performed transcriptome analysis following the collection of RNA from circulating leukocytes in nine affected individuals carrying variants in RNU4-2, RNU6-1 and RNU6-9 (three individuals per gene), as well as from 14 healthy controls (Supplementary Table 6). To avoid systematic errors linked to the use of different collection kits35 across our cohort (Methods), we performed independent case–control tests for samples collected with PAXgene kits (three RNU4-2 cases, six controls) or Tempus kits (six RNU6 cases, eight controls). We identified 27 and eight differentially expressed genes in the two datasets, respectively, with no gene overlap (fold-change > 2 and false discovery rate (FDR) P value < 0.05; Supplementary Table 7), indicating no major differences in global gene expression in leukocytes in cases versus controls. We then further explored the data by investigating potential bias in pre-mRNA splicing, as performed in ref. 11. This analysis led to the identification of 107 upregulated and 67 downregulated 5′ splice sites in the PAXgene set, and 37 upregulated and 13 downregulated 5′ splice sites in the Tempus set, with only two sites in common between the datasets (in the genes CLEC2D and SNHG29; Supplementary Table 8). At these two sites the expression differences, although statistically significant, were below 10%.

We then examined the DNA sequences of the 224 differentially expressed splice sites, focusing on the occurrence of the ‘AG’ dinucleotide at positions −2/−1, which was previously reported to be enriched in sites with increased usage in patients with RNU4-2-associated NDD11. No differences in ‘AG’ frequency were observed between sites with increased versus decreased usage in either the PAXgene or Tempus samples (two-tailed Fisher’s test, P = 0.63 and P = 0.52, respectively). We also compared nucleotide frequencies at each position between upregulated and downregulated splice sites, separately for the two groups, and found no significant differences (two-tailed Fisher’s test with FDR correction, significance threshold P < 0.05). A similar analysis of overlapping dinucleotides across the region flanking the 5′ splice site (for example, positions −4/−3, −3/−2, up to +7/+8) revealed no significant differences.
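The ‘AG’ comparison reduces to building a 2 × 2 contingency table (AG present or absent at −2/−1, for upregulated versus downregulated sites) that is then passed to Fisher's test. The sketch below builds such a table; the sequence representation (a string plus the index of the first intronic base) and the toy splice sites are illustrative assumptions, not data from the study.

```python
def has_ag_at_minus2_minus1(sequence, boundary):
    """True if the last two exonic bases (positions -2/-1) are 'AG'.

    `boundary` is the index of the first intronic base (position +1),
    so the exon occupies sequence[:boundary].
    """
    return sequence[boundary - 2:boundary].upper() == "AG"


def ag_contingency_table(up_sites, down_sites):
    """Rows: upregulated, downregulated. Columns: AG present, AG absent."""

    def count(sites):
        with_ag = sum(has_ag_at_minus2_minus1(seq, b) for seq, b in sites)
        return [with_ag, len(sites) - with_ag]

    return [count(up_sites), count(down_sites)]


# Toy 5' splice sites: three exonic bases followed by six intronic bases.
up = [("CAGGTAAGT", 3), ("AAGGTGAGT", 3), ("CCTGTAAGA", 3)]
down = [("GAGGTAAGC", 3), ("TTCGTATGT", 3)]
table = ag_contingency_table(up, down)
```

The same counting function can be applied to any other pair of neighboring positions by shifting the slice, which is how an overlapping-dinucleotide scan across the flanking region could be organized.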

Functional effects of RP variants

Since PRPF variants associated with RP affect primarily spliceosomal assembly32, we investigated whether the same phenomenon could be driven by the variants detected in this work. Specifically, we immunopurified ectopically expressed U4 and U6 snRNAs containing the RP variants and analyzed their association with specific markers for the U6 snRNP (SART3), the U4/U6 di-snRNP (SART3 and PRPF31), the U4/U6.U5 tri-snRNP (PRPF31 and SNRNP200) and the U5 snRNP (SNRNP200). The combined results showed that snRNA constructs carrying RP variants had an increased association with SART3 and, partially, with PRPF31, while the interaction with SNRNP200 was unchanged or reduced (Fig. 4). For comparison, we included in our assays the U4 n.64_65insT variant, which causes NDD, and observed no significant alteration in the association with any of the proteins tested, compared with wild type (Fig. 4a). Additionally, no significant differences were detected between NDD and RP variants, pointing to the need for targeted functional studies to delineate their respective impacts on spliceosome dynamics. Similarly, U6 RNA bearing the n.55_56insT and n.57T>G variants, observed in healthy control individuals, presumably did not affect spliceosome formation, since the low amount of protein associated with them implies that they entered the spliceosome assembly process only minimally (Fig. 4b). Taken together, the results indicate that RP pathogenic variants potentially have a specific dominant effect on snRNP biogenesis and delay the assembly process at the di-snRNP stage.

Fig. 4: Effects of RP variants in RNU4-2 and RNU6 on snRNP maturation.

a,b, Immunoprecipitation of U4-MS2 (WT and variants) (a) and U6-MS2 (WT and variants) (b). snRNPs were immunoprecipitated via MS2-YFP by anti-GFP antibodies and co-precipitated proteins were detected by western blotting. The position of the MS2 loop (green) in snRNAs is indicated. Four independent experiments were quantified. Immunoprecipitated proteins are normalized to input and U4 or U6 WT controls. Middle bars indicate average values and error bars the s.e.m. Statistical significance was analyzed by the two-tailed unpaired t-test and the P values were adjusted using the Benjamini–Hochberg FDR method to control for false discoveries. P values ≤ 0.05 are indicated. Full-length blots and antibody validation are provided as Source Data. Ctrl, control; IP, immunoprecipitation; WT, wild type.
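The Benjamini–Hochberg adjustment mentioned in the legend can be written in a few lines. The sketch below is a generic standard-library implementation of the step-up procedure, shown on made-up P values; it is not the authors' analysis code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


toy_pvalues = [0.01, 0.04, 0.03, 0.005]  # made-up values for illustration
adjusted = benjamini_hochberg(toy_pvalues)
```

Each raw P value is scaled by the number of tests divided by its rank, and the running minimum guarantees that a smaller raw P value never receives a larger adjusted value than a bigger one.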


Discussion

The numerous genes associated with RP and allied diseases belong to a wide range of functional classes, from retina-specific biochemical pathways to ubiquitous cellular processes5. Yet, how these defects ultimately lead to retinal degeneration often remains unclear. The link between pathogenic variants in splicing factors of the tri-snRNP complex (RP-PRPFs), essential for survival in all eukaryotes, and RP, a phenotype limited to the eye, represents perhaps the most intriguing of these biological enigmas.

In this study, we identified recurrent heterozygous variants in RNU4-2, encoding U4 RNA, and in multiple paralogs of the U6 RNA as a cause of RP. Interestingly, these snRNAs are also an integral part of the di- and tri-snRNP and directly interact with some RP-PRPF proteins. In addition, similar to RP-PRPFs, they are also associated with the same specific phenotype: de novo or inherited adRP, with reduced penetrance for RNU4-2 variants. Importantly, the clinical presentation of patients with RNU4-2 and RNU6 variants overlaps with that of other spliceosome-related forms of adRP, particularly showing an earlier onset (contrasting with the generally milder prognosis observed in most other adRP types36,37) and a relatively high co-occurrence of features such as cataracts and cystoid macular edema, found in cases with PRPF31 (refs. 38,39), PRPF8 (ref. 40) and SNRNP200 (ref. 41) variants. Prevalence estimations indicate that these snRNA pathogenic changes may account for an elevated number of undiagnosed cases, and it is therefore surprising that the RNU4 and RNU6 genes have escaped disease association until now. A partial explanation for this phenomenon is that mainstream sequencing approaches are biased towards DNA-capturing procedures that do not include snRNA genes. Furthermore, although genome sequencing is increasingly being adopted in routine diagnostics, variants in snRNA genes may have remained undetected because they affect noncoding transcripts, which are more challenging to interpret and are often overlooked or deprioritized by standard analytical pipelines.

An intriguing feature of pathogenic changes in RNU4-2 is their pleiotropy with respect to NDD (ReNU syndrome) and RP. Chen et al.9 described that more than half of the patients with ReNU also display some visual abnormalities, although only three were documented as having retinal phenotypes (one had an abnormal electroretinogram response, one had Leber congenital amaurosis and one presented with macular dysfunction). However, most cases were too young to display the typical symptoms of RP, which usually manifest during adolescence or early adulthood42. Although the exact mechanism for this phenotypic selectivity is unknown, RNU4-2 variants represent a clear and new allelic series involving noncoding RNA genes. A recent preprint highlighted a strong effect of ReNU variants in an RNU4-2 saturation genome editing experiment, while the RP variant n.56T>C in the same gene did not show any statistically significant effect43. In addition, the region of RNU4-2 containing RP variants showed function scores within the neutral range of the saturation genome editing assay43, suggesting a potentially milder pathogenic effect compared with ReNU changes. It is therefore plausible that the RP variants identified in RNU4-2 and RNU6 paralogs could lead to photoreceptor death and subsequent visual loss, while having no influence on the development of the brain. Additionally, ReNU variants are located in the stem-III and the T-loop of the U4/U6 duplex and interfere with the proper recognition of intronic 5′ splice signals, likely because these regions are involved in pairing pre-mRNA with U6 (ref. 9). In contrast, RP variants cluster in spatial proximity to the three-way junction, in regions not directly engaged in interactions with pre-mRNA but that are involved in the binding of various proteins, including RP-associated splicing factors.

Consistent with this evidence, we did not observe any major splicing anomalies in transcripts from patients with RP, with the only two significant events showing differences below 10% in expression. Conversely, our biochemical assays support a role for RP-associated variants in altering spliceosomal assembly. As the magnitude of the observed changes was in all instances rather moderate (less than 1.5-fold with respect to controls), we interpret these data as indicating that snRNA variants associated with RP are unlikely to prevent the assembly of spliceosomal complexes. Rather, they may cause a subtle alteration in snRNP dynamics, possibly affecting the efficiency of their biogenesis or recycling steps. In particular, the increased association with SART3 and, to a lesser extent, with PRPF31, together with unchanged or slightly reduced interaction with SNRNP200, may indicate a modest delay in the transition from the di-snRNP to the tri-snRNP form. Moreover, the pathogenic variants identified in this study lie within regions of the U4/U6 duplex that directly contact the PRPF3 and PRPF31 proteins, two splicing factors linked to adRP whose mutations also delay spliceosomal complex assembly32.

In terms of specific molecular effect, our functional data show that snRNAs bearing RP variants display enhanced interaction with di-snRNP protein markers, suggesting that pathogenesis could result from a gain-of-function or dominant-negative mechanism, rather than from haploinsufficiency. This hypothesis is strengthened by the evidence that molecularly similar but benign variants, commonly observed in the general population, seem not to bind efficiently to di-snRNP markers and potentially not to be incorporated into the spliceosome, supporting the idea that spliceosomal functions could be haplosufficient with respect to heterozygous and snRNA-depleting variants.

Although DNA changes associating RNU4-2 to ReNU syndrome have been primarily reported as de novo events9,10, in our study most families with RP (61%) bore RNU4-2 and RNU6 changes as inherited variants. In part, this difference can be explained by the reduced reproductive fitness associated with NDD versus RP. Unlike ReNU syndrome, symptomatic onset (night-blindness and peripheral vision loss) in nonsyndromic adRP begins later in life, with severe central vision loss usually occurring after the onset of reproductive age. Another difference involves the inheritance of dominant variants, which in ReNU seems to be almost exclusively of maternal origin9. We did not observe the same trend for RP, with variants being inherited from either of the parents, possibly indicating the absence of any sex-specific negative selection during gametogenesis or at the embryonic stage.

The human genome contains two RNU4 paralogs and five RNU6 paralogs. This indicates that, assuming equal expression within paralogs, the presence of only ~25% of mutant U4 (heterozygous genotype, over two copies) or ~10% of mutant U6 (heterozygous genotype, over five copies) is sufficient to lead to a disease phenotype, again in support of a gain-of-function or dominant-negative molecular mechanism. This could be a crucial consideration for the development of potential gene-based therapies, as gene-augmentation strategies may be suboptimal compared with gene correction or antisense oligonucleotide approaches. Our data also highlight the existence of mutational hotspots outside the coding regions of the human genome, emphasizing the need for further research into these parts of our genetic material, and show that the clustering of de novo pathogenic variants is not restricted to severe diseases with childhood onset44, but may extend to milder pathologies, such as RP.
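The ~25% and ~10% figures follow from one mutant allele out of four U4 alleles or ten U6 alleles. The sketch below makes the calculation explicit and allows unequal expression between paralogs; the function name is an illustrative assumption, and the last example, which plugs in the retinal RNU4-2 to RNU4-1 expression ratio of 1.63 reported in the Results, is a back-of-the-envelope extension rather than a result from the study.

```python
def mutant_fraction(expression, mutated_paralog):
    """Fraction of the snRNA pool produced by one mutant allele.

    `expression` maps each paralog to its relative expression (both
    alleles combined); a heterozygote contributes half of the mutated
    paralog's output as mutant RNA.
    """
    total = sum(expression.values())
    return 0.5 * expression[mutated_paralog] / total


# Equal expression across paralogs, as assumed in the text.
u4_equal = {"RNU4-1": 1.0, "RNU4-2": 1.0}
u6_equal = {p: 1.0 for p in ("RNU6-1", "RNU6-2", "RNU6-7", "RNU6-8", "RNU6-9")}
u4_fraction = mutant_fraction(u4_equal, "RNU4-2")
u6_fraction = mutant_fraction(u6_equal, "RNU6-1")

# Illustrative variation: weight RNU4-2 by the 1.63 ratio seen in retina.
u4_weighted = mutant_fraction({"RNU4-1": 1.0, "RNU4-2": 1.63}, "RNU4-2")
```

With equal expression the function returns one quarter for U4 and one tenth for U6; with the retinal ratio, the mutant share of U4 rises to roughly three tenths.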

In conclusion, we identified four recurrent pathogenic variants in RNU4-2 and in four of the five paralogs of the U6 snRNA as a frequent cause of de novo or inherited adRP. The immediate impact of these findings involves improved diagnosis and genetic counseling for patients with hereditary visual loss, especially for isolated cases who could potentially bear heterozygous de novo events. More fundamentally, this work substantially broadens our understanding of the genetic landscape of human disease, paving the way for the development of new molecular therapeutic approaches.

Methods

Patients and DNA samples

This study adhered to the tenets of the Declaration of Helsinki, and signed, informed consent was obtained from all participants. All procedures were conducted in accordance with Institutional Review Board-approved human research protocols and were approved by the ethics committees of the Radboud University Medical Center (Nijmegen, the Netherlands) and the Rotterdam Eye Hospital (Rotterdam, the Netherlands) (MEC-2010-359; OZR protocol no. 2009-32), and the local ethics committees of all other participating institutions.

Clinical characterization and analysis

Complete ophthalmic examinations were performed by a retinal specialist and included measurement of best-corrected visual acuity and intraocular pressure, and examination of the anterior segment and dilated fundus. Color fundus photographs and montages were captured using the FF450plus Fundus Camera (Carl Zeiss Meditec) and Optos 200 Tx (Optos). Fundus autofluorescence images (488-nm excitation) and high-resolution spectral-domain optical coherence tomography (SD-OCT) scans were acquired using the Spectralis HRA+OCT module (Heidelberg Engineering). Hyper-autofluorescent ring contours were analyzed using a custom program in FIJI software (National Institute of Mental Health)45. Progression rates were calculated using linear mixed-effects regression in R (v.4.0.4) with time (years) since baseline as the primary independent variable, baseline ring size as a covariate and inter-ocular differences as a random effect. Photoreceptor+ thickness was assessed on horizontal SD-OCT scans through the fovea using a semi-automated procedure46. Photoreceptor+ was defined as the distance between the Bruch’s membrane/choroid interface and the inner nuclear layer/outer plexiform layer boundary. Layer segmentation was performed in a semi-automated manner using custom software in MATLAB (MathWorks). Full-field electroretinogram recordings were conducted using the Espion Visual Electrophysiology System (Diagnosys) according to International Society for Clinical Electrophysiology of Vision (ISCEV) standards47.

Genome sequencing and annotation

Genomic DNA from probands was isolated from peripheral blood lymphocytes according to standard procedures. Sequencing was performed by BGI Tech Solutions using the DNBseq Sequencing Technology, with a minimal median coverage per genome of 30×. Sequencing data were processed using BWA mem (v.0.7.17)48, Picard (v.2.14.0-SNAPSHOT) (http://broadinstitute.github.io/picard) and GATK (v.4.1.4.1)49 for mapping to the human genome reference sequence (build hg19/GRCh37) and variant calling2. For variant annotation, we used ANNOVAR50 with the addition of splicing predictions by MaxEntScan51 and SpliceAI19.

Assessment of variants

Human Genome Variation Society (HGVS) notations of the variants were retrieved using VariantValidator52, and American College of Medical Genetics and Genomics (ACMG) classification53 was applied according to the ACGS Best Practice Guidelines for Variant Classification in Rare Disease 202354. In particular, we used the PS4_strong criterion for variants significantly enriched in cases versus controls (gnomAD v.4.1 and All of Us), as assessed for each variant by two-tailed Fisher’s exact test in R (fisher.test function), in agreement with the ACMG recommendations53 (odds ratio > 5.0, lower bound of the confidence interval > 1.0, corrected P value < 0.05), but only for variants present in at least three probands to avoid any bias from imbalanced case–control sets. This assessment was made using probands only (n = 1,891) for all variants, except for those in Supplementary Table 1, which were assessed in 4,722 individuals. PM6 and PP1 were applied according to ClinGen Sequence Variant Interpretation (SVI) recommendations. Specifically, PM6_sup was applied when two unrelated families had de novo variants without parental confirmation, given that RP is a ‘phenotype consistent with gene but not highly specific and high genetic heterogeneity’. PP1_sup, PP1_mod and PP1_strong were assigned when the variant segregated with disease in ≥1, ≥2 and ≥5 informative meioses, respectively55. We defined thresholds for PM2_sup and BS2 based on the frequency of RHO p.(Pro23His), the most prevalent variant causing adRP, which was detected once in gnomAD v.4.1 and 13 times in All of Us. Specifically, PM2_sup was assigned to variants that were present fewer than two times in gnomAD v.4.1 and fewer than 14 times in All of Us. BS2 was applied to variants that were observed more than four times in gnomAD v.4.1 or more than 28 times in All of Us, that is, twice the PM2_sup cut-offs derived from p.(Pro23His). PM2_sup was not applied to variants for which the PS4 criterion had already been used, to avoid double-counting evidence related to their low frequency in gnomAD. BA1 was considered for variants with allele frequency >5% in gnomAD v.4.1 or the All of Us databases, whereas BS1 was assigned to variants with allele frequencies greater than expected for disease (1/2,000 = 0.05%).
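The enrichment test and the count-based thresholds described above can be sketched as follows. This is an illustrative pure-Python approximation, not the R code used in the study: it reproduces only the two-tailed P value of Fisher's exact test (not the odds ratio, its confidence interval or the multiple-testing correction), and the function names and the `ps4_used` flag are hypothetical.

```python
from math import comb


def fisher_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the observed margins that are
    no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x counts in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def frequency_criteria(gnomad_count: int, allofus_count: int,
                       ps4_used: bool = False) -> list[str]:
    """Apply the PM2_sup and BS2 count thresholds stated in the text."""
    criteria = []
    # PM2_sup: fewer than 2 (gnomAD v.4.1) and fewer than 14 (All of Us),
    # skipped when PS4 has already been used for the variant
    if gnomad_count < 2 and allofus_count < 14 and not ps4_used:
        criteria.append("PM2_sup")
    # BS2: more than 4 (gnomAD v.4.1) or more than 28 (All of Us)
    if gnomad_count > 4 or allofus_count > 28:
        criteria.append("BS2")
    return criteria


# RHO p.(Pro23His): observed once in gnomAD v.4.1 and 13 times in All of Us
print(frequency_criteria(1, 13))  # ['PM2_sup']
```

For a given variant, `fisher_two_tailed` would be called with carrier and non-carrier counts in cases and in controls; the PS4_strong decision additionally requires the odds ratio and confidence-interval conditions listed in the text.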

Screening by Sanger sequencing

Genomic DNA was collected, and the RNU4-1, RNU4-2, RNU6-1, RNU6-2, RNU6-7, RNU6-8 and RNU6-9 genes were amplified using standard PCR procedures. The resulting PCR fragments were Sanger sequenced and screened for novel variants.

Two-dimensional modeling of the effect of variants and three-dimensional representation

We used the RNAfold WebServer to model the effect of variants with default parameters56, and RNAcanvas to draw the structure57. We used ChimeraX with PDB entry 6QW6 to draw three-dimensional representations of the U4/U6 duplex with and without the surrounding PRPF proteins.

RNA-seq experiments and analysis

RNA was isolated from human donor eye tissue, which was collected and dissected according to a reported procedure58 from an ethically approved Research Tissue Bank (UK NHS Health Research Authority reference no. 15/NW/0932). Total RNA was isolated from four NSR samples, 16 pelleted RPE samples and 13 choroid samples that had been stored in RNAlater (Thermo Fisher Scientific), using an Animal Tissue RNA Purification kit (Norgen Biotek), as per manufacturer’s instructions. Sequencing libraries were prepared using the NEBnext multiplex small RNA library preparation kit, as per manufacturer’s protocols, with size selection performed using Ampure beads. Paired-end sequencing (2 × 75 base pairs (bp)) was performed on an Illumina HiSeq 4000.

NEBnext adapters were removed from sequencing reads using trimmomatic (v.0.39) before alignment against the GRCh38 reference genome with bowtie59 (v.1.3). No mismatches between sequencing reads and the reference genome were allowed, and no restriction was set on multi-mapping reads. Sequence read counts were restricted to primary alignments using samtools (v.1.21)60, and therefore only counted once if they aligned to multiple RNU4 (n = 90) or RNU6 (n = 1,277) genes or pseudogenes. Calculations were drawn from read 1 datasets and normalized for the total read count achieved for the sample. Total RNU4 and RNU6 expression was based on all annotated genes and pseudogenes in GENCODE v.38.
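The normalization step can be illustrated with the short sketch below. It assumes that per-gene counts of primary alignments from read 1 are already available and expresses them per million total reads; the per-million scale factor, the function name and the example counts are assumptions made for illustration, as the text specifies only that counts were normalized for the total read count of the sample.

```python
def normalize_counts(gene_counts: dict[str, int], total_reads: int,
                     scale: float = 1e6) -> dict[str, float]:
    """Scale per-gene read counts by the sample's total read count.

    `scale` (reads per million) is an illustrative choice, not a value
    taken from the study.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {gene: count * scale / total_reads
            for gene, count in gene_counts.items()}


# Hypothetical counts for one sample with 2,000,000 reads in total
sample = {"RNU4-1": 500, "RNU4-2": 1500}
normalized = normalize_counts(sample, total_reads=2_000_000)
family_total = sum(normalized.values())  # total RNU4 signal in this toy sample
```

Summing the normalized values over all annotated genes and pseudogenes of a family gives the total RNU4 or RNU6 expression referred to in the text.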

ATAC-seq and H3K27ac ChIP–seq data

ATAC-seq data from ref. 61 (eight different experiments) and H3K27ac ChIP–seq data from ref. 62 (five different experiments) were downloaded as bigwig files from the RegRet database (http://genome.ucsc.edu/s/stvdsomp/RegRet)63. For both data types, the signal (the genes and 500 bp on each side) was extracted using bedtools (v.2.27.1) after conversion using bigWigToWig (v.469). We quantified the signal for all RNU genes and pseudogenes first by normalizing the signal of each experiment to the maximum and then summing them. For RNU4, we quantified two genes and 105 pseudogenes, while for RNU6 we assessed five genes and 1,312 pseudogenes, in addition to RNU4ATAC and RNU6ATAC.
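The quantification described above (normalize each experiment to its maximum, then sum across experiments) can be sketched as follows. The sketch assumes that the per-locus signal has already been extracted for each experiment and that "maximum" refers to the highest signal among the loci within that experiment; this reading, the function name and the toy values are assumptions.

```python
def summed_normalized_signal(experiments: list[dict[str, float]]) -> dict[str, float]:
    """Sum max-normalized signal per locus across experiments.

    Each experiment maps a locus name to its extracted signal (the gene
    plus 500 bp on each side, in the study).
    """
    totals: dict[str, float] = {}
    for signal in experiments:
        if not signal:
            continue  # skip experiments without any extracted loci
        peak = max(signal.values())
        if peak <= 0:
            continue  # an experiment with no signal contributes nothing
        for locus, value in signal.items():
            totals[locus] = totals.get(locus, 0.0) + value / peak
    return totals


# Two hypothetical experiments covering the same two loci
toy = [
    {"RNU4-1": 10.0, "RNU4-2": 5.0},
    {"RNU4-1": 2.0, "RNU4-2": 4.0},
]
scores = summed_normalized_signal(toy)
```

Normalizing before summing prevents a single experiment with a high overall signal from dominating the combined score.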

RNA-seq from blood RNA

Peripheral blood samples were collected from affected individuals and controls using either Tempus Blood RNA tubes (Applied Biosystems) or PAXgene Blood RNA tubes (Qiagen). Total leukocyte RNA was extracted with the Tempus Spin RNA Isolation Kit (Applied Biosystems) or the Preserved Blood RNA Purification Kit II (Norgen Biotek), respectively, following the manufacturers’ protocols. After quality assessment of RNA integrity and concentration, 100 ng of input RNA per sample was processed for library preparation using the KAPA RNA HyperPrep Kit with RiboErase (HMR) and KAPA Globin Depletion Hybridization Oligos (Roche). Sequencing was performed on an Illumina NovaSeq 6000 platform with 2 × 101-bp paired-end reads. To improve quality score calculations for the final base, one additional base was sequenced in both read 1 and read 2. The Q30 value for all RNA-seq data was ≥91.1%. Adapters were trimmed with Skewer (v.0.2.2)64.

Reads were aligned to reference transcripts from Ensembl (v.110, GRCh38) using STAR (v.2.7.11a) with the option --twopassMode Basic. DESeq2 (v.1.46.0) with default options was used for differential expression analysis between different groups according to sample origin (Tempus or PAXgene tubes) and presence/absence of the pathogenic RNU genotypes, with fold-change > 2 and FDR P value < 0.05. We used rMATS65 to assess differential alternative splicing, separately for the Tempus and PAXgene sets and with specific options (--allow-clipping --variable-read-length --anchorLength 1 --novelSS --task both --libType fr-unstranded -t paired --readLength 101). We further used the Python scripts from ref. 11 to process the rMATS output and filter the data according to a mean coverage > 7, an FDR P value < 0.1 and a deltaPSI value > 0.05. The R function fisher.test with default parameters was used to assess differences in base compositions at splicing sites, at each position, as well as differences for 2-mers (for example, positions −4/−3 to +7/+8).
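The filtering of splicing events can be illustrated with the sketch below. It is not the script from ref. 11: the dictionary keys are hypothetical placeholders rather than rMATS column names, and the use of the absolute deltaPSI value is an assumption, as the text does not state whether the threshold applies to the signed or the absolute value.

```python
def passes_filters(event: dict, min_coverage: float = 7.0,
                   max_fdr: float = 0.1, min_delta_psi: float = 0.05) -> bool:
    """Return True if a splicing event meets the thresholds given in the text.

    Keys 'mean_coverage', 'fdr' and 'delta_psi' are illustrative names.
    """
    return (
        event["mean_coverage"] > min_coverage
        and event["fdr"] < max_fdr
        # Assumption: the threshold is applied to the magnitude of deltaPSI
        and abs(event["delta_psi"]) > min_delta_psi
    )


# Hypothetical events: only the first satisfies all three thresholds
events = [
    {"id": "ev1", "mean_coverage": 12.0, "fdr": 0.01, "delta_psi": -0.20},
    {"id": "ev2", "mean_coverage": 5.0, "fdr": 0.01, "delta_psi": 0.30},
    {"id": "ev3", "mean_coverage": 30.0, "fdr": 0.50, "delta_psi": 0.10},
]
kept = [e["id"] for e in events if passes_filters(e)]
```

The same pattern would apply to the differential expression thresholds (fold-change > 2, FDR P value < 0.05), with the corresponding fields and cut-offs substituted.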

U4 and U6 snRNP analysis

U4 n.18_19insA, n.56T>C and n.64_65insT variants were introduced by site-directed mutagenesis into the plasmid expressing U4-MS266. The full-length U6 sequence, including 256 bp upstream and 93 bp downstream of the RNU6-1 gene, was inserted into the pcDNA3 plasmid lacking the CMV promoter. The MS2 loop was inserted between nucleotides 10 and 11. U6 n.55_56insG, n.55_56insT, n.56_57insG and n.57G>T variants were introduced by site-directed mutagenesis. U4- and U6-expressing plasmids were transfected into HeLa cells stably expressing MS2-YFP protein. At 24 h after transfection, snRNAs were immunoprecipitated using anti-GFP antibodies and co-precipitated proteins were analyzed by western blotting66.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.