Paired DNA and RNA sequencing uncovers common and rare variation regulating human retinal gene expression

Sampson, Jacob; Segrè, Ayellet V.; Bujakowska, Kinga M.; Clark, Simon J.; Bishop, Paul N.; Haynes, Steve; Baralle, Diana; Al-Deek, Jospin; Holden, Stacey; Anderson, Beverley; Hayes, Andrew; Kemal, Rahmat A.; Thomas, Huw B.; O’Keefe, Raymond T.; Banka, Siddharth; Black, Graeme C.; Sergouniotis, Panagiotis I.; Ellingford, Jamie M.

doi:10.1038/s41467-026-72979-4


Article
Open access
Published: 26 May 2026


Nature Communications volume 17, Article number: 4595 (2026)


Abstract

Genetic disorders impacting vision, including age-related macular degeneration (common) and inherited retinal disorders (rare), affect millions of individuals worldwide. There is an incomplete understanding of the impact of genetic variation on gene expression in the human retina and its role in genetic disorders. Through the generation of whole genome sequencing and bulk RNA-sequencing of neurosensory retina and retinal pigment epithelium from 201 post-mortem eyes, we uncover common and rare genomic variants shaping retinal expression profiles. This includes 1,483,595 significant cis-expression quantitative trait loci impacting 9,959 and 3,699 genes in neurosensory retina and retinal pigment epithelium, respectively, with associated genomic variants enriched in candidate cis-regulatory elements and notable sharing of eGenes between the two tissues. We also detect 1,051 expression outliers and prioritise 299 rare non-coding single-nucleotide, structural or copy number variants as plausible drivers for 28% of outlier events. This study increases understanding of gene expression regulation in the human retina.

Introduction

Genomic variation has been well established to play a role in the onset and susceptibility of visual impairment by disrupting normal functioning of the retina, a highly specialised light-sensitive tissue at the back of the eye¹. The retina depends on the interaction between neuronal and non-neuronal cell types, including those in the neurosensory retina (NSR), e.g. photoreceptors and ganglion cells², and the retinal pigment epithelium (RPE), a monolayer which lines the photoreceptor outer segments³. Inherited retinal disorders (IRDs) are a diverse set of largely monogenic conditions driven primarily by highly impactful genetic variants that are rare in the population and disrupt the function of the NSR and/or RPE⁴. Monogenic IRDs may present in isolation, for example, Stargardt disease, retinitis pigmentosa and cone-rod dystrophy, or as part of a multi-system disorder, for example, Usher syndrome, Joubert syndrome and Senior-Loken syndrome. Age-related macular degeneration (AMD) is a common disorder that impacts the retina and is a leading cause of visual impairment in adults⁵, predicted to impact 288 million individuals by 2040⁶. Whilst non-genetic risk factors exist for AMD, including age, diet and lifestyle, its heritability is estimated to be as high as 71%⁷. Genome-wide association studies (GWAS) initially identified more than 50 genomic loci impacting 34 genes that confer high risk of AMD in a European ancestry cohort⁸. Recent expansion of AMD GWAS to Hispanic and African ancestries has uncovered 30 additional genomic loci and distinct AMD genomic architecture in these populations⁹.

The Genotype-Tissue Expression (GTEx) project has transformed our ability to pinpoint genetic variants that impact gene expression¹⁰, including tissue-specific and tissue-shared expression quantitative trait loci (eQTLs) and rare genetic variants associated with expression outlier (eOutlier) events. Findings from these investigations, along with other studies, have been leveraged across various biological and medical fields to gain a deeper understanding of disease mechanisms¹¹,¹², to provide more informed diagnosis and prognosis¹³ and to pursue pathways for novel treatments¹⁴. Notably, ophthalmic tissue was not included in the GTEx resource. More recently, the EyeGEx study identified over 2 million eQTLs in the retina from a cohort composed of healthy eyes and eyes displaying signs of AMD¹⁵, and Strunz et al.¹⁶ identified 580,171 eQTLs in the neural retina from a cohort of healthy eyes. These resources enable the investigation of the role of common single-nucleotide variants (SNVs) influencing retinal gene expression. However, to our knowledge, there are no suitable datasets to also interrogate the impact of rare variants, structural variants (SVs) and copy number variants (CNVs) on retinal gene expression. Expanding our understanding of gene expression regulation in the retina will provide insights into the molecular mechanisms underlying both common and rare eye diseases and help identify new potential strategies for treatment and prevention.

Here, we describe the creation of a paired genomic and transcriptomic resource for the human retina from 201 donors and develop a new understanding of both common and rare variants that drive expression in this highly specialized tissue.

Results

The METR genome-transcriptome resource integrates genomic and retinal transcriptomic data from 201 post-mortem eye samples

The Manchester Eye Tissue Repository (METR) genome-transcriptome cohort comprises 201 unrelated individuals who donated eye tissue post-mortem. The median age of the cohort was 71 years (IQR = 64–77) at the time of post-mortem, with a slight male predominance (63.7%). The median ischemic time was 40 h (IQR = 32–44) (Supplementary fig. 1). While 47 individuals (23% of the cohort) were found to carry genetic variants that confer high risk for age-related macular degeneration (AMD), none of the 201 individuals included in the cohort had a phenotypic presentation, assessed post-mortem, consistent with late-stage AMD or monogenic ophthalmic disorders.

Short-read whole genome sequencing was performed on an Illumina NovaSeq6000, with alignment and variant detection performed using DRAGEN software (v4.0.3). The median genome-wide average coverage per sample was 35.9x (IQR = 30.3–40.5) (Supplementary fig. 2), with an average of 88.0% and 92.8% of the genome covered by at least 15 or at least 10 sequencing reads, respectively. Joint SNV calling with DRAGEN PopGen 4.2.4 obtained aggregate calls at 15,617,784 high-confidence variant sites after quality control (Supplementary Data 1). On average, 173 CNVs and 8,814 SVs were identified per sample (Supplementary Data 2). Genomic variation profiles among the 201 donors confirmed that the cohort was exclusively of European genetically inferred ancestry (Supplementary fig. 3).

Transcriptomic data were generated by short-read bulk RNA sequencing of polyadenylated enriched RNA, using an Illumina NovaSeq6000. The median RNA integrity number (RIN) for samples selected for transcriptomic analysis was 7.9 (IQR = 7.5–8.1) for NSR samples (n = 183) and 6.9 (IQR = 6.5–7.5) for RPE samples (n = 176) (Supplementary figs. 1 and 4A and Supplementary Data 3). We obtained an average of 139 million uniquely mapped reads for NSR (IQR = 138–161 million) and 62.3 million for RPE (IQR = 59–94 million), representing 89.4% (IQR = 88.2–90.7%) and 82.9% (IQR = 78–86%) of all generated reads, respectively (Supplementary fig. 4). The median 3’/5’ bias, defined as the ratio of sequencing depth between the 150 bp region at the 3’ end and the 5’ end of the gene for genes with a length greater than 600 bp and at least 5 unambiguous reads, was 0.5 for NSR samples (IQR = 0.48–0.51) and 0.51 for RPE samples (IQR = 0.49–0.55) (Supplementary fig. 1B). Some level of expression (mean TPM > 0.1) was indicated for 28,512 genes across both tissues, with 18,891 and 13,214 genes expressed at moderate (TPM > 1) and high (TPM > 5) levels, respectively (Fig. 1a). 59% of expressed genes (mean TPM > 0.1) (n = 16,785) and 90% of highly expressed genes (mean TPM > 5) (n = 11,984) were protein coding, representing 84% and 60% of all GENCODE protein coding genes, respectively. Significantly higher expression variability was observed in the RPE compared to the NSR for genes expressed at low, moderate and high levels in samples with both NSR and RPE data, measured by the coefficient of variation (p-value < 2.2 × 10⁻¹⁶; Fig. 1b).
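The expression thresholds and the coefficient of variation used above can be illustrated with a minimal Python sketch. The TPM cut-offs are those quoted in the text; treating them as mutually exclusive tiers, using the sample standard deviation, and the function names themselves are assumptions made for illustration rather than details taken from the study.

```python
from statistics import mean, stdev

def expression_tier(mean_tpm: float) -> str:
    """Bin a gene by mean TPM using the cut-offs quoted in the text.

    Assumption: the tiers are treated as mutually exclusive bins.
    """
    if mean_tpm > 5:
        return "high"
    if mean_tpm > 1:
        return "moderate"
    if mean_tpm > 0.1:
        return "low"
    return "not expressed"

def coefficient_of_variation(tpm_values: list[float]) -> float:
    """Standard deviation divided by the mean across samples.

    Assumption: the sample (n - 1) standard deviation is used.
    """
    m = mean(tpm_values)
    if m == 0:
        return float("nan")
    return stdev(tpm_values) / m

# Toy TPM values for one gene across three samples (made up).
toy_tpm = [8.0, 10.0, 12.0]
toy_cv = coefficient_of_variation(toy_tpm)
toy_tier = expression_tier(mean(toy_tpm))
```

Because the coefficient of variation is scale-free, it allows variability to be compared between genes, or between tissues, with very different mean expression.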

**Fig. 1: Tissue-specific transcriptomic data were generated for the neurosensory retina (NSR) and the retinal pigment epithelium (RPE).**

To ensure the validity of the transcriptomic datasets generated in this study, we assessed the biological relevance of expressed genes in NSR and RPE. Gene expression profiles were enriched for gene ontology (GO) terms indicative of the tissues of origin (Fig. 1c). Overall, 14,957 differentially expressed genes (mean expression > 1 TPM and adj. p-value < 0.05) were identified between the RPE and the NSR. Unsurprisingly, cell type deconvolution analyses, with reference to single-cell retinal datasets¹⁷, demonstrated a significantly higher representation of RPE cells in data generated from RPE samples compared to data generated from NSR samples (Supplementary fig. 5A). Moreover, genes with increased expression in the RPE (n = 7353) were enriched for 987 GO terms, which were grouped into 55 non-redundant clusters, including epithelial cell proliferation (GO:0050678), regulation of cell adhesion (GO:0030155) and positive regulation of immune response (GO:0050778) (Supplementary Data 4). Deconvolution of the NSR datasets supported the presence of at least 7 neuronal cell types at high levels (estimated proportion > 1%), with an average relative composition, per sample, of 29% rod photoreceptors (95% CI = 26.4–30.7%), 28% retinal astrocytes (95% CI = 26.1–29.6%), 16% amacrine cells (95% CI = 15.7–17.1%), 10% horizontal cells (95% CI = 9.5–10.2%), 7% retinal ganglion cells (95% CI = 5.8–9%), 4% bipolar cells (95% CI = 3.8–4.4%), 2% Müller glia (95% CI = 1.7–3.1%) and ~4% other cell types (Supplementary fig. 5C). Genes with increased expression in the NSR, compared to the RPE (n = 7604), were enriched for 238 GO terms, grouped into 27 non-redundant clusters including synapse organisation (GO:0050808), neurotransmitter transport (GO:0006836) and cell morphogenesis involved in neuron differentiation (GO:0048667) (Supplementary Data 4).

METR eQTLs provide novel insights into non-coding variants that impact known eye disease-related genes

We performed cis-eQTL mapping to identify common genetic variants that are associated with gene expression in the NSR and the RPE. We found 1,424,946 significant (FDR < 0.05) cis-eQTL associations between 806,789 variants (eVariants) and 9959 genes (eGenes) in NSR (Supplementary fig. 6). Additionally, 465,045 eQTLs were identified between 303,773 eVariants and 3699 eGenes in the RPE (Supplementary fig. 6). The lowest internal alternate allele frequency among eVariants was 2.5%, and the eVariants included variants that are novel relative to gnomAD v4.1 (Supplementary fig. 7). 406,396 eQTLs were common to both the NSR and the RPE, while 1,018,550 associations were NSR-specific and 58,649 were RPE-specific (Fig. 2a). Henceforth, we will refer to eQTLs identified in the NSR and/or the RPE as METR-eQTLs (n = 1,483,595), which included 10,471 unique eGenes (6772 NSR-specific, 512 RPE-specific and 3187 eGenes in both NSR and RPE).
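The tissue-specific and shared counts reported above follow from inclusion-exclusion on the per-tissue totals. The short Python check below uses only numbers quoted in this section; the variable names are illustrative.

```python
# Counts quoted in the text; variable names are illustrative only.
nsr_eqtls = 1_424_946     # significant cis-eQTLs in NSR
rpe_eqtls = 465_045       # significant cis-eQTLs in RPE
shared_eqtls = 406_396    # eQTLs detected in both tissues

nsr_specific = nsr_eqtls - shared_eqtls
rpe_specific = rpe_eqtls - shared_eqtls
metr_eqtls = nsr_specific + rpe_specific + shared_eqtls  # union of both tissues

nsr_egenes, rpe_egenes, shared_egenes = 9_959, 3_699, 3_187
metr_egenes = nsr_egenes + rpe_egenes - shared_egenes    # inclusion-exclusion

# Share of each tissue's eGenes that are also eGenes in the other tissue.
rpe_shared_pct = 100 * shared_egenes / rpe_egenes
nsr_shared_pct = 100 * shared_egenes / nsr_egenes

print(nsr_specific, rpe_specific, metr_eqtls, metr_egenes)
```

The same arithmetic shows that roughly 86% of RPE eGenes and 32% of NSR eGenes are shared between the two tissues.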

**Fig. 2: Intersection between METR-eQTLs and eQTLs from different studies.**

We compared the top eQTLs identified for NSR (METR-NSR eQTLs) for each eGene identified in this study with two published retina-specific eQTL datasets, the EyeGEx project¹⁵ and Strunz et al.¹⁶, to identify: (1) eQTLs identically replicated in the NSR tissues; (2) METR-NSR eQTLs that impacted eGenes previously described but had alternative eVariants in high linkage disequilibrium (LD) with findings from EyeGEx or Strunz et al. and (3) previously unreported eQTLs for NSR, including newly identified eGenes (Fig. 2b and Supplementary fig. 8). Of note, our cohort excludes individuals with late-stage AMD, whereas EyeGEx includes late-stage AMD eyes. We report 6181 NSR eGenes which were previously described by at least one other study (62% of all NSR eGenes), of which 547 eGenes (9%) share identical top eVariants with at least one other study and 1882 (30%) have top eVariants in high LD with previously identified top eVariants (r² > 0.8) (Fig. 2b). We identified 343,527 novel eQTLs (24%) in eGenes that were described by at least one previous study and 386,741 novel eQTLs (27%) in 3,778 newly identified eGenes. Importantly, we replicate 13 eQTLs that have previously been reported to impact genes that are implicated in increased risk of AMD (Table 1).
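The three-way comparison described above can be sketched as a small classification routine. This is a hypothetical illustration, not the study's code: the function name, the category labels, the variant identifiers and the precomputed r² lookup are all assumptions; only the r² > 0.8 threshold comes from the text.

```python
def classify_top_evariant(metr_variant, published_variants, r2_lookup, threshold=0.8):
    """Classify the METR top eVariant of one eGene against published top eVariants.

    published_variants: set of top eVariants reported for the same eGene elsewhere
                        (empty if the eGene was not previously reported).
    r2_lookup: dict mapping frozenset({variant_a, variant_b}) -> LD r^2
               (hypothetical, assumed to be precomputed).
    """
    if not published_variants:
        return "new eGene"
    if metr_variant in published_variants:
        return "identical"
    best_r2 = max(
        r2_lookup.get(frozenset((metr_variant, v)), 0.0) for v in published_variants
    )
    if best_r2 > threshold:
        return "high LD"
    return "novel eQTL in known eGene"

# Toy LD values between made-up variant identifiers.
toy_r2 = {frozenset(("v1", "v2")): 0.95, frozenset(("v3", "v4")): 0.30}

labels = [
    classify_top_evariant("v1", {"v1"}, toy_r2),
    classify_top_evariant("v1", {"v2"}, toy_r2),
    classify_top_evariant("v3", {"v4"}, toy_r2),
    classify_top_evariant("v5", set(), toy_r2),
]
```

Pairs missing from the lookup default to r² = 0, so an eVariant with no LD information is conservatively treated as novel.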

Table 1 Replication of eQTLs that impact AMD risk genes and were identified as lead candidates by Ratnapriya et al.¹⁵ and Orozco et al.²⁶ in the METR-eQTL dataset


Over 800 eGenes are newly identified in the NSR and RPE

To evaluate the tissue-specificity of our dataset, we compared the METR-eQTLs with non-eye-specific eQTLs from the GTEx project (Fig. 2c). We identified 337,424 METR-eQTLs (22.7%) and 916 eGenes (8.7%) that had not been previously identified by GTEx (Fig. 2c); 251,685 (74.6%) of these eQTLs have not previously been described as eQTLs in NSR or RPE. Of the novel eGenes, 479 (57.9%) encoded lncRNAs, and 5 had previously been associated with a known rare monogenic eye disease (HPS4, ACO2, CRX, CRYAA, PEX26). We evaluated the degree of similarity between METR-eQTLs and eQTLs from each GTEx tissue using the Intersection over Union (IoU) statistic, which accounts for the wide variation in the number of eQTLs from different tissues (Supplementary Figs. 9 and 10). The brain cortex had the highest level of eQTL similarity to our dataset (IoU = 0.28) and 5 of the top 10 most similar tissues were from the brain.
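The IoU statistic is the size of the intersection of two sets divided by the size of their union, which normalises for differences in set size. A minimal Python sketch follows; representing each eQTL as a (variant, gene) pair is an assumption made for illustration, and the identifiers are made up.

```python
def intersection_over_union(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B| (Jaccard index); 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

# Toy eQTL sets keyed by (variant, gene); all identifiers are hypothetical.
metr_toy = {("v1", "GENE_A"), ("v2", "GENE_A"), ("v3", "GENE_B"), ("v4", "GENE_C")}
other_tissue_toy = {("v2", "GENE_A"), ("v3", "GENE_B"), ("v9", "GENE_D")}

iou = intersection_over_union(metr_toy, other_tissue_toy)
print(iou)
```

In this toy case two pairs are shared out of five distinct pairs in total. A tissue with many more eQTLs inflates the union as well as the intersection, which is why IoU is less sensitive to dataset size than a raw overlap count.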

Genetic variants driving expression profile differences are enriched in candidate cis-regulatory elements (cCREs), with the highest enrichment in retina-specific cCREs

To understand whether eQTLs were enriched for putative regulatory regions, we compared locations of METR-eVariants to cell-type agnostic candidate cis-regulatory elements (cCREs) available through ENCODE (V3). METR-eVariants were enriched in cell-type agnostic promoters (p = 8.05 × 10⁻¹⁹) and proximal enhancers (p = 8.48 × 10⁻²⁶), compared to control variants matched for allele frequency and gene density. There was no enrichment of eVariants in distal enhancers, CTCF binding sites or DNase-H3K4me3 sites (Fig. 3a).

**Fig. 3: Enrichment of eye eVariants in previously annotated candidate cis-regulatory elements (cCREs).**

When stratified by cell-specific regulatory regions, bootstrapping analysis indicated a significant enrichment of METR-eVariants in NSR-specific (p-value = 4.52 × 10⁻²⁸) and RPE-specific cCREs (p-value = 8.74 × 10⁻¹⁰)¹⁸ compared to control variants matched for allele frequency and gene density (number of gene TSSs within 1 Mb of variant) (Fig. 3b). Furthermore, we observe a significant enrichment of METR-eVariants in cell-type specific accessible chromatin regions across 8 different retina cell types¹⁹, with the greatest enrichment in rod cells (p-value = 6.69 × 10⁻⁵⁸) and cone cells (p-value = 6.58 × 10⁻⁵⁷) (Fig. 3c). Non-eye cCREs from adult tissues in EpiMap were also enriched for METR-eVariants, although the enrichment was lower than in the NSR and RPE. Despite the relative enrichment in annotated regulatory loci, most METR-eVariants (88.2%) do not overlap with any previously characterised cCREs (Fig. 3d).
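One simple way to quantify this kind of enrichment is to compare the fraction of eVariants that fall inside cCRE intervals with the average fraction across matched control sets. The Python sketch below illustrates that idea under strong simplifying assumptions: a single chromosome, sorted non-overlapping half-open intervals, and control sets that have already been matched. It is not the study's pipeline, it does not reproduce the reported p-values, and all names and coordinates are hypothetical.

```python
from bisect import bisect_right

def overlaps_any(pos, starts, ends):
    """True if pos lies in one of the sorted, non-overlapping [start, end) intervals."""
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]

def overlap_fraction(positions, intervals):
    """Fraction of positions that fall inside any interval."""
    intervals = sorted(intervals)
    starts = [s for s, _ in intervals]
    ends = [e for _, e in intervals]
    hits = sum(overlaps_any(p, starts, ends) for p in positions)
    return hits / len(positions)

def fold_enrichment(test_positions, control_sets, intervals):
    """Observed overlap fraction divided by the mean fraction over control sets."""
    observed = overlap_fraction(test_positions, intervals)
    expected = sum(overlap_fraction(c, intervals) for c in control_sets) / len(control_sets)
    return observed / expected if expected > 0 else float("inf")

# Toy coordinates on one chromosome (made up for illustration).
toy_ccres = [(100, 200), (500, 600)]
toy_evariants = [150, 550, 700, 120]
toy_controls = [[150, 300, 800, 900], [50, 250, 520, 950]]

fold = fold_enrichment(toy_evariants, toy_controls, toy_ccres)
print(fold)
```

In practice each control set would be drawn to match the eVariants on allele frequency and on the number of gene TSSs within 1 Mb, as described in the text, and significance would be assessed across many resampled sets.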

Further, we assessed whether eVariants previously implicated in AMD risk intersected with cell-type agnostic or cell-specific regulatory regions, observing overlap for 48% (10/21) of unique eVariants with characterised cis-regulatory elements, although only 1 eVariant overlapped with regions previously shown to be active in the retina (Table 1).

Properties of METR-eQTLs differ between known monogenic disease genes and non-disease-related genes

To understand if there were trends that were specific to eQTLs associated with known monogenic eye disease genes, we compared findings from this study against the EyeG2P resource²⁰. We identified 230 METR-eGenes that were described as causes of rare monogenic disorders in EyeG2P (eye-disease genes) and compared trends identified in these genes against all other METR-eGenes (n = 10,241) (eye non-disease genes) (Fig. 4). We observed significantly lower expression variability across samples for eye disease eGenes compared to eye non-disease eGenes (p < 2.2 × 10⁻¹⁶) (Fig. 4a). Eye disease eGenes were associated with significantly fewer eQTLs per gene (p = 7.8 × 10⁻³) (Fig. 4b). Additionally, eQTLs associated with eye disease eGenes have a significantly lower impact on gene expression (p < 2.2 × 10⁻¹⁶) (Fig. 4c) and significantly higher allele frequency (AF, gnomAD v4) compared to eye non-disease eVariants (p < 2.2 × 10⁻¹⁶) (Fig. 4d). Genes that have been associated with rare monogenic eye disease have higher expression (mean TPM = 37.9) than non-disease genes (mean TPM = 20.3). To control for this potential confounding factor, we adopted a bootstrapping approach (n = 1000 iterations) to randomly resample 100 eQTLs associated with eye-disease genes and 100 eQTLs associated with non-eye disease genes matched for gene expression level (± 5% TPM) (Supplementary fig. 11). The direction of trends remained similar after bootstrapping, with lower effect sizes and higher allele frequencies observed for eQTLs associated with eye disease genes than non-disease genes (Supplementary fig. 11). For both eye disease eGenes and eye non-disease eGenes, there is a negative correlation (p < 2.2 × 10⁻¹⁶) between eVariant allele frequency and the impact of each eQTL on gene expression (Fig. 4e). These findings are consistent with the hypothesis that eVariants which are more common in the population have lower effect sizes on gene expression compared to rarer eVariants (min eVariant allele frequency = 2.5%) and are suggestive of a selective bias against rarer eVariants impacting known eye disease genes.
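The expression-matched bootstrap described above can be sketched as follows. This is a simplified, hypothetical illustration: sampling with replacement, matching each sampled eye-disease eQTL to one control eQTL whose gene TPM lies within ±5%, and summarising by the median effect size are all assumptions, as are the function name and the toy values.

```python
import random
from statistics import median

def matched_bootstrap(disease, control, n_iter=1000, n_draw=100, tol=0.05, seed=0):
    """Fraction of iterations where the disease median effect size is lower
    than the expression-matched control median.

    disease, control: lists of (gene_tpm, abs_effect_size) tuples.
    """
    rng = random.Random(seed)
    lower, completed = 0, 0
    for _ in range(n_iter):
        d_sample = [rng.choice(disease) for _ in range(n_draw)]
        c_sample = []
        for tpm, _ in d_sample:
            # Candidate controls whose gene expression is within +/- tol of this gene.
            pool = [c for c in control if abs(c[0] - tpm) <= tol * tpm]
            if pool:
                c_sample.append(rng.choice(pool))
        if not c_sample:
            continue  # no expression-matched controls in this iteration
        completed += 1
        if median(e for _, e in d_sample) < median(e for _, e in c_sample):
            lower += 1
    return lower / completed if completed else float("nan")

# Toy data (made up): disease eQTLs have smaller effects at similar TPM.
toy_disease = [(10.0, 0.1), (20.0, 0.2)]
toy_control = [(10.2, 0.5), (19.5, 0.6), (100.0, 0.9)]

fraction_lower = matched_bootstrap(toy_disease, toy_control, n_iter=50, n_draw=10)
print(fraction_lower)
```

A fraction close to 1 indicates that the lower effect sizes for eye-disease eQTLs persist after controlling for expression level, which is the direction of effect reported in Supplementary fig. 11.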

**Fig. 4: Differences in NSR eQTLs associated with known rare monogenic eye disease genes and genes which are not attributed to eye diseases (non-eye disease genes).**

Rare variants are plausible drivers of transcriptomic outliers in NSR and RPE

We utilised the DROP workflow²¹ to identify statistical outlier events within the METR transcriptome datasets, including expression, splicing and allelic imbalance outliers (Table 2). We identified 1,051 unique instances of a gene being aberrantly expressed in a METR sample (METR expression outlier events, METR-eOutlier events) (adjusted p < 0.05); 728 of these events were in the NSR, 443 in the RPE and 120 eOutlier events were found in both tissues. A median of 3 genes per sample were significant outliers in the NSR (IQR = 1–4) and 1 in the RPE (IQR = 0–2). In total, we tested 3,209,821 gene-sample events in the NSR and 3,050,081 in the RPE, indicating a significant outlier rate of 0.023% and 0.015%, respectively. These observations are consistent with a recent study of the GTEx cohort, describing significant outlier rates of 0.026%²².
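The outlier rates quoted above are the number of significant events divided by the number of gene-sample combinations tested in each tissue. A short Python check using only the figures in this paragraph (variable names are illustrative):

```python
# Figures quoted in the text; variable names are illustrative only.
nsr_events, rpe_events, events_in_both = 728, 443, 120
nsr_tested, rpe_tested = 3_209_821, 3_050_081

unique_events = nsr_events + rpe_events - events_in_both  # inclusion-exclusion

nsr_rate_pct = 100 * nsr_events / nsr_tested
rpe_rate_pct = 100 * rpe_events / rpe_tested

print(unique_events, round(nsr_rate_pct, 3), round(rpe_rate_pct, 3))
```

Expressing the rate per tested gene-sample event, rather than per sample, is what makes the figures directly comparable with the GTEx estimate cited above.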

Table 2 Transcriptome outliers detected in the METR-cohort


For each eOutlier event, we were able to harness paired genomic data to identify candidate rare variants potentially driving aberrant expression profiles. We leveraged a hierarchical framework and a probabilistic model to prioritise candidate rare genetic variants driving changes in expression. This identified 230 (23%) eOutlier events likely driven by protein-coding variants and 314 (31%) events with non-coding candidate variants (Supplementary Data 6).

Rare variants predicted to have a functional impact are identified for 50% of eOutlier events in NSR and RPE

First, we applied a hierarchical framework to identify rare SVs, CNVs and SNVs which were predicted to result in loss-of-function (pLoF, including frameshift, nonsense and start/stop site loss variants) or were expected to disrupt a nearby non-coding regulatory region (Supplementary fig. 12). Following this approach, we identified candidate functional variants driving 528 eOutlier events (50.2% of all eOutlier events identified in this study) (Fig. 5a and Supplementary Data 6). Of these, 131 eOutlier events co-occurred with an SV or CNV impacting the coding sequence of the outlier gene (77 NSR-only, 12 RPE-only and 42 in both tissues) and 98 with a pLoF SNV (77 NSR-only, 6 RPE-only and 15 in both tissues) impacting the same gene (Fig. 5a). For those eOutlier events not explained by an SV, CNV or pLoF SNV disrupting the coding sequence, we identified genomic variants in 71 eOutlier events that were within 10 kb of the gene body and impacted an eye-specific cCRE¹⁸, including SV/CNVs (n = 2) and SNVs that are rare (AF < 0.01) or absent in gnomAD (n = 69). We also identified non-eye-specific cCREs from EpiMap²³ which were disrupted by SV/CNVs (n = 23) or rare SNVs (n = 121) within 10 kb of the eOutlier gene. Examples of rare variants identified through this analysis strategy are included in Fig. 6 and Supplementary fig. 13.

**Fig. 5: Identification of rare variants driving METR transcriptomic outliers in neurosensory retina (NSR) and retinal pigment epithelium (RPE).**

**Fig. 6: Candidate structural and copy number variants driving METR transcriptomic outliers in neurosensory retina (NSR).**

A probabilistic model demonstrates high concordance for candidate SNVs driving expression outlier profiles in NSR and RPE

Next, we applied Watershed²⁴, a probabilistic model that was retrained on 6 tissue-specific outlier p-values from DROP. This was used in the METR transcriptome datasets to obtain posterior probabilities for SNVs and small indels that may be driving outlier expression profiles. Eye-specific cCREs were added as annotation features for Watershed (Supplementary Data 7), and the model identified 135 (13%) eOutlier events that were likely to be caused by nearby rare variants (posterior probability > 0.8), of which 110 (81%) were also predicted to be driven by the same variants by the hierarchical model and 11 were predicted to be driven by SV/CNVs that are not considered by Watershed (Fig. 5b). We used bootstrapping analysis to compare the annotations associated with these variants to other rare variants which overlapped with eOutlier genes but were not predicted by Watershed to have a functional impact, observing an enrichment of canonical splice variants, frameshift variants, stop gain variants and variants predicted to disrupt splicing (Fig. 5c). In support of other analyses described in this study, there was an enrichment of rare variants which overlap with retina cCREs and a slight enrichment of rare variants which overlap with epigenomic marks associated with non-eye specific regulatory elements from ENCODE. In total, there were 34 eOutlier events where the functional variants prioritised by Watershed intersected with a known candidate cis-regulatory element (cCRE); 28 of these cCREs were active in the eye.

Functional assays confirm the impact of rare variants prioritised as drivers of eOutlier expression

A dual reporter luciferase assay was performed in human K562 cells to investigate the impact of a CAND2 heterozygous variant that was prioritised as a driver of drastically reduced expression in NSR (fold change = 0.6, Z-score = –5.6, p-adj = 0.004) and RPE (fold change = 0.5, Z-score = –4.7, p-adj = 0.049) due to its overlap with features indicative of the CAND2 promoter region (NM_001162499.2:c.-41A>G; Fig. 7). CAND2 has recently been implicated in AMD risk through GWAS meta-analysis in European ancestry populations⁹ and has an emerging role in the targeted degradation of proteins that is distinct from its CAND1 homolog²⁵. Our dual reporter luciferase assay reveals a significant reduction in CAND2 promoter activity in the presence of c.-41A>G (adj p = 0.005; Fig. 7), supporting the disruption of CAND2 activity in NSR and RPE and providing a proof-of-principle for our applied prioritisation methods for genetic drivers of expression outliers.

**Fig. 7: Candidate rare variant in promoter of CAND2 driving METR transcriptomic outliers in neurosensory retina (NSR) and retinal pigment epithelium (RPE).**

Discussion

We present a resource to interrogate the impact of both common and rare genomic variation on gene regulation in the human NSR and RPE. We characterised novel eQTL associations that are tissue-specific (Fig. 2) and are enriched in known promoters and proximal enhancers (Fig. 3). We show that eQTLs impacting genes known to cause rare genetic eye disease have different properties when compared to those impacting genes which are not known to cause eye disease (Fig. 4). We also identify candidate non-coding rare variants, SVs and CNVs which impact cCREs and represent plausible drivers of outlier expression profiles in human NSR and RPE (Fig. 5), including functional validation of a prioritised non-coding genetic variant impacting CAND2 (Fig. 7). The METR resource can be used alongside other multi-omic datasets to facilitate discovery of novel eye-specific regulatory elements, including those implicated in common (e.g. AMD) and rare (e.g. IRDs) genetic disorders impacting the retina.

The cohort of 201 human donors described in this study represents the first dataset, to our knowledge, to pair whole genome sequencing with high-depth RNA sequencing data from the NSR and RPE. Previous studies have developed RNA sequencing from the NSR alongside genotyping arrays¹⁵,¹⁶,²⁶ and this has enabled the characterisation of eQTLs in the retina including preliminary data supporting the role of a limited number of eQTLs in AMD (Table 1). We performed extensive QC for both DNA and RNA sequencing data to confirm the validity of the datasets generated in this study, in particular due to the prolonged median ischemic time compared to GTEx samples, which may impact the quality of the data obtained²⁷. These analyses confirmed suitable RNA integrity values across the cohort (Supplementary fig. 1), high unique mapping rates with appropriate read lengths and appropriate 3’/5’ biases from RNAseq data (Supplementary figs. 1 and 4), along with representative gene expression profiles (Fig. 1c and Supplementary Data 4). The high-depth and high-quality RNA and whole genome sequencing datasets developed for this study are from a cohort of individuals without clear signs of late-stage AMD and have enabled biological insights beyond those described previously.

First, we were able to assess whether previously characterised eQTLs are replicated utilising alternative methods and technologies in an independent cohort of individuals of European genetic ancestry without late-stage AMD (Supplementary fig. 3). As gene expression profiles have been shown to be significantly disrupted during AMD pathogenesis²⁸,²⁹, it is important to identify eQTL signals that are amplified or disrupted by broader changes in transcriptome profiles associated with AMD, as well as those that remain consistent within a cohort of individuals without clear signs of late-stage AMD. Overall, we show high levels of replication of eQTL findings from Ratnapriya et al., with 5993 identical eGenes and replication of 13 eQTLs previously implicated with a role in AMD (Table 1), including PILRB, which has recently been shown to lead to photoreceptor dysfunction in mice when function is impaired³⁰. Notably, we identified 5 novel eQTLs for genes previously implicated in AMD (ACAD10, HTRA1, B3GLCT, PLA2G12A, BAIAP2L2) and 4 genes implicated in AMD without replication of a previously characterised eQTL (CFI, COL4A3, RDH5, TNFRSF10A). This suggests that differences in the approaches undertaken and/or cohort composition, e.g. AMD status, cohort size and/or genetic ancestry, impact the influence of genomic drivers on the expression of these genes.

Second, variants which are rare in the population or unique to individuals have been demonstrated to drive drastic changes in expression profiles, so-called ‘expression outliers’, across different tissues²⁴,³¹. The use of complete genomic sequencing in this cohort, achieving a median coverage of 36x, has enabled the characterisation of a greater diversity of genomic variation than has previously been studied in the context of expression drivers in the NSR and RPE and identified thousands of new regions which can be interrogated for rare variation within disease cohorts³². Using two distinct variant prioritisation approaches, we describe rare variants in the general population, including SVs, CNVs and small variants that are the most likely drivers of expression outliers in these tissues (Fig. 5). Through functional validation of a prioritised non-coding variant in the CAND2 promoter region, we establish a proof-of-principle for the applied variant prioritisation approaches (Fig. 7) and provide mechanistic insight into non-coding regions regulating the expression level of AMD-risk associated genes⁹. These data encourage further functional follow-up for the 578 prioritised variants that may be causative of pronounced changes in expression profiles in the human retina, including 272 rare variants predicted to cause loss-of-function and 299 that intersect with non-coding regions (including examples presented in Fig. 6 and Supplementary fig. 13). Other recent studies have identified outlier-associated non-coding rare variants that contribute to common disease predisposition³³ and underpin rare genetic disorders³⁴,³⁵. Moreover, non-coding variation has been identified as a cause of genetic ophthalmic disorders in untranslated regions³⁶, retina-specific exons³⁷, promoters³⁸, distal enhancers³⁹ and non-coding genes⁴⁰ expressed in the NSR and RPE. With the increasing availability of genomic sequencing datasets for the diagnosis and discovery of genetic disorders⁴¹,⁴², including ophthalmic conditions⁴³, the data developed in this study are timely and provide an opportunity, alongside other complementary datasets⁴⁴, to identify new pathogenic mechanisms underpinning genetic disorders.

Third, we have generated high-coverage RNAseq datasets achieving, on average, 139 million uniquely mapping reads for NSR and 62 million uniquely mapping reads for RPE. Previous studies have developed lower coverage RNAseq datasets for NSR, for example, EyeGEx¹⁵, Orozco et al. ²⁶ and Pinelli et al.⁴⁵ generated 33, 30 and 72 million sequencing reads per sample, respectively. Previous studies have remarked on the level of transcript diversity in NSR⁴⁶ and highlighted the advantage of high-depth RNAseq in this context. In comparison to EyeGEx, our high coverage approach elevates the number of observable protein-coding genes by 23% (from 13,662 to 16,765) and newly identifies 3,663 eGenes. For 4,481 eGenes that are not replicated from EyeGEx, further study, for example harmonisation of genomic and RNA sequencing dataset processing and meta-analyses would assist in understanding whether their detection is influenced by cohort composition, methodologies undertaken and/or sample size. The increased number of eGenes from this study enables observation of patterns in gene expression at increased resolution and has granted insight into the trends associated with genes previously implicated in genetic disorders impacting vision. Overall, we observed that eGenes that have been characterised as a cause of rare genetic eye disease²⁰ have lower expression variability across individuals than non-disease genes (Fig. 4), suggesting that regulation of these genes is more tightly controlled in NSR and RPE. The role of eQTLs in genetic disorders remains incompletely understood. For example, whilst some studies have shown eQTLs contribute to onset, penetrance and expressivity^47,48, including genetic disorders impacting the eye¹³, others have found limited evidence for their role in neuronal genetic disorders^49,50. 
Here, we observe that eQTL variants associated with changes in expression of eye-disease genes had significantly lower effect sizes and higher allele frequencies than eQTL variants impacting genes that have not previously been implicated in eye disease (Fig. 4 and Supplementary Fig. 11). Intuitively, the absence of rarer and higher impact eVariants amongst a population of individuals without signs of genetic eye disease suggests constraint on genomic variation with these properties, although population-scale modelling and statistical analysis are required to formally test this hypothesis.

Finally, our cohort includes 158 individuals with RNA extracted and sequenced from both NSR and RPE, which enables further insights into the expression patterns and regulatory architecture of these tissues, unbiased by sample preparation methods and/or differences between individuals, e.g. genomic background. It should be noted that our cohort is biased towards male individuals (64%) and this may introduce a hidden bias in the eQTLs and transcriptome differences identified. However, our data newly identify 916 eGenes in NSR and RPE compared to those characterised in other tissues¹⁰ and we observe a high level of overlap in eGenes between NSR and RPE, including 86% of RPE eGenes and 32% of NSR eGenes. These data further support the high level of overlap previously observed for active enhancers and promoters between RPE/choroid and NSR¹⁸.

Whilst the findings of this study have enhanced our understanding of genomic regulation in human NSR and RPE, approaches that utilise single-cell²⁶,⁵¹,⁵²,⁵³,⁵⁴, single-nuclei¹⁷,⁵⁵ and spatial⁵⁶,⁵⁷ transcriptomics enable increased precision to understand gene expression in specialised retinal layers and cell types. These approaches are particularly advantageous for the NSR, which is a highly heterogeneous tissue comprised of several specialised layers and neuronal cell types, including photoreceptors, bipolar cells, amacrine cells and horizontal cells⁵⁸, and where transcriptome profiles may differ substantially between the central and peripheral retina⁵⁹. To overcome potential shortcomings of the bulk RNAseq approach adopted in this study, we performed deconvolution analyses to estimate the relative sample composition against single-nuclei RNA-sequencing¹⁷. Given the complexity associated with retinal tissue dissection and storage⁶⁰,⁶¹, the deconvolution approach also enabled confirmation of tissue sample integrity alongside differential expression profiles (Fig. 1c). Bulk RNAseq from NSR had representation, as expected, from diverse cell types with significant enrichment towards rod photoreceptors and retinal astrocytes, representing >50% of the estimated cellular make-up of most samples. In keeping with current understanding of retinal ageing⁶², we observed a significant loss of rod photoreceptors with age (Supplementary Fig. 5B). However, deconvolution is naturally limited by the relative differential transcriptional activity between cell types and is complicated by cell types with similar transcription profiles, for example, Müller glia and retinal astrocytes⁵³. We expect that the high number of astrocytes predicted in RNA samples is influenced by similar transcription profiles to other cell types and, whilst we confirm that we have generated high-quality RNA sequencing datasets (Supplementary Figs. 1 and 4 and Fig. 1), these estimates may also be influenced by altered transcriptome profiles in samples with longer ischemic times²⁷ and/or the high sequencing depth coverage generated. Moreover, the retina is known to have cyclic patterns of gene expression, related to both circadian rhythm and natural function, i.e. response to light⁶³, and there is an incomplete molecular understanding of all cell types present in the human retina¹⁷. Overall, these data support the integrity of the RNAseq dataset developed in this study and, whilst confident quantification of the cell types present is not possible, our analyses confirm that the datasets are representative of major cell types in the retina.

Taken together, the data presented in this study provide new insights into the genomic control of gene regulation in the human retina. We build upon previous understanding through replication of eQTLs in a cohort of individuals without clear signs of late-stage AMD, characterise hundreds of new genes under genomic regulation and provide insights into the role of rare variants, SVs and CNVs in the disruption of gene expression in these specialised tissues that enable vision. Future studies utilising this resource, including meta-analysis with other published datasets, co-localisation and transcriptome-wide association studies incorporating findings from genome-wide association studies, will continue to develop understanding of the expression profiles and the role of non-coding genetic variation in the onset and presentation of genetic disorders impacting vision.

Methods

Ethics approval and material transfer

All research and approaches undertaken in this manuscript were approved by the North West—Greater Manchester Central Research Ethics Committee and NHS Health Research Authority (15/NW/0932). Methodological approaches were approved and undertaken at The University of Manchester. The Manchester Eye Tissue Repository is a non-profit tissue bank, and no compensation was provided for the receipt or delivery of tissue samples. A material transfer agreement was agreed upon between the research team and the Manchester Eye Tissue Repository. Any surplus tissue, RNA and/or DNA after sample preparation and sequencing remained with the research team and will be destroyed or returned to the tissue bank within 3 years of the conclusion of the study. Other researchers who wish to access surplus material can submit independent requests to the Manchester Eye Tissue Repository.

Gene expression quantification in neurosensory retina and retinal pigment epithelium from RNA-Seq data

Paired-end short-read sequencing of polyA-enriched mRNA (RNAseq) was performed on an Illumina NovaSeq 6000 instrument for two layers of the retina: (1) the entire neurosensory retina (NSR), including macula and peripheral regions and (2) pelleted cells from the retinal pigment epithelium (RPE), which were scraped from Bruch’s membrane. Donor eye tissues were obtained from the Manchester Eye Tissue Repository, an ethically approved Research Tissue Bank (UK NHS Health Research Authority, 15/NW/0932). Eye tissue was acquired after the corneas had been removed for transplantation and explicit written informed consent had been obtained from donors or their next of kin to use the remaining tissue for research. Samples were selected for RNAseq with reference to RNA concentration (ng/μl) and integrity (RNA Integrity Number, RIN) values, calculated with the Agilent TapeStation system. The QC process was performed without knowledge of sample sex/gender (see Supplementary Methods 1.1 and 1.2 for additional details on tissue extraction and the RNA sequencing protocol).

The Genotype-Tissue Expression (GTEx) analysis pipeline⁶⁴ was applied to RNAseq datasets to assess quality and to perform alignment and expression quantification. Alignment was performed against the GRCh38 human reference genome using STAR v2.7.4a⁶⁵. Duplicate reads were marked with Picard v2.27.1⁶⁶. Gene-level expression quantification, using the GENCODE v38 annotation⁶⁷, was carried out using RNA-SeQC 1.1.9⁶⁸ for gene-level read counts and RSEM v1.3.0⁶⁹ for gene-level quantifications in transcripts per million (TPM). Quality assessments of processed RNAseq datasets included reference to the total number of reads, number of uniquely mapped reads, number of splice junctions, number of chimeric reads, read length and 3’/5’ bias for all NSR and RPE samples. To ensure concordance between paired WGS and RNAseq samples, we excluded WGS-RNAseq pairs where the predicted relatedness, calculated using Somalier⁷⁰, was <0.8.

Whole genome sequencing data

Short-read paired-end whole genome sequencing (WGS) was generated for each donor on an Illumina NovaSeq 6000 instrument using DNA extracted from iris biopsies (see Supplementary Methods Section 2 for additional details). Genome alignment and variant calling were carried out using Illumina DRAGEN 4.0.3 software with Machine Learning and Graph Map Enabled. Aggregate variant detection and harmonisation were carried out using Illumina DRAGEN 4.0.3 software Population Mode. We applied quality control filters to the aggregate VCF to remove low-quality variant calls using a combination of bcftools (v.1.16) and PLINK (v.2.0) (see Supplementary Methods 2.3). For eQTL mapping, aggregate genotypes were binarized using PLINK 2.0.

Cell type deconvolution of bulk RNA-seq data

We used BayesPrism (Bayesian cell Proportion Reconstruction Inferred using Statistical Marginalization)⁷¹ to run a deconvolution model to estimate the proportion of retinal cell types in the generated bulk RNA-seq data in NSR and RPE. The reference dataset to train the model was a single-cell RNAseq dataset from the ocular posterior segment¹⁷ (See Supplementary Methods 1.5 for additional details).

Differential expression analysis between NSR and RPE

To ensure the validity of the transcriptomic datasets generated in this study, we assessed the biological relevance of expressed genes in NSR and RPE. We used the R package DESeq2⁷² to identify genes that were differentially expressed between NSR and RPE. We included age and sex as covariates in the DESeq2 model with the false discovery rate threshold set at 0.05. To confirm DESeq2 results, we replicated the differential expression analysis using edgeR⁷³.

To identify which gene ontology biological pathways were enriched in the upregulated genes in NSR/RPE, we carried out gene set enrichment analysis (GSEA) of the genes that were differentially expressed between both tissues (FDR < 0.05), using WebGestalt⁷⁴. We processed the GSEA output with a clustering algorithm, rrvgo⁷⁵, to group similar GO terms together and selected representative terms (see Supplementary Methods 1.6 for additional details).

Input Data for cis-eQTL analysis

For eQTL analysis, we generated a normalised expression matrix for each tissue. Genes that did not meet expression thresholds of >0.1 TPM in at least 20% of samples and ≥6 reads in at least 20% of samples were removed from eQTL analysis. Expression values were normalised using the trimmed mean of M-values normalisation (TMM) method⁷⁶ and using an inverse normal transform.
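As an illustration only, the gene filter and the rank-based inverse normal transform described above can be sketched in Python. The function names, the toy values and the rank/(n + 1) quantile convention are assumptions made for this sketch, not details taken from the study, and the TMM normalisation step is omitted.

```python
from statistics import NormalDist


def passes_expression_filter(tpm, counts, min_tpm=0.1, min_reads=6, min_frac=0.2):
    """True if a gene has >min_tpm TPM and >=min_reads reads in at least min_frac of samples."""
    n = len(tpm)
    frac_tpm = sum(t > min_tpm for t in tpm) / n
    frac_reads = sum(c >= min_reads for c in counts) / n
    return frac_tpm >= min_frac and frac_reads >= min_frac


def inverse_normal_transform(values):
    """Map one gene's values across samples to standard-normal quantiles by rank.

    Assumption: quantile = rank / (n + 1); ties are broken by order of appearance.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    transformed = [0.0] * n
    for rank, i in enumerate(order, start=1):
        transformed[i] = NormalDist().inv_cdf(rank / (n + 1))
    return transformed


# Toy gene measured in 10 samples: expressed in 3 of them (30%), so it is kept.
tpm = [0.0] * 7 + [1.2, 0.5, 3.0]
counts = [0] * 7 + [12, 8, 30]
keep = passes_expression_filter(tpm, counts)

# Toy expression values for one gene across three samples.
transformed = inverse_normal_transform([5.0, 1.0, 3.0])
```

In this sketch the transform is applied per gene across samples, so each gene's values follow the same standard-normal scale before association testing.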

To account for known and unknown biological and experimental confounding factors, a set of 30 covariates was generated for each RNA-Seq sample using the Probabilistic Estimation of Expression Residuals (PEER) method⁷⁷ applied to normalised gene expression levels.

Principal component analysis with EIGENSOFT 6.0.1⁷⁸ was carried out to capture ancestral variation within the cohort. The top five principal components for each participant were used as covariates in the eQTL analysis.

Cis-eQTL mapping with tensorQTL

TensorQTL⁷⁹ was used to identify genetic variants that were significantly associated with the expression of nearby genes (up to 1 Mb away) in NSR and RPE (FDR < 0.05). The required input files were the normalised gene expression matrix, the binary and filtered genotype data and a covariates table which included the following information for each participant: sex, WGS batch, five top principal components and 30 PEER factors. To quantify the eQTL effect size, we estimated the log2 allelic fold change (aFC), following the method established by Mohammadi et al.⁸⁰ (see Supplementary Methods section 4 for additional details).
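The inputs described above can be illustrated with a small Python sketch. It does not call tensorQTL. It only shows the covariate set listed in the text and a cis-window test, under the assumption (not stated in the text) that distance is measured from the transcription start site. All identifiers and coordinates are hypothetical.

```python
CIS_WINDOW_BP = 1_000_000  # "up to 1 Mb away" in the text

# Covariates named in the text: sex, WGS batch, five ancestry PCs, 30 PEER factors.
covariates = (
    ["sex", "wgs_batch"]
    + [f"PC{i}" for i in range(1, 6)]
    + [f"PEER{i}" for i in range(1, 31)]
)


def cis_variants(gene, variants, window=CIS_WINDOW_BP):
    """Return IDs of variants on the gene's chromosome within `window` bp of its TSS."""
    return [
        v["id"]
        for v in variants
        if v["chrom"] == gene["chrom"] and abs(v["pos"] - gene["tss"]) <= window
    ]


gene = {"id": "GENE_A", "chrom": "chr1", "tss": 5_000_000}
variants = [
    {"id": "v1", "chrom": "chr1", "pos": 4_200_000},  # 0.8 Mb away: tested
    {"id": "v2", "chrom": "chr1", "pos": 6_000_000},  # exactly 1 Mb away: tested
    {"id": "v3", "chrom": "chr1", "pos": 6_500_001},  # beyond the window: skipped
    {"id": "v4", "chrom": "chr2", "pos": 5_000_100},  # other chromosome: skipped
]
selected = cis_variants(gene, variants)
```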

Comparison with other eQTL studies

We compared all METR eQTLs with retina eQTLs mapped by EyeGEx¹⁵ and Strunz et al.¹⁶. We identified genes that had been associated with eQTLs both in our study and in EyeGEx and/or Strunz et al. (eGenes shared between studies). For these shared eGenes, we extracted the top eQTLs identified by EyeGEx and/or Strunz et al. and checked whether they were replicated in our cohort or were in high LD (r² > 0.8) with a METR-NSR eQTL. Pairwise LD scores were calculated using LDlinkR⁸¹.
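A minimal Python sketch of this replication check is given below. The data structures, variant identifiers and LD values are hypothetical, and the lookup table stands in for pairwise LD values obtained from an external tool.

```python
LD_R2_THRESHOLD = 0.8  # high-LD cut-off stated in the text


def replication_status(top_variant, gene, our_eqtls, ld_r2):
    """Classify a published top eQTL as 'replicated', 'in_high_ld' or 'not_replicated'.

    our_eqtls: dict mapping gene -> set of eVariant IDs from this cohort (hypothetical layout).
    ld_r2: dict mapping frozenset({variant_a, variant_b}) -> r2 (hypothetical layout).
    """
    ours = our_eqtls.get(gene, set())
    if top_variant in ours:
        return "replicated"
    if any(ld_r2.get(frozenset((top_variant, v)), 0.0) > LD_R2_THRESHOLD for v in ours):
        return "in_high_ld"
    return "not_replicated"


our_eqtls = {"GENE_A": {"rs1", "rs2"}, "GENE_B": {"rs9"}}
ld_r2 = {frozenset(("rs3", "rs2")): 0.93, frozenset(("rs7", "rs9")): 0.35}

status_a = replication_status("rs1", "GENE_A", our_eqtls, ld_r2)
status_b = replication_status("rs3", "GENE_A", our_eqtls, ld_r2)
status_c = replication_status("rs7", "GENE_B", our_eqtls, ld_r2)
```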

To compare to non-retina tissues, all significant eQTL associations were downloaded from the GTEx Open Access portal (v8) for each available tissue (https://www.gtexportal.org/home/downloads/adult-gtex/qtl). We quantified the overlap between METR-eQTLs and eGenes and those reported for each GTEx tissue using the Intersection over Union (IoU) statistic. The IoU is the ratio of the number of eQTLs/eGenes present in both sets to the total number of eQTLs/eGenes present in either set.
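The IoU statistic can be written as a short Python function. This is a sketch with hypothetical gene identifiers, not code from the study.

```python
def intersection_over_union(set_a, set_b):
    """Return |A ∩ B| / |A ∪ B|, or 0.0 when both sets are empty."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Toy example: two shared eGenes out of five distinct eGenes in total.
retina_egenes = {"G1", "G2", "G3"}
other_tissue_egenes = {"G2", "G3", "G4", "G5"}
iou = intersection_over_union(retina_egenes, other_tissue_egenes)
```

The statistic ranges from 0 (no shared entries) to 1 (identical sets), which makes tissues with very different numbers of eQTLs or eGenes comparable on one scale.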

Annotation of eVariants and bootstrapping analysis to calculate enrichment of eQTLs in characterised regulatory loci

All NSR and RPE eQTL variants were annotated with the Ensembl Variant Effect Predictor⁸². We assessed overlap and annotated all eVariants with a set of tissue-specific and cell-type-specific annotations of candidate cis-regulatory elements (cCREs) from a variety of sources (Supplementary Methods Table 1). These included characterised regulatory loci from retina, RPE and macula¹⁸, cell-type specific regions of open chromatin detected by scATACseq from retina samples¹⁹, non-eye specific cCREs from adult tissues in EpiMap²³ and cell-type agnostic candidate cis-regulatory element (cCRE) annotations from ENCODE⁸³.

To calculate the relative enrichment of eVariants that overlapped with each type of regulatory element, we used bootstrapping analysis. We carried out 1000 iterations by subsampling 100,000 random eVariants with replacement and 100,000 control variants from our cohort that were included as input for the eQTL mapping and did not meet the eQTL significance threshold (FDR < 0.05), matched for gene density and allele frequency. We compared the ratio of eVariants to control variants that intersected with each type of regulatory element (see Supplementary Methods section 5 for additional details).
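The resampling idea can be sketched in Python as follows. This is a simplified illustration rather than the study's implementation: matching of control variants on gene density and allele frequency is not shown, control variants are also drawn with replacement (an assumption), and the overlap flags and the reduced iteration and sample sizes in the example are toy values.

```python
import random
import statistics


def bootstrap_enrichment(evariant_hits, control_hits, n_iter=1000, sample_size=100_000, seed=0):
    """Return per-iteration ratios of regulatory-element overlap (eVariants / controls).

    evariant_hits, control_hits: lists of 0/1 flags marking overlap with one element type.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_iter):
        e_sample = rng.choices(evariant_hits, k=sample_size)  # sampling with replacement
        c_sample = rng.choices(control_hits, k=sample_size)
        e_frac = sum(e_sample) / sample_size
        c_frac = sum(c_sample) / sample_size
        if c_frac > 0:  # skip iterations where the ratio is undefined
            ratios.append(e_frac / c_frac)
    return ratios


# Toy input: 30% of eVariants and 10% of control variants overlap the element.
evariant_hits = [1] * 30 + [0] * 70
control_hits = [1] * 10 + [0] * 90
ratios = bootstrap_enrichment(evariant_hits, control_hits, n_iter=200, sample_size=500)
median_ratio = statistics.median(ratios)
```

The spread of the per-iteration ratios gives an empirical interval around the enrichment estimate.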

Analysis of the properties of eQTLs that impact known eye disease-related genes

To understand if there were trends that were specific to eQTLs associated with known monogenic eye disease genes, we utilised the EyeG2P resource²⁰. All other METR-eGenes were considered non-eye disease genes. We compared the eQTL/eGene properties between eye-disease and non-eye-disease gene eQTLs, including eVariant allele frequency and effect size, measured using log2 allelic fold change (see Supplementary Methods section 6 for additional details).

Identification of transcriptome outliers using the DROP pipeline

We utilised the DROP v.1.4.0 pipeline²¹ to identify transcriptome outliers from NSR and RPE, using standard parameters.

Hierarchical workflow to identify candidate variants driving outlier expression

We developed a hierarchical workflow to identify candidate variants driving outlier expression (eOutliers) using Snakemake version 7.32 (Supplementary Fig. 14). Briefly, the workflow would first search for a pLoF variant from the eOutlier sample in the corresponding eOutlier gene, which could be an exonic structural variant or an SNV with a high impact consequence based on Ensembl’s Variant Effect Predictor (v.112.0). If no pLoF variant could be identified, the workflow would then search for regulatory variants that were within 10 kb of the eOutlier gene body. Regulatory variants were defined as structural variants and rare SNVs (gnomAD AF < 0.01) that overlapped with nearby retina cCREs or non-retina-specific cCREs from different adult tissues in EpiMap. If no regulatory variant was identified, the workflow would identify any other non-coding structural variant within 10 kb of the eOutlier gene body, before returning a negative search result (see Supplementary Methods section 8 for additional details).
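The order of the searches can be illustrated with a short Python sketch. The field names and example variants are hypothetical, and applying the rarity filter to SNVs only is an assumption made here. The study's workflow is implemented in Snakemake and is described in its Supplementary Methods.

```python
def classify_eoutlier(variants):
    """Return the first category in the hierarchy supported by the variants of one sample-gene pair.

    Each variant is a dict with hypothetical fields: id, type ('SNV' or 'SV'), exonic,
    vep_high_impact, within_10kb, overlaps_ccre and gnomad_af.
    """
    def is_plof(v):
        return (v["type"] == "SV" and v["exonic"]) or (v["type"] == "SNV" and v["vep_high_impact"])

    def is_regulatory(v):
        if not (v["within_10kb"] and v["overlaps_ccre"]):
            return False
        # Assumption: the gnomAD AF < 0.01 filter is applied to SNVs only.
        return v["type"] == "SV" or v["gnomad_af"] < 0.01

    def is_other_noncoding_sv(v):
        return v["type"] == "SV" and not v["exonic"] and v["within_10kb"]

    tiers = [
        ("pLoF", is_plof),
        ("regulatory", is_regulatory),
        ("non-coding SV", is_other_noncoding_sv),
    ]
    for label, matches in tiers:
        hits = [v["id"] for v in variants if matches(v)]
        if hits:
            return label, hits  # stop at the first tier with a candidate
    return "no candidate", []


rare_ccre_snv = {"id": "a", "type": "SNV", "exonic": False, "vep_high_impact": False,
                 "within_10kb": True, "overlaps_ccre": True, "gnomad_af": 0.002}
nearby_sv = {"id": "b", "type": "SV", "exonic": False, "vep_high_impact": False,
             "within_10kb": True, "overlaps_ccre": False, "gnomad_af": 0.0}

result_both = classify_eoutlier([rare_ccre_snv, nearby_sv])
result_sv_only = classify_eoutlier([nearby_sv])
result_none = classify_eoutlier([])
```

Because the search stops at the first tier that yields a candidate, a sample carrying both a regulatory SNV and another non-coding structural variant is reported under the regulatory tier only.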

Implementation of the watershed

For all genes with an eOutlier in the NSR, we extracted all rare variants (gnomAD allele frequency <1%) that intersected with the gene body ± 10 kb. Variants were extracted for all samples with NSR RNAseq data from the post-QC aggregate VCF. We annotated all rare variants with selected annotations from VEP⁸² and CADD⁸⁴ (Supplementary Data 6) and intersected them with known retina-specific cCREs from Cherry et al.¹⁸ and non-retina-specific cCREs from EpiMap. Missing annotations were replaced with default imputation values obtained from CADD (Supplementary Data 6). The Watershed model²⁴ was run using the predict_watershed.R script with an adjusted p-value threshold of 0.05 and the number of dimensions set to 6 (see Supplementary Methods section 9 for additional details).

Dual reporter luciferase assay

A 294 bp fragment of the wild-type promoter region from CAND2 was PCR-amplified from control genomic DNA using Phusion High-Fidelity DNA Polymerase (Promega). To introduce variants, two overlapping fragments were amplified using a combination of mutagenic primers. Variants constructed were the variant of interest, NM_001162499.2:c.-41A>G, and a variant that is common in the general population and not expected to impact CAND2 expression, NM_001162499.2:c.-36C>T.

The wild-type and variant fragments were assembled into NheI-NcoI digested pGL4.10[luc2] firefly luciferase plasmid using the Gibson method. The assemblies were transformed into competent E. coli, which were grown overnight on LB agar containing carbenicillin. Candidate colonies were picked for culture and plasmid isolation. The plasmid constructs were verified by Sanger sequencing. Human K562 cells were transiently transfected with 500 ng of plasmid using Lipofectamine LTX (Invitrogen) following the manufacturer’s standard protocol. An empty pGL4.10[luc2] plasmid was transfected as a control for background activity. The Renilla luciferase pGL4.74[hRluc/TK] vector (Promega) was co-transfected as an internal luminescence control. Following 20–24 h incubation at 37 °C with 5% CO₂, a dual luciferase assay was conducted using the Dual-Glo® Luciferase Assay (Promega).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All raw RNA sequencing and genomic sequencing datasets generated in this study are available under controlled access through the European Genome-phenome Archive (EGA; Study ID: EGAS50000001443; Dataset: EGAD50000002082). Processed datasets, including eQTL results, eOutlier statistics and aggregated genomic variant files, are available under controlled access through the EGA (Study ID: EGAS50000001443; Dataset: EGAD50000002082). Controlled access to these datasets is a condition of access to tissue samples from the Manchester Eye Tissue Repository to ensure traceability of data access and usage, as per conditions of ethical approval for the biobank (15/NW/0932). Applications for access to the raw and processed datasets can be made through the EGA and will receive a response from the EGAC50000000807 EGA Data Access Committee within 4 weeks of the data access request. The full data access policy and terms of data usage are available through the EGA. The Genotype-Tissue Expression (GTEx) Project¹⁰ data used in this study are available from the GTEx public portal at https://www.gtexportal.org/home/. The ENCODE Project Candidate cis-Regulatory Element Registry V3⁸³ data used in this study are available from the SCREEN portal at https://screen.encodeproject.org/. The Epigenome Integration across Multiple Annotation Projects²³ (EpiMap) data used in this study are available from the EpiMap Repository at https://compbio.mit.edu/epimap/. The retina-specific epigenomic tracks generated by Cherry et al.¹⁸ used in this study are available as custom tracks on the UCSC browser and were accessed from https://tinyurl.com/CherryLab-EyeBrowser. The single-cell RNAseq data from the ocular posterior segment generated by Monavarfeshani et al.¹⁷ used in this study are available at the Broad Institute Single Cell Portal, under study number SCP2310, accessible from https://singlecell.broadinstitute.org/single_cell/.
The retina scATACseq peaks generated by Wang et al.¹⁹ used in this study are available in the GEO database under accession code GSE196235. The EyeGEx¹⁵ dataset used in this study is available under controlled access; access was obtained by contacting the corresponding author of the study. The retina and RPE-specific eQTLs generated by Orozco et al.²⁶ used in this study were accessed from https://eye-eqtl.com/ in April 2024. The retina-specific eQTLs generated by Strunz et al.¹⁶ used in this study are publicly available and were accessed from http://www-huge.uni-regensburg.de/ in October 2025. The 1000 Genomes Project V3 data used in this study are publicly available from the 1000 Genomes Project Public Portal at https://www.internationalgenome.org/data/. The Genome Aggregation Database (gnomAD v4) data used in this study are publicly available from the gnomAD Public Portal at https://gnomad.broadinstitute.org/data. All other data supporting the findings of this study are available in the article and its Supplementary Information files.

References

Wright, A. F., Chakarova, C. F., Abd El-Aziz, M. M. & Bhattacharya, S. S. Photoreceptor degeneration: genetic and mechanistic dissection of a complex trait. Nat. Rev. Genet. 11, 273–284 (2010).
Hoon, M., Okawa, H., Della Santina, L. & Wong, R. O. L. Functional architecture of the retina: development and disease. Prog. Retin. Eye Res. 42, 44–84 (2014).
Strauss, O. The retinal pigment epithelium in visual function. Physiol. Rev. 85, 845–881 (2005).
Hanany, M., Rivolta, C. & Sharon, D. Worldwide carrier frequency and genetic prevalence of autosomal recessive inherited retinal diseases. Proc. Natl. Acad. Sci. USA 117, 2710–2716 (2020).
Fleckenstein, M. et al. Age-related macular degeneration. Nat. Rev. Dis. Primers 7, 31 (2021).
Wong, W. L. et al. Global prevalence of age-related macular degeneration and disease burden projection for 2020 and 2040: a systematic review and meta-analysis. Lancet Glob. Health 2, e106–e116 (2014).
Seddon, J. M., Cote, J., Page, W. F., Aggen, S. H. & Neale, M. C. The US twin study of age-related macular degeneration: relative roles of genetic and environmental influences. Arch. Ophthalmol. Chic. 123, 321–327 (2005).
Fritsche, L. G. et al. A large genome-wide association study of age-related macular degeneration highlights contributions of rare and common variants. Nat. Genet. 48, 134–143 (2016).
Gorman, B. R. et al. Genome-wide association analyses identify distinct genetic architectures for age-related macular degeneration across ancestries. Nat. Genet. 56, 2659–2671 (2024).
The GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Barbeira, A. N. et al. Exploiting the GTEx resources to decipher the mechanisms at GWAS loci. Genome Biol. 22, 49 (2021).
Hamel, A. R. et al. Integrating genetic regulation and single-cell expression with GWAS prioritizes causal genes and cell types for glaucoma. Nat. Commun. 15, 396 (2024).
Michaud, V. et al. The contribution of common regulatory and protein-coding TYR variants to the genetic architecture of albinism. Nat. Commun. 13, 3939 (2022).
Davenport, E. E. et al. Discovering in vivo cytokine-eQTL interactions from a lupus clinical trial. Genome Biol. 19, 168 (2018).
Ratnapriya, R. et al. Retinal transcriptome and eQTL analyses identify genes associated with age-related macular degeneration. Nat. Genet. 51, 606–610 (2019).
Strunz, T. et al. A mega-analysis of expression quantitative trait loci in retinal tissue. PLoS Genet 16, e1008934 (2020).
Monavarfeshani, A. et al. Transcriptomic analysis of the ocular posterior segment completes a cell atlas of the human eye. Proc. Natl. Acad. Sci. USA 120, e2306153120 (2023).
Cherry, T. J. et al. Mapping the cis-regulatory architecture of the human retina reveals noncoding genetic variation in disease. Proc. Natl. Acad. Sci. USA 117, 9001–9012 (2020).
Wang, S. K. et al. Single-cell multiome of the human retina and deep learning nominate causal variants in complex eye diseases. Cell Genom. 2, 100164 (2022).
Lenassi, E. et al. EyeG2P: an automated variant filtering approach improves efficiency of diagnostic genomic testing for inherited ophthalmic disorders. J. Med. Genet. 60, 810–818 (2023).
Yépez, V. A. et al. Detection of aberrant gene expression events in RNA sequencing data. Nat. Protoc. 16, 1276–1296 (2021).
Hölzlwimmer, F. R. et al. Aberrant gene expression prediction across human tissues. Nat. Commun. 16, 3061 (2025).
Boix, C. A., James, B. T., Park, Y. P., Meuleman, W. & Kellis, M. Regulatory genomic circuitry of human disease loci by integrative epigenomics. Nature 590, 300–307 (2021).
Ferraro, N. M. et al. Transcriptomic signatures across human tissues identify functional rare genetic variation. Science 369, eaaz5900 (2020).
Wang, K. et al. Molecular mechanisms of CAND2 in regulating SCF ubiquitin ligases. Nat. Commun. 16, 1998 (2025).
Orozco, L. D. et al. Integration of eQTL and a single-cell atlas in the human eye identifies causal genes for age-related macular degeneration. Cell Rep. 30, 1246–1259.e6 (2020).
Ferreira, P. G. et al. The effects of death and post-mortem cold ischemia on human tissue transcriptomes. Nat. Commun. 9, 490 (2018).
Voigt, A. P. et al. Choroidal endothelial and macrophage gene expression in atrophic and neovascular macular degeneration. Hum. Mol. Genet. 31, 2406–2423 (2022).
Orozco, L. D. et al. A systems biology approach uncovers novel disease mechanisms in age-related macular degeneration. Cell Genom. 3, 100302 (2023).
Dey, P. N. et al. Loss of paired immunoglobulin-like type 2 receptor B gene associated with age-related macular degeneration impairs photoreceptor function in mouse retina. Hum. Mol. Genet. 34, 64–76 (2025).
Li, X. et al. The impact of rare variation on gene expression across tissues. Nature 550, 239–243 (2017).
Ellingford, J. M. et al. Recommendations for clinical interpretation of variants found in non-coding regions of the genome. Genome Med. 14, 73 (2022).
Smail, C. et al. Integration of rare expression outlier-associated variants improves polygenic risk prediction. Am. J. Hum. Genet. 109, 1055–1064 (2022).
Wakeling, M. N. et al. Non-coding variants disrupting a tissue-specific regulatory element in HK1 cause congenital hyperinsulinism. Nat. Genet. 54, 1615–1620 (2022).
Tenney, A. P. et al. Noncoding variants alter GATA2 expression in rhombomere 4 motor neurons and cause dominant hereditary congenital facial paresis. Nat. Genet. 55, 1149–1163 (2023).
Dueñas Rey, A. et al. Combining a prioritization strategy and functional studies nominates 5’UTR variants underlying inherited retinal disease. Genome Med. 16, 7 (2024).
Vig, A. et al. DYNC2H1 hypomorphic or retina-predominant variants cause nonsyndromic retinal degeneration. Genet. Med. J. Am. Coll. Med. Genet. 22, 2041–2051 (2020).
Daich Varela, M. et al. Multidisciplinary team-directed analysis of whole genome sequencing reveals pathogenic non-coding variants in molecularly undiagnosed inherited retinal dystrophies. Hum. Mol. Genet. 32, 595–607 (2023).
Small, K. W. et al. North Carolina macular dystrophy is caused by dysregulation of the retinal transcription factor PRDM13. Ophthalmology 123, 9–18 (2016).
Quinodoz, M. et al. De novo and inherited dominant variants in U4 and U6 snRNAs cause retinitis pigmentosa. MedRxiv Prepr. Serv. Health Sci. 2025.01.06.24317169 https://doi.org/10.1101/2025.01.06.24317169. (2025).
Turnbull, C. et al. The 100 000 Genomes Project: bringing whole genome sequencing to the NHS. BMJ 361, k1687 (2018).
The 100,000 Genomes Project Pilot Investigators 100,000 Genomes pilot on rare-disease diagnosis in health care — preliminary report. N. Engl. J. Med. 385, 1868–1880 (2021).
Ellingford, J. M. et al. Whole genome sequencing increases molecular diagnostic yield compared with current diagnostic testing for inherited retinal disease. Ophthalmology 123, 1143–1150 (2016).
D’haene, E. et al. Comparative 3D genome analysis between neural retina and retinal pigment epithelium reveals differential cis-regulatory interactions at retinal disease loci. Genome Biol. 25, 123 (2024).
Pinelli, M. et al. An atlas of gene expression and gene co-regulation in the human retina. Nucleic Acids Res. 44, 5773–5784 (2016).
Farkas, M. H. et al. Transcriptome analyses of the human retina identify unprecedented transcript diversity and 3.5 Mb of novel transcribed sequence via significant alternative splicing and novel genes. BMC Genomics 14, 486 (2013).
Castel, S. E. et al. Modified penetrance of coding variants by cis-regulatory variation contributes to disease risk. Nat. Genet. 50, 1327–1334 (2018).
Einson, J. et al. Genetic control of mRNA splicing as a potential mechanism for incomplete penetrance of rare coding variants. Genetics 224, iyad115 (2023).
Rio Frio, T., Civic, N., Ransijn, A., Beckmann, J. S. & Rivolta, C. Two trans-acting eQTLs modulate the penetrance of PRPF31 mutations. Hum. Mol. Genet. 17, 3154–3165 (2008).
Wigdor, E. M. et al. Investigating the role of common cis-regulatory variants in modifying penetrance of putatively damaging, inherited variants in severe neurodevelopmental disorders. Sci. Rep. 14, 8708 (2024).
Lukowski, S. W. et al. A single-cell transcriptome atlas of the adult human retina. EMBO J. 38, e100811 (2019).
Menon, M. et al. Single-cell transcriptomic atlas of the human retina identifies cell types associated with age-related macular degeneration. Nat. Commun. 10, 4902 (2019).
Yan, W. et al. Cell atlas of the human fovea and peripheral retina. Sci. Rep. 10, 9802 (2020).
van Zyl, T. et al. Cell atlas of the human ocular anterior segment: Tissue-specific and shared cell types. Proc. Natl. Acad. Sci. USA 119, e2200914119 (2022).
Liang, Q. et al. Single-nuclei RNA-seq on human retinal tissue provides improved transcriptome profiling. Nat. Commun. 10, 5743 (2019).
Choi, J. et al. Spatial organization of the mouse retina at single-cell resolution by MERFISH. Nat. Commun. 14, 4929 (2023).
Dorgau, B. et al. Deciphering the spatiotemporal transcriptional and chromatin accessibility of human retinal organoid development at the single-cell level. iScience 27, 109397 (2024).
Masland, R. H. The neuronal organization of the retina. Neuron 76, 266–280 (2012).
Sharon, D., Blackshaw, S., Cepko, C. L. & Dryja, T. P. Profile of the genes expressed in the human peripheral retina, macula, and retinal pigment epithelium determined through serial analysis of gene expression (SAGE). Proc. Natl. Acad. Sci. USA 99, 315–320 (2002).
McHarg, S., Brace, N., Bishop, P. N. & Clark, S. J. Enrichment of Bruch’s membrane from human donor eyes. J. Vis. Exp. JoVE 53382 https://doi.org/10.3791/53382 (2015).
Cabral, T. et al. Dissection of human retina and RPE-choroid for proteomic analysis. J. Vis. Exp. JoVE 56203 https://doi.org/10.3791/56203 (2017).
Gao, H. & Hollyfield, J. G. Aging of the human retina. Differential loss of neurons and retinal pigment epithelial cells. Invest. Ophthalmol. Vis. Sci. 33, 1–17 (1992).
Bhoi, J. D., Goel, M., Ribelayga, C. P. & Mangel, S. C. Circadian clock organization in the retina: from clock components to rod and cone pathways and visual function. Prog. Retin. Eye Res. 94, 101119 (2023).
GTEx Consortium. Laboratory and Analysis Methods. GTEx Portal. https://gtexportal.org/home/methods (2019).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Picard toolkit. Broad Institute. GitHub repository. https://broadinstitute.github.io/picard/ (2019).
Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 47, D766–D773 (2019).
DeLuca, D. S. et al. RNA-SeQC: RNA-seq metrics for quality control and process optimization. Bioinformatics 28, 1530–1532 (2012).
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinforma. 12, 323 (2011).
Pedersen, B. S. et al. Somalier: rapid relatedness estimation for cancer and germline studies using efficient genome sketches. Genome Med. 12, 62 (2020).
Chu, T., Wang, Z., Pe’er, D. & Danko, C. G. Cell type and gene expression deconvolution with BayesPrism enables Bayesian integrative analysis across bulk and single-cell RNA sequencing in oncology. Nat. Cancer 3, 505–517 (2022).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Liao, Y., Wang, J., Jaehnig, E. J., Shi, Z. & Zhang, B. WebGestalt 2019: gene set analysis toolkit with revamped UIs and APIs. Nucleic Acids Res. 47, W199–W205 (2019).
Sayols, S. rrvgo: a Bioconductor package for interpreting lists of Gene Ontology terms. MicroPublication Biol. https://doi.org/10.17912/micropub.biology.000811 (2023).
Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).
Stegle, O., Parts, L., Durbin, R. & Winn, J. A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLoS Comput. Biol. 6, e1000770 (2010).
Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).
Taylor-Weiner, A. et al. Scaling computational genomics to millions of individuals with GPUs. Genome Biol. 20, 228 (2019).
Mohammadi, P., Castel, S. E., Brown, A. A. & Lappalainen, T. Quantifying the regulatory effect size of cis-acting genetic variation using allelic fold change. Genome Res. 27, 1872–1884 (2017).
Myers, T. A., Chanock, S. J. & Machiela, M. J. LDlinkR: an R package for rapidly calculating linkage disequilibrium statistics in diverse populations. Front. Genet. 11, 157 (2020).
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
Snyder, M. P. et al. Perspectives on ENCODE. Nature 583, 693–698 (2020).
Schubach, M., Maass, T., Nazaretyan, L., Röner, S. & Kircher, M. CADD v1.7: using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions. Nucleic Acids Res. 52, D1143–D1154 (2024).

Acknowledgements

We express our sincere thanks to the donors and their families for enabling this research. We thank Selina McHarg, Nadhim Bayatti and Jay Brown for the development of the Manchester Eye Tissue Resource. We also thank staff at the University of Manchester Genomic Technologies Core Facility and at the Ocular Genomics Institute, Harvard Medical School, for their help in the generation of DNA and RNA sequencing datasets for this study. The views expressed are those of the authors and not necessarily those of the funders, including the NIHR and the Department of Health and Social Care. J.M.E. discloses support for the research of this work from the Macular Society (United Kingdom), Fight For Sight, the UK Medical Research Council and the NIHR Manchester Biomedical Research Centre (NIHR203308). J.S. discloses support for the research of this work from the UK Medical Research Council (MR/W007428/1). P.I.S. discloses support for the publication of this work from the Wellcome Trust (224643/Z/21/Z, Clinical Research Career Development Fellowship) and the UK National Institute for Health Research (NIHR) Clinical Lecturer Programme (CL-201-06-001). D.B. discloses support for the publication of this work from the NIHR Research Professorship grant (RP-2016-07-011). A.V.S. discloses support for publication of this work from the NIH/NEI (R01 EY031424). K.M.B. discloses support for publication of this work from the NIH/NEI (R01EY035717).

Author information

Authors and Affiliations

Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
Jacob Sampson, Simon J. Clark, Paul N. Bishop, Steve Haynes, Jospin Al-Deek, Stacey Holden, Beverley Anderson, Andrew Hayes, Rahmat A. Kemal, Huw B. Thomas, Raymond T. O’Keefe, Siddharth Banka, Graeme C. Black, Panagiotis I. Sergouniotis & Jamie M. Ellingford
Ocular Genomics Institute, Department of Ophthalmology, Massachusetts Eye and Ear, Boston, MA, USA
Ayellet V. Segrè & Kinga M. Bujakowska
Department of Ophthalmology, Harvard Medical School, Boston, MA, USA
Ayellet V. Segrè & Kinga M. Bujakowska
Broad Institute of Harvard and MIT, Cambridge, MA, USA
Ayellet V. Segrè
University Eye Clinic, Eberhard Karls University of Tübingen, Tübingen, Germany
Simon J. Clark
Institute for Ophthalmic Research, Eberhard Karls University of Tübingen, Tübingen, Germany
Simon J. Clark
School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK
Diana Baralle
Department of Medical Biology, Faculty of Medicine, Universitas Riau, Pekanbaru, Indonesia
Rahmat A. Kemal
Manchester Centre for Genomic Medicine, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester, UK
Siddharth Banka, Graeme C. Black & Panagiotis I. Sergouniotis
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, UK
Panagiotis I. Sergouniotis
Manchester Royal Eye Hospital, Manchester University NHS Foundation Trust, Manchester, UK
Panagiotis I. Sergouniotis

Authors

Jacob Sampson
Ayellet V. Segrè
Kinga M. Bujakowska
Simon J. Clark
Paul N. Bishop
Steve Haynes
Diana Baralle
Jospin Al-Deek
Stacey Holden
Beverley Anderson
Andrew Hayes
Rahmat A. Kemal
Huw B. Thomas
Raymond T. O’Keefe
Siddharth Banka
Graeme C. Black
Panagiotis I. Sergouniotis
Jamie M. Ellingford

Contributions

J.M.E., A.V.S., K.M.B., P.I.S., D.B. and G.C.B. conceived the study and obtained funding. J.S. led and performed the analysis described in the study under the supervision of J.M.E. and S.B. J.A.-D. performed cellular deconvolution analyses under the supervision of J.M.E. S.J.C. and P.N.B. collected retinal tissue and performed quality and age-related macular degeneration assessments. S.H. (Steve Haynes), B.A., S.H. (Stacey Holden) and A.H. extracted molecular material and generated the paired DNA and RNA sequencing datasets. R.A.K., H.B.T. and R.T.O’K. designed and performed dual reporter luciferase assays. J.S. and J.M.E. wrote the manuscript with review and critical input from all authors.

Corresponding author

Correspondence to Jamie M. Ellingford.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Anand Swaroop and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information (PDF)

Description of Additional Supplementary Files (PDF)

Supplementary Data 1-7 (XLSX)

Reporting Summary (PDF)

Transparent Peer Review file (PDF)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Sampson, J., Segrè, A.V., Bujakowska, K.M. et al. Paired DNA and RNA sequencing uncovers common and rare variation regulating human retinal gene expression. Nat Commun 17, 4595 (2026). https://doi.org/10.1038/s41467-026-72979-4

Received: 03 June 2025
Accepted: 28 April 2026
Published: 26 May 2026
Version of record: 26 May 2026
DOI: https://doi.org/10.1038/s41467-026-72979-4