Introduction

DNA plays a dual role in gene expression. It encodes regulators of transcription such as transcription factors (TFs) and allows their positioning in the transcriptional machinery by bearing specific transcription factor binding sites (TFBSs)1,2. The fundamental interaction that generates DNA-protein binding specificity, and eventually gene expression, arises from the specific affinity of TF protein contacts with specific DNA bases within TFBSs3,4,5. In metazoans, tight control of the level, pattern, and timing of gene expression in different cell types has arisen through the evolution of TFs (protein) and of their interactions with cognate TFBSs (DNA)6. Each gene is regulated by multiple TF-TFBS interactions7. TFBSs contain DNA sequence variation that exceeds that present in the evolutionarily more constrained coding sequence of TFs, in a model of co-evolution in which TFBSs evolve faster than their binders, the TFs8,9,10. TFBSs are short degenerate DNA sequences (also known as motifs) that occur as monomeric11,12, dimeric, and overlapping sequences embedded in cis-regulatory elements, so that the DNA context surrounding them can also influence how they are bound by TFs13,14,15,16,17,18. In turn, TFs recognize and bind TFBSs and their flanks either independently11,12,19 or through TF-TF interactions (TF-TFs), cooperatively20,21,22,18 and combinatorially23,24, by protein-23,25 or DNA-protein-driven mechanisms15,26,27,28,29. On a larger scale, TFs and TFBSs combine their binding modes functionally to determine gene expression. The number, position, helical DNA face and binding site (strand) of TFBSs are recognized by multiple TFs, resulting in qualitative and quantitative binding at promoters and enhancers30,31,32, eventually leading to transcription, propelled by protein-protein interactions33. These different modes originate and resolve in one recognized basic mechanism, TF-DNA binding affinity34,35. 
The dynamic interactions that regulate DNA-protein recognition, which depend on chromatin status and on the interaction of TFs with nucleosomes36,37, ultimately set the rates of association and dissociation of DNA-binding proteins to and from DNA, which in turn determine cell-specific transcriptional activity38,39,40. Therefore, DNA-protein interactions ranging from low to high affinity41 (reflecting the number and types of TFBSs and the concentrations and types of TFs42,43,44) account for gene expression from noise45 to gene expression programs46,47,48.

The human RHODOPSIN (RHO) gene encodes a G-protein-coupled receptor expressed exclusively in retinal rod photoreceptors and located in their outer segment (OS)49. RHO in the OS intercepts light photons and converts them into electrical signals through the phototransduction cascade and, when mutated, causes retinitis pigmentosa (RP) and other blinding disorders50,51. The Rho gene is also tightly regulated, as Rho gene expression levels determine the size of the outer segment of the rods52,53,54, and both low and high expression levels can lead to photoreceptor degeneration55,56. RHO expression increases linearly during development, and after birth the gene becomes highly and constantly transcribed in rod photoreceptors57. Two conserved enhancers (Rho Enhancer Region (RER)58 and CRX-Bound Region 1 (CBR)59) and the Rho promoter contribute differently along pre-natal and post-natal development of rod progenitors60. However, genetic studies in mice suggest that in the adult retina maintenance of Rho expression is strongly controlled by the adult-specific accessible regions at the proximal promoter60,61, which account for 50% of its expression60.

We have previously shown that both a synthetic zinc finger protein (ZF-DB) and an ectopically expressed transcription factor (KLF15) block Rhodopsin (Rho) expression when delivered in vivo to rod photoreceptors of adult porcine retina by Adeno-associated virus (AAV) vectors62,63. ZF-DB and KLF15 bind 20- and 9-bp DNA sequences, partially overlapping (6 bp), between the binding sites of two key TFs for RHO expression, the Cone Rod Homeobox protein (CRX)64 and the Neural Retina-specific Leucine zipper protein (NRL)65, in the proximal promoter region of RHO (-82 to -62 and -84 to -75 from the transcription start site, TSS, respectively; Fig. 1A). ZF-DB and KLF15 repression of Rho caught our attention for two main reasons: (i) ZF-DB and KLF15 bind a DNase-unprotected66,67 region sensitively, at 20-fold lower concentrations than those of CRX and NRL, and selectively, as a low number of transcriptional off-targets are induced62,63; (ii) ZF-DB and KLF15 both repressed Rho despite their biochemical differences: ZF-DB possesses exclusively DNA-binding properties (without effector domains), whereas KLF15 is a fully competent TF68. Overall, this suggests that a very compact DNA cis-regulatory sequence near the TSS controls Rho expression in terminally differentiated rods. Thus, ZF-DB and KLF15, within the native chromosomal context of Rho in vivo, are sufficient upon binding to an accessible DNA sequence between CRX and NRL to overcome the transactivation effects of CRX and NRL and in turn block Rho expression. These observations led us to hypothesize that the CRX and NRL TF pair, in addition to binding, interacts with the specific DNA sequence interposed between them by a novel mechanism, which ultimately contributes to determining Rho expression levels.

Results

Forty-nine bp containing CRX and NRL BSs (RHO regulatory unit, RRU) control RHO expression in adult retina

ZF-DB and KLF15 target a sequence between two extensively validated cis-regulatory elements, BAT-1 and NRE59,66,69, which are bound by the retinal-specific TF CRX64 and by the rod photoreceptor-specific TF NRL65, respectively. The CRX-NRL pair operates synergistically to transactivate numerous rod-specific genes, including RHO59,69. On this basis, we assessed whether the 49-bp-long cis-regulatory sequence encompassing the CRX and NRL binding sites and the interposed linker sequence bound by ZF-DB and KLF15 may represent a cis-regulatory unit (here named RHO regulatory unit, RRU) controlling RHO expression (Fig. 1A). To this end, we generated constructs for an in vivo Adeno-associated virus (AAV) vector reporter assay (Fig. 1A). An in vivo reporter assay70 links variation of a DNA cis-regulatory sequence to the activity of a reporter gene. Here, we assessed in the post-mitotic adult mouse retina71 the variation in expression of the enhanced Green Fluorescent Protein reporter gene, eGFP, driven by a cis-regulatory sequence (a human RHO promoter fragment and multiple variants, see below) delivered by AAV (Fig. 1A). To dissect the sequence-function relationships arising from variation of the human RHO promoter fragment under study, we used a conventional one-at-a-time design, with one AAV-promoter variant per eye72. Relevant for the present study, and unlike in vivo electroporation70,73, AAV vectors efficiently transduce post-mitotic adult photoreceptors71. DNA delivered by AAV vectors to the target tissue, including the retina71, remains almost exclusively episomal71,74,75, thereby isolating cis-regulatory activity from the genome context76,77. We cloned a human RHO proximal promoter fragment of 259 bp (hRHOp: -164 bp from the TSS and 95 bp of 5’UTR), containing the 49 bp of the RRU, to drive the expression of eGFP (Fig. 1A). 
To transduce photoreceptors, we used the extensively validated subretinal delivery of AAV serotype 872 at a dose of 1 × 10^9 genome copies (GC) of AAV8 in adult C57BL/6 mice (4 weeks old). To minimize the effect of potential variability in the vector dose delivered by subretinal administration, we co-injected with the test AAV8 vectors a low dose (1 × 10^8 GC) of an AAV8 vector encoding the red fluorophore mCherry (AAV8-hGNAT1-mCherry) and of an AAV8 vector encoding human RHO (hRHO; AAV8-hGNAT1-hRHO; 1 × 10^8 GC), each under the control of the photoreceptor-specific promoter fragment of the G protein subunit alpha transducin 1 gene (GNAT1). Fifteen days after subretinal administration, eyes were harvested, and retinas were processed to measure eGFP levels by RT-PCR or to assess the eGFP expression pattern by immunofluorescence analysis (Fig. 1). The relative abundance of eGFP (the ratio between transcript levels of eGFP and those of mCherry or hRHO) was readily detectable and yielded similar values with either normalizer (data not shown; we chose to use AAV8-hGNAT1-hRHO as the sole normalizer in the following experiments, see paragraphs below).
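The normalisation described above (eGFP transcript level divided by that of the co-injected normalizer) can be sketched in a few lines of Python. This is an illustration under stated assumptions only: the Ct values are hypothetical, the 2^-dCt conversion assumes comparable amplification efficiency for both amplicons (which the text does not state), and the function names are ours rather than taken from the Methods.

```python
# Illustrative sketch only: Ct values below are hypothetical, and the
# 2^-dCt conversion assumes ~100% amplification efficiency for both amplicons.

def relative_abundance(ct_reporter: float, ct_normalizer: float) -> float:
    """Return the reporter transcript level relative to the co-injected normalizer."""
    delta_ct = ct_reporter - ct_normalizer
    return 2.0 ** (-delta_ct)


def fold_vs_control(sample_ratio: float, control_ratio: float) -> float:
    """Express a variant's normalised level relative to the control promoter."""
    return sample_ratio / control_ratio


if __name__ == "__main__":
    # Hypothetical eyes: (Ct of eGFP, Ct of the normalizer transcript)
    control = relative_abundance(22.0, 24.0)  # control promoter
    variant = relative_abundance(24.0, 24.0)  # a promoter variant
    print(f"control ratio: {control:.2f}")
    print(f"variant ratio: {variant:.2f}")
    print(f"variant / control: {fold_vs_control(variant, control):.2f}")
```

Dividing by a co-delivered normalizer is what makes eyes comparable: an eye that received less vector yields less of both transcripts, so the ratio is largely insensitive to the delivered dose.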

To dissect the contribution of the DNA sequence determinants of the RRU, we first mutated the CRX and NRL binding sites. AAV8-hRHOp-eGFP vectors carrying individual mutations in the mCRX1, mCRX2 or NRL binding sites showed that each binding site is required to sustain eGFP reporter expression (Fig. 1B). Next, to assess whether the RRU is necessary for RHO expression, we either deleted (Prom-Q) or duplicated (Prom-S) the RRU (Fig. 1C). AAV8-Prom-Q abolished reporter expression, whereas the duplication in AAV8-Prom-S almost doubled eGFP expression. These results suggest that the RRU (a compact, 49-bp-long RHO cis-regulatory element containing the CRX and NRL binding sites and the 22-bp-long interposed linker sequence) contains key features necessary to control both the expression levels and the rod photoreceptor-restricted expression pattern in adult mice.

Fig. 1

Forty-nine bp containing CRX and NRL BSs control RHO expression in adult retina. (A), Proximal promoter region of the human RHO containing the Rho Regulatory Unit (hRRU, chr3:129,528,537-129,528,585). hRRU is composed of the BAT-1 and NRE elements (dashed grey boxes), the CRX and NRL TF binding sites (in green and light blue, respectively) and the DNA-linker (in red). The “footprints” on hRHOp of the synthetic zinc finger protein (drawing of the six zinc-finger array of ZF-DB below the negative strand, amaranth) and of the TF KLF15 (drawing of KLF15 including the effector domain, in violet above the positive strand) are shown. Below, schematic representation of the gene reporter assay with an Adeno-associated virus (AAV8) vector carrying eGFP under the control of the RHO proximal promoter (hRHOp), injected subretinally in adult mice (4 weeks old); the right-hand panel shows immunofluorescence analysis of a cross section of a retina transduced with AAV8-hRHOp-eGFP (3 × 10^9 genome copies, GC), co-injected with AAV8-hGNAT1-mCherry (3 × 10^8 GC), 15 days post-injection. (B), Histogram of quantitative PCR (qPCR) analysis of eGFP expression levels 15 days after subretinal administration of AAV8-eGFP (3 × 10^9 GC) co-delivered with AAV8-hGNAT1-mCherry (3 × 10^8 GC) to account for the variability associated with subretinal AAV8 vector delivery (eGFP expression levels were normalised to those of mCherry, Methods; one promoter variant per eye, three replicates). (C), qPCR upon removal of the hRRU (Prom-Q) and doubling of the hRRU (Prom-S), and immunofluorescence analysis of Prom-Q and Prom-S co-injected with an AAV8 vector encoding mCherry (eGFP expression levels were normalised to those of mCherry, Methods; three individual retinal replicates for each construct). RPE, retinal pigment epithelium; OS, outer segment; ONL, outer nuclear layer; INL, inner nuclear layer; GCL, ganglion cell layer. 
Statistical significance was computed using the One-Way Anova test (p-values ≤ 0.05), and post-hoc Dunnett’s multiple comparisons test (p-values ≤ 0.05).

Changes of the DNA sequence interposed between CRX and NRL lead to differential reporter gene expression

To dissect the DNA sequence determinants within the RRU, we restricted the analysis to the 22-bp-long DNA sequence intervening between CRX and NRL, from now on termed the DNA-linker (Fig. 1A), and its relationship with the flanking CRX and NRL binding sites. To do so, we generated constructs carrying changes in the order and orientation of the CRX and NRL binding sites and in the spacing between them (the length of the DNA-linker), collectively referred to as the cis-regulatory syntax39, and tested them in vivo with the AAV reporter assay. Swapping the order of the CRX and NRL binding sites (AAV8-Prom G-1) reduced reporter expression to less than half of control, whereas a concurrent change of order and orientation (AAV8-Prom G-2) abolished it (Fig. 2A). To manipulate the spacing between the CRX and NRL binding sites, we generated three distinct non-overlapping deletions of 5 bp within the DNA-linker, Del-5’, Del-C and Del-3’, each in its own AAV-reporter vector. Each deletion rotates the CRX and NRL binding sites relative to each other by about half a DNA helical turn (an entire turn is 10.5 bp), placing them on opposite faces of the helix relative to the native configuration. In vivo reporter gene expression varied among these three constructs from null levels (AAV8-Del-3’) to levels equivalent to the native sequence (AAV8-Del-5’) (Fig. 2A). The fact that the same reduction of spacing resulted in differential eGFP expression levels, with an apparent decrease in the 5’ to 3’ direction of the DNA-linker, indicates that the source of the variation lies within the sequence linking CRX and NRL; thus, we next mutated its DNA base sequence. We generated both contiguous substitutions (7 to 10 bp) and a single-base substitution. These changes resulted in diverse levels of reporter gene expression (Fig. 2B). 
In contrast to the deletions, only the 5’-most substitution (AAV8-Prom-N) retained some activity, whereas substitutions a few bases away from the 3’ inner flank of the DNA-linker (AAV8-Prom-Me and AAV8-Prom-3a) greatly reduced reporter expression. Thus, deletions and base substitutions within the DNA-linker lead to highly differentiated gene expression (Fig. 2B, C). This suggests that DNA base content and DNA-linker length are key features for the function of the RRU. However, these effects may be due to the generation of de novo binding sites for either activators or repressors. To rule out this possibility, we scanned each sequence for BSs of the TFs in the HOCOMOCO human collection (see Methods)78. The NRL and CRX BSs were consistently retained in the flanks of the DNA-linker in all hRHOp artificial promoter sequences tested, compatible with the previously described occupancy of the human RHO promoter67. In contrast, in the DNA-linkers of both wild-type and synthetic promoters we were not able to identify any match with TFs expressed in the human retina (Supplementary Table 2). Thus, under these experimental conditions, at equal spacing between CRX and NRL the transcriptional output of RHO is dictated by the composition of the DNA spacer sequence between them. To further rule out the possibility of generating novel BSs, we inverted the 5’-3’ orientation of the DNA-linker (Prom-L; Fig. 2D). This arrangement affects only the 3’ and the 5’ flanks of the CRX and NRL binding sites, respectively (Fig. 2D), while ensuring both the conservation of the DNA-linker sequence and its plus-minus DNA strand configuration relative to the CRX and NRL BSs. Notably, in this arrangement the reporter expression levels were strongly reduced (Fig. 2D). 
Finally, immunofluorescence analysis showed that these changes of levels of reporter expression across different mutants corresponded to a variation of the intensity of eGFP expression and maintenance of the same eGFP pattern, restricted to rod photoreceptors (Figs. 1A and C and 2C). Collectively, these results suggest that without the apparent intervening action of other retina-specific TFs, the DNA linker connecting CRX and NRL controls the levels but not the pattern of RHO expression. Thus, the source of variation in RHO reporter expression depends on the syntax of CRX and NRL BSs (order, orientation, and distance of TFBSs), and in addition on the base composition of the DNA-linker sequence interposed between CRX and NRL (Prom-L).
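The helical-phasing argument behind the 5 bp deletions can be checked with simple arithmetic. The sketch below is illustrative only: it takes the 10.5 bp per turn quoted in the text and assumes an idealised, uniformly twisted helix; the function names are hypothetical.

```python
# Illustrative sketch: idealised DNA with a uniform twist of 10.5 bp per turn.
BP_PER_TURN = 10.5


def phase_shift_degrees(delta_bp: float, bp_per_turn: float = BP_PER_TURN) -> float:
    """Rotation (0-360 degrees) between two sites when their spacing changes by delta_bp."""
    return (delta_bp / bp_per_turn * 360.0) % 360.0


def angular_offset(delta_bp: float, bp_per_turn: float = BP_PER_TURN) -> float:
    """Smallest angle (0-180 degrees) between the original and the new helical face."""
    shift = phase_shift_degrees(delta_bp, bp_per_turn)
    return min(shift, 360.0 - shift)


if __name__ == "__main__":
    # A 5 bp deletion (delta_bp = -5) is close to half a turn.
    print(f"5 bp deletion: {angular_offset(-5):.1f} degrees off the native face")
    # Spacer lengths of 22 bp and 23 bp expressed in helical turns.
    for length in (22, 23):
        print(f"{length} bp spacer = {length / BP_PER_TURN:.2f} turns")
```

Under this idealisation all three 5 bp deletions impose the same rotation of roughly 171 degrees, so the differences in reporter expression among Del-5’, Del-C and Del-3’ cannot be attributed to phasing alone.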

Fig. 2

Changes of the DNA sequence interposed between CRX and NRL lead to uncorrelated reporter gene expression. (A), Sequences and histogram of qPCR analysis of eGFP expression levels 15 days after AAV8-eGFP in vivo retinal delivery in adult mice (Methods), showing the impact of changes in the order, orientation, and spacing of the CRX and NRL BSs. (B), Sequences and qPCR analysis of nucleotide substitutions in the DNA-linker. (C), Immunofluorescence analysis of AAV8-hRHOp, AAV8-Prom-N, AAV8-Prom-Me, AAV8-Prom-H, AAV8-Prom-E and AAV8-Prom-3A. RPE, retinal pigment epithelium; OS, outer segment; ONL, outer nuclear layer; INL, inner nuclear layer; GCL, ganglion cell layer. (D), Quantitative effects (qPCR) of the DNA-linker sequence cloned with its 5’ and 3’ orientation inverted (sequence, AAV8-Prom-L). Statistical significance was computed using the One-Way Anova test (p-values ≤ 0.05) and post-hoc Dunnett’s multiple comparisons test (p-values ≤ 0.05). Only for Fig. 2D did we use Student’s t-test (p-values ≤ 0.0001).

Human and murine RRUs diverge but lead to correlated reporter gene expression

Next, we compared the DNA sequence features of the RRU across the mammalian phylogeny. The alignment of 100 mammalian genomes shows that, apart from rodents, the two CRX binding sites and the NRL binding site are conserved (Fig. 3A, B and Supplementary Table 2). Another noteworthy feature is the conserved spacing of 22 nucleotides of the DNA-linker. However, nucleotide changes are present in several mammals, although not evenly distributed (the central sequence is more conserved than the 5’ and 3’ regions), including in the internal bases of the flanking core CRX and NRL binding sites (Fig. 3A). Therefore, while the spacing is conserved, the nucleotide sequence varies across the mammalian phylogeny (Supplementary Fig. 3). Unlike other mammals, in rodents, and particularly in mouse and rat, the RRU has one CRX binding site and the DNA-linker is one nucleotide longer (23 bp), while the nucleotide sequence of the DNA-linker diverges extensively (Fig. 3A). Thus, we compared the human and murine homologous sequences. To study the murine RRU (mRRU), we generated AAV reporter constructs with the proximal murine rhodopsin promoter (mRhop, -164 bp from the TSS and -78 bp from the 5’UTR). The murine promoter resulted in slightly lower reporter gene expression than its human counterpart (Fig. 3C). However, reciprocal replacement of the human RRU with the murine RRU in the human RHO promoter (hRHOp) and of the murine RRU with the human RRU in the mouse Rho promoter restored the species-specific reporter expression levels. This suggests that, despite the divergence between their DNA-linkers, the human and murine RRUs are functionally equivalent and relatively independent of both the embedded motifs and the background sequences17. Furthermore, between humans and mice, nucleotide divergence in the DNA-linker corresponded to similar levels of reporter expression, supporting the idea that, unlike in synthetic constructs, a functional correlation may be obtained via an evolution-driven nucleotide arrangement. 
To further explore and isolate the properties of the human and murine DNA-linkers, we swapped the 5 bases in the middle of each. These sequences are highly divergent, GC-rich in human and AT-rich in mouse (Fig. 3D). In addition, these bases are centered in the DNase-unprotected sequence66,67. AAV constructs containing the human RHO promoter carrying the mouse 5 bp element (ATGAT, AAV8-hRHOp-mAT) and the mouse Rho promoter carrying the human 5 bp element (CCCCA, AAV8-mRhop-hCG) showed a precise inversion of the expression levels. Indeed, the human RHO promoter with the mouse sequence (hRHOp-mAT) matched the expression level of the mouse Rho promoter, whereas the murine Rho promoter with the human sequence (mRhop-hCG) matched that of the human RHO promoter. This suggests that an even shorter sequence within the DNA-linkers is responsible for the species-specific Rho expression levels.

Fig. 3

Human and murine DNA-linkers diverge but lead to species-specific RHO reporter expression. (A), Multiple sequence alignment for representative mammalian species from UCSC track. The CRX and NRL binding sites are highlighted in green and blue, respectively. The DNA-linker is highlighted in grey. Conserved bases are indicated in bold. (B), Distribution of PhyloP score of DNA bases in the NRL (green box) and CRX binding sites (blue box) and in the DNA-linker of the human RRU. PhyloP score indicates the degree of conservation of a DNA base in vertebrates. (C), Sequences and qPCR analysis of mutual changes of hRRU (AAV8-mRhop-hRRU) and mRRU (AAV8-hRHOp-mRRU) insertion in murine and human Rho/RHO promoters, respectively. (D), Sequence and qPCR analysis of the human RHO promoter with 5 bp of central murine DNA-linker (ATGAT, AAV8-hRHOp-mAT) and mouse Rho promoter with 5 bp of central human DNA-linker (CCCCA, AAV8-mRhop-hCG). Statistical significance was computed using the One-Way Anova test (p-values ≤ 0.05), and post-hoc Dunnett’s multiple comparisons test (p-values ≤ 0.05).

Orthogonal DNA-linker binding proteins perturb reporter gene expression

To determine whether these cis-regulatory sequence properties are retained in isolation, in an unrelated trans-regulatory system in which retina-specific genes, including CRX and NRL, are not expressed, we co-transfected the HEK293 cell line with the same plasmids used to generate the AAV vectors and with plasmids encoding CRX and NRL (Methods). hRHOp showed synergistic activation upon co-transfection with CRX and NRL, and several constructs showed gene expression variation similar to that observed in vivo (Supplementary Fig. 1B and Fig. 1C and Supplementary Table 1). Next, we interfered with the DNA-linker activity using the DNA-linker binding proteins that we had previously used to repress RHO expression in vivo62,63. Since the first of the six fingers of the ZF-DB protein intrudes into the NRL binding site (Fig. 1A), we removed it, obtaining ZF-DB-5 (Fig. 4A). This ZF-DB-5 mutant, which showed RHO repression properties similar to those of ZF-DB in the retina of RHO-P347S mice in vivo (Supplementary Fig. 2)62, repressed RHO reporter expression when transfected with hRHOp in vitro (Fig. 4). Next, to gain insight into the potential mechanisms underlying the action of the DNA-linker, we reasoned that by interfering with the DNA-linker properties using the orthogonal proteins ZF-DB-5 and KLF15 we could affect the presumptive activity passing through the DNA-linker. To this end, we tested whether ZF-DB-5 and KLF15 could alter the gene expression output of Prom-L, which carries the DNA-linker sequence cloned in the 3’-5’ direction and thus maintains the same binding sites for ZF-DB-5 and KLF15 (Fig. 4A). Surprisingly, Prom-L, which by itself strongly reduces RHO reporter activity, promoted a transition to transactivation when co-transfected with ZF-DB-5 or KLF15 (Fig. 4B, dark green; hRHOp in light green). Notably, ZF-DB-5, which lacks a transactivation domain, increased the expression of the reporter more than KLF15, which possesses a catalytic domain68. 
These results suggest that interfering with the DNA-linker sequence using orthogonal proteins results in down- or up-regulation of gene expression, depending on the orientation of the DNA-linker sequence between the CRX and NRL binding sites.

Fig. 4

Orthogonal DNA-linker binding proteins perturb reporter gene expression in vitro. (A), Drawing of the hRHOp and Prom-L promoters with the orthogonal proteins ZF-DB-5 and KLF15 bound to the DNA-linker in the opposite direction while maintaining the same double strand side. (B), Quantitative effects (qPCR) of transfecting ZF-DB-5 or KLF1562,63 (see panel A) in the HEK-293 cell line with hRHOp (light green, left-hand side) or Prom-L (DNA-linker sequence cloned in the 3’-5’ direction; dark green, right-hand side). Statistical significance was computed using the One-Way Anova test (p-values ≤ 0.05) and post-hoc Dunnett’s multiple comparisons test (p-values ≤ 0.05).

Discussion

This study shows that DNA itself contributes to the variation of RHO gene expression levels. We found that a 49-bp-long sequence of the human RHO proximal promoter, containing binding sites (BSs) for CRX and NRL, is a discrete functional cis-regulatory unit (RHO regulatory unit, RRU) that controls Rho expression levels in the adult rod photoreceptors of the retina. This result is in accordance with previous observations showing that short proximal sequences of the RHO promoter enable Rho expression in vivo in adult mouse rod photoreceptors60,61,79. In contrast, in the adult retina, Rho enhancers play a relatively minor role in the maintenance of Rho expression compared with the Rho proximal promoter, as recently determined genetically60. These results complement those obtained in vivo with two transcriptional repressors, ZF-DB and KLF15, both in the humanized P347S mouse (carrying 4 kb of the promoter of human RHO, including the DNA-linker element)62,63 and in the pig post-mitotic retina62,63, and along murine retinal development upon their germ-line or somatic expression82. Expression of these two trans-acting proteins in post-mitotic rod photoreceptors (ZF-DB lacking and KLF15 possessing an effector domain68), upon binding, independently, to partially overlapping sequences spanning 23 bp of the DNA-linker, results in robust RHO-specific repressive action (~ 50%)62,63. Together, these findings strongly support that: (i) the proximal promoter elements of Rho play a central role in Rho expression in the adult retina, and (ii) more specifically, these key elements are contained in the compact RRU (Fig. 1A).

This “strong self-functioning promoter”60 allowed us to intercept a potential additional property of DNA-protein interaction. Indeed, we found that within the RRU the base-specific composition and the length (22 bp) of the DNA-linker contribute to RHO expression levels. We first found that shortening (by 5 bp) three different consecutive regions of the DNA-linker, which leaves a different DNA sequence between the CRX and NRL binding sites in each case (Fig. 2A), caused differential eGFP expression in the retina in vivo (Supplementary Fig. 1A and Supplementary Table 1). Similarly, mutagenesis of the DNA-linker resulted in highly differentiated eGFP expression, consistent with that observed by others in the homologous sequence of the murine Rho promoter83. Notably, the sequences resulting from both the shortened and the mutagenized DNA-linkers did not generate novel binding sites for TFs expressed in the human retina. These results were also supported by immunofluorescence analysis, which showed that eGFP expression was invariably restricted to photoreceptors, as with the native sequence, albeit with different intensities, thus suggesting that the mutated linkers do not allow binding of additional TFs. Complementarily, when guided by evolution, we found that the relative optimal expression levels of the human and murine RHO proximal promoters were determined by the specific composition and length of the DNA sequence of their respective DNA-linkers. In our experimental setting, the human and murine RHO proximal promoters show slightly different reporter expression levels, in favour of the human promoter. Notably, swapping their respective RRUs resulted functionally in an inversion of promoter strengths. Furthermore, we showed that the exchange of the divergent 5 bp sequences in the middle of the human and mouse DNA-linkers enables an exact species-specific segregation of their respective RHO promoter strengths (Fig. 3D). 
Therefore, in contrast to shortened and mutagenized DNA-linkers, these naturally occurring divergent human and murine DNA-linkers promoted Rho gene expression in a correlated manner. These results suggest that a DNA sequence-dependent feature impacts Rho expression, and that a precise DNA base sequence content and length (22 bp in humans and 23 bp in mice) ensure adequate quantitative gene expression levels of human and mouse Rho. Considering that the size of the DNA sequence space grows exponentially with sequence length (4^l for DNA, where l is the sequence length), the number of possible combinations for 22 bp is exceptionally high. However, the phylogenetic analysis has shown that human and mouse DNA-linkers share both sparse bases and an almost completely conserved length (Fig. 3 and Supplementary Fig. 3). These aspects suggest that the DNA-linker code is evolutionarily constrained, and that evolution may have shaped the respective DNA-linkers after separation from the common ancestor to meet physiological ranges of Rho expression relevant for the two species. Interestingly, it has been suggested that Rho gene expression levels determine the size of the outer segment of photoreceptors53,54. What we observed suggests that the human DNA-linker provides higher transcriptional output than the murine DNA-linker. Considering that human rod photoreceptors are larger than murine ones84, it will be interesting to determine whether these species-specific size differences are due to the contribution of the two diversified DNA-linkers.
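To give a sense of scale for the sequence space mentioned above (4 raised to the power of the sequence length l), the short Python sketch below evaluates it for the human (22 bp) and murine (23 bp) DNA-linker lengths. The function name is ours, and the calculation assumes only the four standard bases.

```python
# Illustrative sketch: number of distinct DNA sequences of a given length,
# assuming an alphabet of the four standard bases (A, C, G, T).

def sequence_space(length: int, alphabet_size: int = 4) -> int:
    """Return alphabet_size ** length, the count of possible sequences."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet_size ** length


if __name__ == "__main__":
    for label, length in (("human DNA-linker", 22), ("murine DNA-linker", 23)):
        print(f"{label} ({length} bp): {sequence_space(length):,} possible sequences")
```

A 22 bp linker thus admits about 1.8 × 10^13 sequences, and each additional base multiplies the space by four, which is why only a handful of variants can be sampled with a one-variant-per-eye design.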

We additionally showed that DNA-linker-derived variation is interconnected with the binding sites of CRX and NRL. Single BS mutations of CRX and NRL abolish RHO promoter activity (Fig. 1B). In addition, changing their order and orientation led to a profound change in reporter expression activity (Fig. 2A). Although the BSs of CRX and NRL (as well as the DNA-linker itself) are not palindromic and can have four orientations relative to each other, they show complete conservation of both the sequence of the BSs and their orientation, supporting that these features are also evolutionally constrained. Similarly, although the linker DNA sequence conservation in mammals is poor, it is nevertheless evident that the DNA-linker orientation is also constrained. In fact, changing its orientation with respect to the CRX and NRL BSs, by reversing the 5’ − 3’ axis (Prom-L), greatly reduced the expression of the RHO reporter. However, this configuration modifies the 3’ and 5’ flanks of CRX and NRL, respectively, which, in turn may affect CRX and NRL binding. Indeed, the sequences flanking TFBSs are important for cooperative binding of TFs14,16,26. Nonetheless, (i), the binding preferences of TF pairs are typically limited to a few bases flanking the TFBSs; (ii), the 3’ and 5’ flanks of the DNA-linker are variable along the mammalian phylogeny (Fig. 3A), and (iii), we observed that changes along the entire 22 bp length of the DNA-linker induce a wide variation in reporter gene expression. Thus, these aspects seem to be too broad both qualitatively and quantitatively to justify that what is observed here might be determined by the binding preferences of TF pairs alone, even considering mechanisms concerning the mutual influence of TF pairs that are exerted at greater distance85. As opposed to short-mid range binding effects, the DNA-linker length appears too short to allow loop formation between CRX and NRL. 
Thus, we favour a model in which the DNA-linker is coupled in a continuum to its flanking CRX and NRL BSs.

For a mechanism of gene expression based on the contribution of the DNA sequence contained between two TFs to exist, it should be possible to admit (and observe) the existence of an additional component, independent of DNA-protein interaction by binding affinity, which is the accepted core mechanism from which gene expression originates and is implemented41. The ability of both ZF-DB and KLF15 to block Rho expression in vivo at 20-fold lower concentrations than CRX and NRL62,63, together with the lack of DNase protection of the DNA-linker sequence66,67, opens the possibility that the DNA-linker, not being bound, may act subsequent to TF binding and thus by a mechanism independent of DNA binding affinity. In this view, our model proposes that the initial, affinity-dependent binding of the TFs CRX and NRL is followed (post-binding) by a second event, a DNA sequence-dependent intramolecular mechanism that acts as an additional component to the activity of the two TFs. Furthermore, the same orthogonal trans-acting factors, ZF-DB-5 and KLF15, which block Rho expression (above, Fig. 1A)62,63, surprisingly resulted in reporter gene activation when bound to the DNA-linker sequence arranged in the opposite direction (orientation 3’-5’, Prom-L, Fig. 4), an arrangement that by itself severely impairs reporter expression. Strikingly, this activation is more pronounced when the DNA-linker is bound by ZF-DB-5, which lacks catalytic domains68. Thus, rather than pointing to a protein-protein steric interference or a direct protein-protein interaction with CRX and NRL, which appears unlikely because of the biochemically distinct properties of the two proteins and their binding arrangement (Fig. 4), these results may further support that ZF-DB-5 and KLF15 modify an activity passing through an exact DNA base composition and length coupled to the action of two TFs.

The AAV reporter assay, being episomal, decouples cis-regulatory DNA sequences from their native context, thus allowing their function to be interrogated in isolation76,77. However, this method cannot fully account for the effects of DNA-linker sequence variation at the locus, and it is subject to copy number variation of AAV episomes, their unequal distribution within photoreceptors, and potential interference from the eGFP reporter. It will therefore be necessary to design species-specific exchanges by knocking in the DNA-linker sequence at the promoter locus and to search for possible inter- and intra-specific variants to establish the net contribution of DNA-linker sequence composition and length to Rho gene expression in health and disease. To uncover the biophysical basis of the DNA-TF interaction beyond CRX-NRL DNA-binding interactions86,87, it will also be necessary to resolve the structure of the ternary CRX-NRL-DNA-linker complex, determine the dynamics of TF-binding events87,88, resolve at single-nucleotide resolution the fraction of the DNA-linker effectively bound or unbound by proteins89, and profile its chromatin90, so as to finally decouple the relative contribution of the canonical binding-dependent mechanisms (binding affinity) from the binding-independent mechanism proposed here to explain Rho gene expression.

In conclusion, beyond pattern and timing, we propose that DNA itself contributes to the levels of gene expression via a binding-independent mechanism that adds to the TF binding-dependent mechanisms (Supplementary Fig. 4). Once TFs are bound to TFBSs, the specific DNA base sequence interposed between them allows the activity of TF-TFs to be integrated; as a result, the DNA base sequence and length constitute a new code that operates within a TF-DNA-TF complex, which thus represents a unit of activity that determines the appropriate final levels of gene expression. In this view, the role of DNA is threefold: it contains information that encodes proteins, regulates their interactions (with a first feedback, binding to DNA), and participates in their own end action (with a second feedback activity) by becoming a phenotype.

The control of gene expression proposed here for RHO may have evolved, and may function more generally, as a still-hidden property that adds to the DNA-binding-dependent mechanisms that determine gene expression.

Materials and methods

Plasmid construction

The human rhodopsin proximal promoter (hRHOp; 164 bp from the transcription start site (TSS) and 95 bp of 5’UTR) and the mouse proximal promoter (mRHO; 164 bp from the TSS and 78 bp of 5’UTR) were generated by gene synthesis (Eurofins MWG®) and cloned into pAAV2.1-eGFP using NheI and NotI restriction enzymes. All variant promoters of hRHOp (Prom-3A, Prom-H, Prom-Me, Del-3’, Del-C, Prom-E, Prom-Q, Prom-N, Del-5’, Prom-S, hRHOp-mRRU, mRHOp-hRRU, PromG-1, PromG-2, Prom-L, MutCRX1, MutCRX2, MutCRX1.2, MutNRL) were likewise generated by gene synthesis (Eurofins MWG®) and cloned into pAAV2.1-eGFP using NheI and NotI restriction enzymes.

AAV vector preparation and purification

AAV vectors were produced by the TIGEM AAV Vector Core by triple transfection of HEK293 cells followed by two rounds of CsCl purification. For each viral preparation, physical titers [genome copies (GC)/mL] were determined by averaging the titers obtained by dot-blot analysis and by PCR quantification using TaqMan (Applied Biosystems, Carlsbad, CA, USA)91.

Mice

All studies on mice were performed in accordance with the institutional guidelines for animal research (ARRIVE guidelines) and approved by the Italian Ministry of Health; Department of Public Health, Animal Health, Nutrition and Food Safety in accordance with the law on animal experimentation (article 7; D.L. 116/92; protocol number: 114/2015-PR). C57BL/6J mice (Charles River Laboratories, Calco, Italy) and P347S+/+ animals80 were bred as follows: P347S+/+ mice were crossed with C57BL/6J mice (Charles River Laboratories) to obtain P347S+/– mice80,81,62,92. These mouse lines were confirmed to be free of the rd8 mutation92.

Vector administration

Mice were anesthetized by intraperitoneal injection of ketamine and medetomidine (100 mg/kg and 0.25 mg/kg, respectively), and AAV vectors were then delivered subretinally via a trans-scleral transchoroidal approach. Once the mice were anesthetized, the pupils were dilated with 0.5% tropicamide. A conjunctival incision was made and used to grasp and rotate the ocular globe. Subsequently, a sclerotomy was made by penetrating the sclera with a 27-gauge needle. A 33-gauge needle connected to a Hamilton syringe was inserted through the sclerotomy 2 mm posterior to the temporal limbus. The cannula was then advanced tangentially to the curvature of the globe to the subretinal space in the posterior pole. Each eye was co-injected with two AAV8 vectors: an AAV8 vector carrying eGFP driven by the human or murine proximal promoter variants (dose 1 × 10^9 GC/µl) and an AAV8 control vector encoding either the human RHO gene or mCherry, each driven by the GNAT1 proximal promoter (dose 1 × 10^8 GC/µl).

qReal time PCR

RNA from tissues was isolated using the RNeasy Mini Kit (Qiagen), according to the manufacturer’s protocol. cDNA was synthesized from 1 μg of isolated RNA using the QuantiTect Reverse Transcription Kit (Qiagen), according to the manufacturer’s instructions. PCRs were carried out in a total volume of 20 μl, using 10 μl of LightCycler 480 SYBR Green I Master Mix (Roche) and 400 nM primers, under the following conditions: pre-incubation at 50°C for 5 min, followed by 45 cycles of 95°C for 10 s, 60°C for 20 s, and 72°C for 20 s. Each sample was analyzed at least in triplicate. Transcript levels of eGFP, hRHO, or mCherry were normalized against murine Gapdh and Actβ (ΔCt), and eGFP expression is shown as fold change over the control (co-injected vector: AAV8-hGNAT1-hRHO or AAV8-hGNAT1-mCherry). We used the following primers: Act_forward (CAAGATCATTGCTCCTCCTGA) and Act_reverse (CATCGTACTCCTGCTTGCTGA); Gapdh_forward (GTCGGTGTGAACGGATTTG) and Gapdh_reverse (CAATGAAGGGGTCGTTGATG); eGFP_forward (ACGTAAACGGCCACAAGTTC) and eGFP_reverse (AAGTCGTGCTGCTTCATGTG); hRHO_forward (TCATGGTCCTAGGTGGCTTC) and hRHO_reverse (GGAAGTTGCTCATGGGCTTA); mCherry_forward (CACTACGACGCTGAGGTCAA) and mCherry_reverse (GTGGGAGGTGATGTCCAACT).
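The normalization described above can be illustrated with a minimal sketch. It assumes 100% amplification efficiency (a factor of 2 per cycle) and averages the Ct values of the reference genes arithmetically; neither assumption is stated in the text, and the function names are hypothetical.

```python
from statistics import mean


def relative_quantity(target_ct, reference_cts):
    """Return 2^-dCt, where dCt = target Ct minus the mean reference-gene Ct.

    Assumption: perfect doubling per cycle (100% PCR efficiency).
    """
    delta_ct = target_ct - mean(reference_cts)
    return 2.0 ** (-delta_ct)


def egfp_fold_over_control(egfp_ct, control_ct, reference_cts):
    """eGFP level expressed as fold change over the co-injected control transcript.

    Both transcripts are first normalized to the same reference genes
    (e.g. Gapdh and Actb measured in the same sample).
    """
    return relative_quantity(egfp_ct, reference_cts) / relative_quantity(
        control_ct, reference_cts
    )
```

For instance, with hypothetical Ct values of 20 for eGFP, 22 for the control transcript, and 18 and 19 for the two reference genes, `egfp_fold_over_control(20.0, 22.0, [18.0, 19.0])` gives a fold change of 4.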

Histological analysis

For morphological studies, after the mice were killed by cervical dislocation, the eyecups were harvested, fixed by immersion in 4% paraformaldehyde, and then embedded in OCT (KalteK). For each eye, 150 to 200 serial sections (10 μm thick) were cut along the horizontal plane; the sections were progressively distributed on 10 glass slides so that each slide contained 15 to 20 sections representative of the whole eye at different levels. Slides were coverslipped with Vectashield containing DAPI (4',6-diamidino-2-phenylindole; Vector Laboratories, Burlingame, CA, USA) to stain cell nuclei, and retinal histology was analyzed with a Leica Fluorescence Microscope System (Leica Microsystems GmbH, Wetzlar, Germany).

Transient transfection and analysis of promoter activity by qReal Time PCR

HEK293 cells were plated in six-well plates at a density of 1 × 10^6 cells/well in Dulbecco’s modified Eagle’s medium supplemented with 10% fetal bovine serum and 1% penicillin/streptomycin and grown for 24 h at 37°C. The cells were transfected at 80% confluency using the TransIT-X2 Dynamic Delivery System (MIRUS), according to the manufacturer’s instructions. We co-transfected the following constructs: 50 ng of reporter plasmid, 250 ng of a plasmid encoding human CRX, 125 ng of a plasmid expressing human NRL, and 125 ng of CMV-ZF5 or CMV-KLF15 plasmid. The total DNA amount was kept at 550 ng/well with pUC19. Cells were grown and harvested 48 h post-transfection in TRIzol™ Reagent (Life Technologies). cDNA was synthesized from 1 μg of isolated RNA using the QuantiTect Reverse Transcription Kit (Qiagen), according to the manufacturer’s instructions. PCRs were carried out in a total volume of 20 μl, using 10 μl of LightCycler 480 SYBR Green I Master Mix (Roche) and 400 nM primers, under the following conditions: pre-incubation at 50°C for 5 min, followed by 45 cycles of 95°C for 10 s, 60°C for 20 s, and 72°C for 20 s. Each sample was analysed in triplicate. Transcript levels were measured by real-time PCR using the LightCycler 480 (Roche) and the following primers: eGFP_forward (ACGTAAACGGCCACAAGTTC) and eGFP_reverse (AAGTCGTGCTGCTTCATGTG); GAPDH_forward (GAAGGTGAAGGTCGGAGT) and GAPDH_reverse (GAAGATGGTGATGGGATTTC). eGFP expression is shown normalized to GAPDH.
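As a small worked example of the DNA balancing step, the sketch below computes how much pUC19 would be needed to reach 550 ng/well when one or more of the listed plasmids is left out of a condition. The dictionary keys and the function name are hypothetical labels; the amounts are those given above.

```python
TOTAL_DNA_NG = 550  # total DNA per well, as stated in the text

# Plasmid amounts per well in ng, as stated in the text (labels are hypothetical).
PLASMID_NG = {
    "reporter": 50,
    "CRX": 250,
    "NRL": 125,
    "ZF5_or_KLF15": 125,
}


def puc19_filler_ng(included):
    """Return the ng of pUC19 needed to bring a well up to TOTAL_DNA_NG."""
    used = sum(PLASMID_NG[name] for name in included)
    if used > TOTAL_DNA_NG:
        raise ValueError("plasmids exceed the total DNA amount per well")
    return TOTAL_DNA_NG - used
```

With all four plasmids the filler is 0 ng; a condition with only the reporter, CRX, and NRL plasmids would need 125 ng of pUC19.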

Statistical analyses

Data are presented as mean ± standard error of the mean (SEM); error bars indicate SEM. Statistical significance was computed using one-way ANOVA (p-values ≤ 0.05) followed by Dunnett’s post-hoc multiple comparisons test (p-values ≤ 0.05). Only in Fig. 2D did we use Student’s t-test (p-values ≤ 0.0001).
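For reference, the summary statistic can be sketched as follows. This is a minimal illustration using the sample standard deviation (n − 1 denominator), which is an assumption, and a hypothetical function name; it does not reproduce the ANOVA or post-hoc tests.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Return (mean, SEM), with SEM = sample standard deviation / sqrt(n)."""
    if len(values) < 2:
        raise ValueError("at least two replicates are needed to estimate the SEM")
    return mean(values), stdev(values) / sqrt(len(values))
```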

Motif Scan

Human TF motifs were retrieved from HOCOMOCO (version 11)78 and scanned on target sequences using FIMO (meme-suite 5.5) (Grant CE, 2011), with a background model derived from the target sequence composition and the parameters --norc and --thresh 0.01. Matches with positive scores and p-value < 0.01 were considered. Alternative matches for a given TF on the same sequence were disambiguated by keeping the match with the highest score.
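The post-scan filtering can be sketched as below. The `Match` record and its field names are hypothetical and do not mirror FIMO's output columns; the sketch only illustrates the two rules stated above (positive score with p-value < 0.01, then the highest-scoring match per TF and sequence).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One motif match (hypothetical record layout, not FIMO's own format)."""
    tf: str
    sequence: str
    start: int
    score: float
    p_value: float


def best_matches(matches, p_threshold=0.01):
    """Keep, per (TF, sequence) pair, the highest-scoring match that passes the filters."""
    best = {}
    for m in matches:
        # Rule 1: positive score and p-value below the threshold.
        if m.score <= 0 or m.p_value >= p_threshold:
            continue
        # Rule 2: among alternative matches, retain the one with the highest score.
        key = (m.tf, m.sequence)
        if key not in best or m.score > best[key].score:
            best[key] = m
    return best
```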

TF expression in retina

Transcript and protein expression data were retrieved from the Human Protein Atlas version 23.0 (https://www.proteinatlas.org/). We considered as expressed those transcripts with normalized TPM > 1 in human retina (n = 13,114) and those proteins with positive detection in photoreceptor cells (n = 79).

Electrophysiological testing

The method was as previously described93. Briefly, mice were dark-adapted for 3 h and anesthetized. Flash electroretinograms (ERGs) were evoked by light flashes generated through a Ganzfeld stimulator (CSO, Costruzione Strumenti Oftalmici, Florence, Italy). Scotopic ERGs were evoked by 11 stimuli (from −4 to +1.3 log cd*s/m2) with an interval of 0.6 log units and recorded as previously described.