Main

The DNA guanine quadruplexes (G4) are stable secondary structures of DNA that form in guanine-rich regions of the genome1. Although the in vitro formation of G4 has been known for decades, only recently have the formation and biological functions of G4 in cells been demonstrated1,2,3. Computational analysis of potential G4-forming sequences uncovered more than 376,000 G4 sequence motifs in the human genome4. Additionally, ChIP–seq analysis with the use of G4 structure-specific antibodies revealed approximately 10,000 potential G4 sites in chromatin of cultured human cells5,6,7. Moreover, several DNA G4-recognition proteins have been identified, including DNMT1, hnRNP A1, nucleolin, PARP1, SLIRP, etc.3,8,9,10,11. The large number of G4 sites detected in chromatin and the existence of multiple cellular proteins for G4 structure recognition suggest the crucial functions of G4 structures in cells. Indeed, G4 structures have been shown to assume important roles in various DNA metabolic processes, including replication, transcription and repair1,2,3.

YY1, a zinc-finger-containing transcription factor12, is ubiquitously expressed and modulates many crucial biological and cellular processes, including DNA replication/transcription/repair, cell proliferation/differentiation and embryogenesis12,13,14. Recently, YY1 was found to promote the formation of enhancer–promoter loops, which involves interaction of YY1 with duplex DNA harboring a consensus YY1-binding motif15.

Here, by using an unbiased quantitative proteomic method, we identified YY1 as a putative G4-binding protein. We also demonstrated that YY1 is capable of interacting directly with G4 DNA in vitro and in cells, and this interaction contributes to YY1-mediated long-range DNA looping and gene expression regulation.

Results

Identification of YY1 as a candidate G4-binding protein

We recently reported a stable isotope labeling by amino acid in cell culture (SILAC)-based quantitative proteomic workflow for the discovery of G4-binding proteins, which involved the use of biotin-conjugated G4 DNA probes derived from the promoters of KIT and MYC genes and the human telomere (hTEL), along with the corresponding mutated sequences incapable of folding into G4 structures (M4)11. The proteomic data also revealed YY1 as a candidate G4-binding protein, with the SILAC protein ratios (G4/M4 probes) being 1.92 ± 0.11, 1.77 ± 0.11 and 2.02 ± 0.21 for the cKIT, cMYC and hTEL probes, respectively (Extended Data Fig. 1). Representative electrospray ionization–mass spectrometry (ESI–MS) and MS/MS for a tryptic peptide derived from YY1, FAQSTNLK, are shown in Supplementary Figs. 1 and 2.

YY1 binds directly with G4 structures in vitro

We next asked whether the interaction between YY1 and G4 structures is direct, by measuring the binding affinities between recombinant human YY1 protein and G4 or M4 probes using fluorescence anisotropy (Fig. 1). To this end, we first purified GST-tagged YY1 protein and, because GST is known to form a homodimer16, we removed the GST tag and further purified the tag-free YY1 protein (see Methods). The results showed that recombinant YY1 binds strongly with all three G4 structures, with the Kd values being 26.7 ± 2.2, 26.3 ± 2.1 and 82.0 ± 5.3 nM for G4 structures derived from cKIT, cMYC and hTEL, respectively (Fig. 1a–c). The binding affinities toward M4 probes are at least tenfold lower than those toward G4 DNA (Fig. 1a–c). These results demonstrate that YY1 binds directly and strongly with G4 structures in vitro.

Fig. 1: Tag-free YY1 binds selectively with G-quadruplex DNA.

a–c, Fluorescence anisotropy for monitoring the binding of YY1 protein with G4 structures and the corresponding mutated sequences derived from the promoters of KIT (a) and MYC (b) genes, or the human telomere (c). The data in a–c represent mean ± s.e.m. from three independent experiments. d–f, EMSA results showing the binding of YY1 toward G4 structure without treatment (d) or upon treatment with different concentrations of PDS (e) or TMPyP4 (f). The EMSA experiments were performed independently three times with similar results.

We also examined whether the binding between YY1 and G4 alters the folding of G4 structures. The circular dichroism (CD) spectra for the G4 DNA probes in YY1–DNA complexes are very similar to those of the free G4 DNA probes (Extended Data Fig. 2), underscoring that the interaction with YY1 does not alter the folding of G4 structures.

Pyridostatin (PDS) and 5,10,15,20-tetra-(N-methyl-4-pyridyl)porphyrin (TMPyP4) are small-molecule ligands that bind specifically to and stabilize G4 structures17,18. Our results from electrophoretic mobility-shift assay (EMSA) showed that both PDS and TMPyP4 could displace YY1 from G4 structures in vitro (Fig. 1d–f), further substantiating the capability of YY1 for binding directly with G4 structures.

Duplex DNA with a sequence motif of 5′-AANATGGC-3′ was previously identified as the consensus YY1-binding site19,20. For comparison, we also measured the Kd value for YY1 in binding with a consensus motif-carrying duplex DNA, and it turned out that the binding affinity (Kd = 11.6 ± 0.8 nM, Supplementary Fig. 3) was only slightly higher than those observed for its binding with G4 structures.

The role of YY1 zinc-finger domain in binding with G4

YY1 is highly conserved in vertebrates, where its homologs in human, chimpanzee, dog, mouse, rat and chicken share >80% amino acid sequence identity (Supplementary Fig. 4). YY1 contains a polyhistidine cluster, a REPO domain and four zinc-fingers (Extended Data Fig. 3a). The zinc-fingers, located in the C-terminal portion of YY1 and involved in its binding with duplex DNA, are identical among these species (Extended Data Fig. 3 and Supplementary Fig. 4), suggesting that the functions of the zinc-finger domain in YY1 are highly conserved.

To assess which segment of YY1 protein is involved in its binding with the G4 structure, we generated several truncated or mutated fragments of YY1 protein and measured their binding affinities toward G4 DNA (Extended Data Fig. 3 and Supplementary Fig. 5). Our results showed that YY1 fragments lacking the C-terminal zinc-finger, YY1(1–382), or with a cysteine in the third zinc-finger being mutated to a serine, YY1(C360S), exhibit markedly diminished binding affinities toward the G4 structure (Extended Data Fig. 3b,c). On the other hand, truncated YY1 proteins without the N-terminal 230 amino acids, YY1(231–414), or 292 amino acids, YY1(293–414), display similar or slightly stronger binding affinities toward the hTEL G4 structure relative to full-length YY1 (Extended Data Fig. 3d,e). Neither YY1(231–414) nor YY1(293–414), however, exhibits selective binding toward G4 over M4 (Extended Data Fig. 3d,e). Together, the above results support that the C-terminal zinc-finger domain of YY1 is required for robust binding to G4 DNA, but not sufficient for the protein to discriminate G4 structure from single-stranded M4.

The capability of YY1 in binding to both consensus motif-containing duplex DNA and G4 DNA, and the requirement of the zinc-finger domain for both binding modes, suggest that these two types of bindings may compete with each other. To test this, we assessed the binding of YY1 with the G4 structure or duplex DNA harboring the YY1 consensus motif in the presence or absence of different concentrations of unlabeled G4 DNA or the consensus motif-bearing duplex DNA (Supplementary Fig. 3). It turned out that both competitors could repress, in a concentration-dependent manner, binding of YY1 with G4 or YY1 consensus motif-containing duplex DNA (Supplementary Fig. 3), underscoring the competitive binding of YY1 toward G4 structures and the consensus motif-containing duplex DNA.

YY1 binds with G4 structures in cells

We next investigated the interactions between YY1 and G4 structures in cells by using ChIP–seq analysis. To this end, we employed CRISPR–Cas9 to incorporate a tandem affinity tag (3× Flag–2× Strep) to the C terminus of endogenous YY1 protein in HEK293T cells (YY1-TAPTAG, Supplementary Fig. 6). ChIP–seq analysis with the use of anti-Flag antibody and the YY1-TAPTAG cells led to the detection of 12,063 significant peaks (Fig. 2). An analysis of the overlap between YY1 ChIP–seq peaks and G4 ChIP–seq peaks (a BG4 ChIP–seq dataset from the NCBI GEO database) showed that approximately 39% (4,721 out of 12,063) of YY1 ChIP–seq peaks in our dataset overlap by at least one base pair (bp) with G4 ChIP–seq peaks, with a hypergeometric P value of 1.184 × 10⁻¹²⁶ (Fig. 2a,b). Substantial overlaps (3,965 and 3,648 out of 12,063 peaks, or 33% and 30%, respectively) were also observed after expanding the size windows of the overlap analysis to 8 and 30 bp (Supplementary Table 1), which correspond to the lengths of the YY1-binding motif and G4 DNA, respectively.
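The overlap criterion described above can be illustrated with a minimal sketch. This is not the pipeline used in the study; the function name `count_overlapping_peaks`, the `(chrom, start, end)` half-open peak representation and the toy coordinates are assumptions made for illustration, and reading the 8- and 30-bp windows as a minimum required overlap is likewise an assumption.

```python
def count_overlapping_peaks(query, reference, min_overlap=1):
    """Count query peaks sharing at least `min_overlap` bp with any reference peak.

    Peaks are (chrom, start, end) tuples with half-open coordinates [start, end).
    This representation is an assumption for illustration only.
    """
    # Group reference intervals by chromosome and sort them by start position.
    by_chrom = {}
    for chrom, start, end in reference:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()

    count = 0
    for chrom, start, end in query:
        for r_start, r_end in by_chrom.get(chrom, []):
            if r_start >= end:
                # Reference intervals are sorted, so none further on can overlap.
                break
            shared = min(end, r_end) - max(start, r_start)
            if shared >= min_overlap:
                count += 1
                break  # count each query peak at most once
    return count


# Hypothetical toy peaks (not data from the study).
yy1_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
bg4_peaks = [("chr1", 199, 300), ("chr1", 600, 700), ("chr2", 10, 40)]

n_1bp = count_overlapping_peaks(yy1_peaks, bg4_peaks, min_overlap=1)
n_8bp = count_overlapping_peaks(yy1_peaks, bg4_peaks, min_overlap=8)
print(n_1bp, n_8bp)
```

A stricter `min_overlap` can only keep or reduce the count, which is consistent with the decrease from 4,721 to 3,965 and 3,648 overlapping peaks reported for the larger windows.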

Fig. 2: YY1 interacts with G4 structure in cells.

a, Comparisons of YY1 ChIP–seq data with BG4 ChIP–seq data. b, A Venn diagram showing the overlap between YY1 ChIP–seq and BG4 ChIP–seq peaks. c, Comparison of YY1 ChIP–seq data acquired for cells with or without (Ctrl) PDS/TMPyP4 treatment. The P values were calculated by using the hypergeometric test. d,e, Enrichment ratios of YY1 ChIP–seq peaks containing different motifs after treatment with PDS (d) or TMPyP4 (e). BG4, peaks overlapped with BG4 ChIP–seq peaks; Y, peaks containing YY1 consensus motif; BG4Y, peaks containing both YY1 consensus motif and overlapped with BG4 ChIP–seq. The data in d and e represent mean ± s.e.m. In d, n = 370, 1,194 and 638 for BG4, Y and BG4Y, respectively; in e, n = 378, 1,198 and 639 for BG4, Y and BG4Y, respectively. In d and e the two-tailed Student’s t-test was used to calculate the P values. ***P < 0.001; P = 3.1 × 10⁻¹⁵ for Y and P = 1.6 × 10⁻²⁰ for BG4Y in d; P = 2.0 × 10⁻⁶ for Y and P = 5.0 × 10⁻¹¹ for BG4Y in e. f,g, Chromatin localization of YY1 after PDS (f) or TMPyP4 (g) treatment, where the experiments were performed independently three times with similar results.

It is worth noting that 3,086 out of the 8,955 BG4 ChIP–seq peaks (~34%) correspond to potential quadruplex-forming sequences (GGGN₁₋₇GGGN₁₋₇GGGN₁₋₇GGG, with N being any of the four canonical nucleotides)4. This parallels previous G4-sequencing results, which showed that ~70% of observed G4 sequences could not be assigned to the computationally predicted quadruplex-forming sequences21. This was attributed to the formation of G4 structures with long loop regions (>7 bases) and/or the involvement of noncontiguous guanine(s) in G4 formation21. In this vein, it is of note that the BG4 antibody employed for the ChIP–seq analysis recognizes G4 structures in general, whereas it cannot differentiate parallel, antiparallel and mixed parallel/antiparallel configurations of G4 structures5.
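The quadruplex-forming sequence motif quoted above (four GGG tracts separated by loops of one to seven nucleotides) can be written as a regular expression. The following is a minimal sketch rather than the prediction tool used in the cited work; the names `PQS_PATTERN` and `find_pqs` are hypothetical, only the given strand is scanned, and matches are reported without overlaps.

```python
import re

# Four runs of GGG separated by three loops of 1-7 canonical nucleotides.
PQS_PATTERN = re.compile(r"GGG[ACGT]{1,7}GGG[ACGT]{1,7}GGG[ACGT]{1,7}GGG")


def find_pqs(sequence):
    """Return (start, end, match) for each non-overlapping motif hit.

    Only the supplied strand is searched; a C-rich complement would need
    to be reverse-complemented first (not done in this sketch).
    """
    sequence = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in PQS_PATTERN.finditer(sequence)]


# Illustrative inputs: a telomere-like repeat and a G-to-A disrupted variant
# (chosen for illustration; not the probe sequences used in the study).
telomere_like = "GGGTTAGGGTTAGGGTTAGGG"
disrupted = "GAGTTAGAGTTAGAGTTAGAG"

print(find_pqs(telomere_like))
print(find_pqs(disrupted))
```

Because the loop length is capped at seven nucleotides, G4 structures with longer loops or noncontiguous guanines fall outside this pattern, which is the limitation noted in the paragraph above.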

We also statistically analyzed the peak width and distribution of signal enrichment for the overlapping peaks identified from the YY1 ChIP–seq and BG4 ChIP–seq results (Extended Data Fig. 4). We found that the mean widths for YY1 and BG4 ChIP–seq peaks are very similar, and the enrichment signal summits for YY1 and BG4 ChIP–seq peaks overlap with each other (Extended Data Fig. 4). Among the 4,721 overlapping peaks in BG4 ChIP–seq and our YY1 ChIP–seq datasets, 2,603 are located in gene promoters and 925 harbor the YY1 consensus motif (Supplementary Table 1). High levels of overlap were also found for the BG4 ChIP–seq dataset and two other YY1 ChIP–seq datasets from the NCBI GEO database (Supplementary Fig. 6c,d and Supplementary Table 1). Among the 4,603 and 3,186 overlapped peaks found between BG4 ChIP–seq and the two YY1 ChIP–seq datasets (GSM1010753 and GSM110897), 2,504 and 1,684 are located in gene promoters, and 1,195 and 1,120 contain the YY1 consensus motif, respectively (Supplementary Table 1). These results revealed that YY1 binds with G4 structures in cells, and more than half of the overlapped peaks are located in gene promoters.

We also assessed how treatment with PDS and TMPyP4, which could bind to G4 structures in cells22,23, impacts the interaction between YY1 and G4 structures in cells. The results from ChIP–seq experiments with the YY1-TAPTAG cells showed that treatment with PDS or TMPyP4 led to significantly diminished interaction of YY1 with its binding regions (Fig. 2c and Supplementary Fig. 7), with the most pronounced diminutions being observed for those YY1 ChIP–seq peaks that overlap with BG4 ChIP–seq peaks (Fig. 2d,e). Moreover, treatment of cells with PDS or TMPyP4 resulted in a progressive decrease in the chromatin localization of YY1 (Fig. 2f,g), where we co-treated cells with MG132, a proteasome inhibitor, to minimize protein degradation during the treatment. Together, these results lent further evidence that YY1 binds with G4 structures in cells, and this binding could be perturbed by G4-stabilizing ligands.

YY1–G4 binding modulates YY1-mediated DNA looping

YY1 was recently shown to be an important regulator for long-range DNA looping15,24,25. We next assessed whether YY1–G4 binding functions in this process by employing HiChIP–seq, a method developed for monitoring protein-directed chromatin conformation26. The results from HiChIP–seq experiments with the use of YY1-TAPTAG cells showed that treatment with PDS or TMPyP4 led to substantial reductions in YY1-mediated DNA looping (Fig. 3a and Supplementary Fig. 8). Statistical analysis revealed that the percentage of DNA looping modulated by YY1 in a distance range of 5–200 kilobases (kb) is markedly attenuated upon treatment with PDS or TMPyP4 (35.6%, 9.4% and 9.9% for control, PDS- and TMPyP4-treated cells, respectively; Fig. 3b). Likewise, treatment with PDS or TMPyP4 resulted in weaker interactions between the genomic regions occupied by YY1 (ChIP–seq peaks) and their surrounding regions (Fig. 3c and Supplementary Fig. 9). These results demonstrated that the role of YY1 in DNA looping involves its interaction with G4 structures.

Fig. 3: Disruption of the binding of YY1 with G4 structure attenuates YY1-mediated DNA looping.

a, HiChIP interaction map of chromosome 17 in 293T cells that were untreated (left) or treated with PDS (middle) or TMPyP4 (right). b, Statistical analysis of HiChIP interaction distance of 5–200 kb after PDS or TMPyP4 treatment. c, A heat map depicting the interactions between YY1 ChIP–seq peaks and the surrounding region (−200 kb to 200 kb) after PDS or TMPyP4 treatment.

To further assess the role of G4 structure in YY1-mediated DNA looping, we asked how unwinding of G4 structure by ectopic overexpression of a G4 resolvase (that is, BLM helicase27) modulates the YY1-mediated DNA–DNA interaction (Extended Data Fig. 5). In this vein, we confirmed that overexpression of BLM led to substantially diminished signal for the G4 structure and YY1 binding at the selected loci (Extended Data Fig. 5a–d). In addition, HiChIP followed by a real-time quantitative PCR (HiChIP–qPCR) experiment showed that the BLM helicase-mediated unwinding of G4 structure abrogated the YY1-mediated DNA looping at these sites (Extended Data Fig. 5d,e).

We also mutated the G4-forming sequence at a specific genomic locus by using CRISPR–Cas9-based genome editing to disrupt its capability to fold into G4 structures (Fig. 4 and Supplementary Fig. 10). We found that the mutation indeed attenuated the formation of G4 structure markedly and pronouncedly diminished the binding of YY1 to the site, as revealed by BG4 ChIP–qPCR and YY1 ChIP–qPCR analyses, respectively (Fig. 4b,c). Moreover, HiChIP–qPCR results revealed that the mutation markedly reduced the YY1-mediated DNA looping involving the site (Fig. 4d). This result substantiated the involvement of YY1–G4 structure interaction in YY1-mediated DNA looping.

Fig. 4: YY1–G4 interaction and dimerization of YY1 promote long-range DNA looping.

a, A schematic diagram showing the design of CRISPR–Cas9-based editing of an endogenous G4-forming sequence. PAM, protospacer adjacent motif. HA, homology arm. b,c, BG4 ChIP (b) and YY1 ChIP (c) enrichments of the site were substantially attenuated after mutation of the G4 sequence. d, YY1-mediated DNA looping was disrupted by mutation of the G4 sequence. Shown are the HiChIP–qPCR quantification results of YY1-mediated DNA looping in HEK293T cells with or without mutation of G4 sequence. e, A schematic diagram depicting the truncated YY1 protein that is defective in dimerization. aa, amino acid. f, ChIP enrichments for YY1, YY1∆231–290 and GST-YY1∆231–290 at sites 1 and 2. Sites 1 and 2 are the same sites as described in Extended Data Fig. 5. g, HiChIP–qPCR quantification results of YY1, YY1∆231–290- and GST-YY1∆231–290-mediated DNA looping. The data represent mean ± s.e.m. of results obtained from three independent experiments. The P values were calculated by using two-tailed Student’s t-test: **P < 0.01; ***P < 0.001; P = 0.0012 in b; P = 0.0015 in c; P = 0.0011 in d; P = 0.000031 and 0.0013 for YY1 and GST-YY1∆231–290, respectively, for site 1, and P = 0.0000034 and P = 0.0047 for YY1 and GST-YY1∆231–290, respectively, for site 2 in g.

Dimerization of YY1 promotes DNA–DNA interactions

YY1 protein is known to undergo dimerization, which promotes DNA–DNA interactions through binding of the protein with its consensus sequence motif in duplex DNA15,28. We also investigated whether YY1 dimerization enhances YY1-mediated DNA–DNA interactions in vitro. First, we confirmed, by using gel-filtration chromatography, that tag-free YY1 exists primarily as a dimer in solution (Extended Data Fig. 6). Second, our quantitative EMSA experiment revealed a 1:1 binding stoichiometry between YY1 and G4 DNA, indicating that each YY1 dimer can bind to two DNA G4 structures (Extended Data Fig. 6c,d).

Next, by employing a proximity ligation assay, we examined the role of YY1 in facilitating the interactions between different DNA elements, including duplex DNA harboring a YY1 consensus motif (motif), G4 DNA (G4) and its mutated counterpart that is unable to fold into G4 structure (M4) (Extended Data Fig. 7a). The results showed that the presence of YY1 led to augmented ligation efficiencies between G4 and G4, and between G4 and motif (Extended Data Fig. 7b). Moreover, addition of a YY1 motif-containing duplex competitor to the ligation mixture led to pronounced diminutions in the efficiency of ligation of DNA elements containing G4 and/or motif (Extended Data Fig. 7b). Parallel control experiments showed that YY1 promotes the ligation between motif and motif, but not between M4 and M4, motif and M4, or G4 and M4 (Extended Data Fig. 7c). Additionally, the inclusion of a G4-stabilizing agent, PDS or TMPyP4, in the ligation mixture led to markedly diminished ligation between G4 and G4 (Extended Data Fig. 7d). These results support that the dimerization of YY1 and its binding with G4 DNA enable long-range interactions between G4 DNA and another G4 DNA or a consensus motif-harboring duplex DNA.

To further examine the role of dimerization of YY1 in DNA looping, we generated a separation-of-function mutant of YY1, which maintains its capability for recognition of G4 structures, but loses its ability to dimerize. Considering that YY1(231–414) and YY1(293–414) are present as a dimer and a monomer, respectively (Extended Data Fig. 6a,b), we generated a truncated variant of YY1 with the linker region between the REPO and zinc-finger domains being deleted (YY1∆231–290) (Fig. 4e). We found that this truncated form of YY1 exists as a monomer in solution, while maintaining an approximately tenfold selectivity in binding toward G4 over M4 DNA (Extended Data Fig. 8). Owing to the inherent dimerization ability of GST, GST-YY1∆231–290 is present as a dimer in solution (Extended Data Fig. 8). Our proximity ligation assay results showed that the GST-tagged YY1∆231–290, but not its tag-free counterpart, enhanced the ligation between two G4 DNA probes (Extended Data Fig. 8e), supporting that dimerization of YY1 is crucial for DNA–DNA interaction in vitro.

We next asked how loss of dimerization modulates the ability of YY1 to promote DNA looping in cells. Our ChIP–qPCR results revealed that both YY1∆231–290 and GST-YY1∆231–290 are capable of binding to G4 structures in cells (Fig. 4f); YY1∆231–290, however, fails to promote DNA looping in cells, and fusion with the GST tag rescues the ability of the truncated protein to enhance DNA looping (Fig. 4g). These results, therefore, demonstrate that dimerization of YY1 is indispensable for its function in promoting DNA looping in cells.

YY1–G4 binding regulates gene expression

YY1 was first identified as a transcription factor for its function in activation or suppression of gene expression15,29, and the above results showed that PDS and TMPyP4 are capable of disrupting the interactions between YY1 and DNA G4 structures (Figs. 1d–f and 2c–g). Therefore, we next asked whether genetic depletion of YY1, or treatment with PDS or TMPyP4, influences those genes that are modulated by G4 structure. For this purpose, we performed RNA-seq analysis to assess, at the transcriptome-wide scale, the alterations in gene expression elicited by genetic depletion of YY1 and/or treatment with PDS (Fig. 5, Extended Data Fig. 9 and Supplementary Figs. 11 and 12). Statistical analysis showed a strong correlation between PDS-regulated and YY1-regulated genes (Pearson r > 0.67), underscoring the role of YY1–G4 structure interaction in YY1-mediated gene regulation (Fig. 5, Extended Data Fig. 9 and Supplementary Fig. 11). Moreover, the RNA-seq results revealed that YY1 can positively or negatively regulate the expression of genes through its interaction with G4 structures.
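For readers who wish to reproduce this kind of comparison, the Pearson correlation between two sets of per-gene expression changes can be computed as sketched below. The function name `pearson_r` and the toy log2 fold-change values are hypothetical and are not taken from the RNA-seq data; the sketch only illustrates the statistic, not the analysis pipeline used here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log2 fold changes for five genes under two conditions
# (for example, ligand treatment versus knockdown); not real data.
log2fc_condition_a = [-1.2, -0.8, 0.1, 0.9, 1.4]
log2fc_condition_b = [-1.0, -0.5, 0.3, 0.6, 1.1]

print(round(pearson_r(log2fc_condition_a, log2fc_condition_b), 3))
```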

Fig. 5: Regulation of gene expression by YY1 promoter G4 interactions.

a–d, Dot plot showing the correlation of gene expression between PDS-treated and shYY1-1-treated (a), shYY1-2-treated (b), shYY1-1 + PDS-treated (c), or shYY1-2 + PDS-treated (d) cells. Shown are average ratios of RNA-seq results from two biological replicates; n = 14,707 genes for Pearson correlation calculation in a–d. e–g, Quantification results of mRNA expression levels of MANF (e), PDHB (f) and PGBD5 (g) genes in HEK293T cells with shRNA-mediated knockdown of YY1 and/or after PDS/TMPyP4 treatment. Top of each panel shows read enrichments from BG4 ChIP–seq and YY1 ChIP–seq. The data represent mean ± s.e.m. of results from three independent experiments.

We also validated the RNA-seq results by assessing the messenger RNA expression levels of six representative genes (that is, MANF, PDHB, MYC, PGBD5, SLC25A28 and TMEM145) by using reverse transcription-quantitative PCR (RT–qPCR). These six genes were chosen on the basis of the presence of G4 structures and the occupancy of YY1 in their promoter regions (Fig. 5 and Extended Data Fig. 9). The results showed that the expression levels of MANF, PDHB and MYC genes were markedly attenuated, whereas those of PGBD5, SLC25A28 and TMEM145 genes were substantially elevated in HEK293T cells upon treatment with PDS or TMPyP4, or upon short hairpin RNA (shRNA)-mediated knockdown of YY1 (Fig. 5, Extended Data Fig. 9 and Supplementary Fig. 12).

We also assessed whether the long-range DNA looping elicited by YY1–G4 binding plays a role in gene regulation. To this end, we monitored the mRNA expression levels of two genes, TRMT12 and EHD3, which do not carry G4 structures in their promoters but are associated with G4 structures at a remote site via YY1-mediated DNA looping (Fig. 6a,b and Extended Data Fig. 10a,b). Our results showed that the mRNA expression levels of these two genes in HEK293T cells were markedly diminished upon shRNA-mediated knockdown of YY1, or upon treatment with PDS or TMPyP4 (Fig. 6c and Extended Data Fig. 10c).

Fig. 6: YY1–G4 binding participates in transcription regulation of the TRMT12 gene through long-range DNA looping.

a, Read enrichments obtained from BG4 ChIP–seq and YY1 ChIP–seq results. b, Normalized interaction frequency involving the sites linked with a red arch in a in HEK293T cells with or without PDS or TMPyP4 treatment, as captured by YY1 HiChIP. c, Quantification of mRNA expression levels of TRMT12 in HEK293T cells after knockdown of YY1 and/or with PDS/TMPyP4 treatment. The data represent mean ± s.e.m. of results from three independent experiments.

Next, we examined whether this observation can be extended to other genes at the genome-wide scale by integrating the results obtained from YY1 ChIP–seq, BG4 ChIP–seq, HiChIP–seq and RNA-seq data. To this end, we focused on those genes whose promoters are occupied by YY1 (based on YY1 ChIP–seq data), that do not fold into G4 structures (based on BG4 ChIP–seq data) and that interact with remote sites through YY1-mediated DNA looping (based on YY1 HiChIP–seq data). We divided these genes into two groups based on whether the remote site adopts G4 structure (based on BG4 ChIP–seq) (Supplementary Fig. 13). We found that PDS treatment differentially modulated the expression of these two groups of genes, with more pronounced changes being observed for those genes interacting with a distal site carrying G4 structure than for those without (Supplementary Fig. 13). Moreover, shRNA-mediated knockdown of YY1 abrogated the differences elicited by PDS treatment, and this is true for both the up- and downregulated genes (Supplementary Fig. 13). These results, therefore, support that the role of YY1 in transcriptional regulation involves, in part, the long-range DNA looping forged by its interaction with G4 structures.

Discussion

YY1 is an extensively studied transcription factor and it functions in many important biological processes12,13,15, where duplex DNA sequences with a consensus motif were previously identified as YY1-binding sites15,20. Here, we demonstrated that YY1, apart from binding to its consensus motif sequences in duplex DNA, is capable of interacting directly with G4 structures in vitro and in cells (Figs. 1 and 2). Further analysis showed that the C-terminal zinc-finger domain of YY1 is indispensable for its interaction with G4 structure. Our results from the YY1Δ231–290 truncation mutant showed that recognition of G4 structures by YY1 also entails the N-terminal portion of the protein, although the linker region between the REPO and zinc-finger domains of the protein is dispensable for this recognition (Extended Data Figs. 3 and 8). It will be important to solve, in the future, the structure of YY1 in complex with G4 DNA so as to gain insights into the similarities and differences in YY1’s recognition of G4 structure versus duplex DNA.

Spatial organization is a fundamental element in three-dimensional (3D) genome architecture and transcription regulation; however, not much is known about how 3D genome architecture is regulated and which proteins are involved15,30,31,32. In this vein, CTCF and cohesin were shown to contribute to 3D genome organization30,32. Recently, YY1 was found to function in this process by enabling enhancer–promoter interactions15. Our results unveiled a molecular determinant for the function of YY1 in this process, that is, through its binding with G4 structures, where displacement of YY1 from G4 structures with small-molecule G4 ligands, unwinding of G4 structures by overexpression of BLM helicase and mutation of a G4-forming sequence using CRISPR–Cas9-mediated genome editing all disrupt YY1-mediated DNA looping (Fig. 3, Extended Data Figs. 5 and 7 and Supplementary Fig. 10).

On the basis of our results, we propose a mechanistic model, where the dimerization of YY1 allows for its concurrent binding of two consensus sequence motifs, two G4 structures or one consensus sequence motif and one G4 structure, thereby enabling DNA looping. Our model is supported by results from proximity ligation assays showing that YY1 can promote the ligation between these DNA elements (Extended Data Fig. 7), from analytical gel-filtration assays revealing that YY1 forms a dimer in vitro (Extended Data Fig. 6) and from quantitative EMSA assays demonstrating that YY1 binds to G4 DNA in a 1:1 stoichiometry (Extended Data Fig. 6). Moreover, our identification of a separation-of-function mutant of YY1, which is defective in dimerization, but proficient in discriminating G4 structure from single-stranded DNA, and demonstration that the mutation diminishes markedly the YY1-mediated DNA looping in cells lends further support to our mechanistic proposal (Fig. 4).

Disruption of binding between YY1 and G4 structure with the use of G4-stabilizing ligands, PDS or TMPyP4, and genetic depletion of YY1 led to altered expression of not only those genes harboring G4 structures in their promoters (Fig. 5 and Extended Data Fig. 9), but also those genes that do not carry G4 structures in their promoters but that are linked with G4 structure at a remote site through YY1-mediated DNA looping (Fig. 6 and Extended Data Fig. 10). These results support that YY1–G4 interaction not only regulates the expression of proximal genes, but also that of distal genes through long-range DNA looping. This represents a very important mechanism for YY1-regulated gene expression.

Numerous studies have documented the association between YY1 and human cancer, and YY1 is also considered as a potential prognostic biomarker and therapeutic target33,34. In this vein, YY1 is overexpressed in multiple types of cancer, and its overexpression is also correlated with poor therapeutic outcomes33,34. In addition, YY1 is known to regulate a number of cancer-related genes33,34. Furthermore, G4 structures are over-represented in many cancer-related genes, and multiple studies revealed markedly increased presence of G4 structures in cancer cells35, indicating the involvement of G4 in cancer development. In this study, we reveal that the functions of YY1 in modulating gene expression and DNA looping depend, at least in part, on its binding with G4 structure. Hence, perhaps the interaction between YY1 and G4 can be exploited for therapeutic interventions for human cancer in the future.

Together, our work unveils a previously unrecognized mode of DNA recognition for a well-studied transcription factor, YY1 (that is, in its binding with G4 DNA), and reveals the role of this type of molecular recognition in modulating long-range DNA looping and gene expression. Our study also uncovers that the G4 structure-mediated gene regulation can occur not only through G4 structures in promoter regions of genes, but also via distal G4 structures that are brought into close proximity through long-range DNA looping. Therefore, the present work expands molecular determinants for DNA looping, and offers insight into the functions of YY1 and G4 structures in gene regulation.

Methods

Cell lines

HEK293T (293T) cells were purchased from ATCC. Cells were maintained in DMEM (Life Technologies) supplemented with 10% FBS (Invitrogen) and 100 units ml⁻¹ penicillin/streptomycin at 37 °C in a humidified incubator with 5% CO2.

Generation of recombinant YY1 proteins

The coding sequence (CDS) of human YY1 gene was amplified by PCR and inserted into the pGEX plasmid linearized by BamHI and XhoI restriction digestion enzymes. For truncated proteins, the corresponding CDS was amplified by PCR and inserted into the pGEX plasmid using the same method. The plasmids were transformed into Rosetta (DE3) pLysS Escherichia coli cells and the cells cultured in LB medium at 37 °C. After the optical density (OD600) reached approximately 0.6, the culture was cooled to 20 °C and the protein expression was induced with 1 mM IPTG (Sigma) at 20 °C for approximately 12 h. The cells were then collected and lysed in a buffer containing 20 mM Tris (pH 7.5), 500 mM NaCl, 10% glycerol and 2 mM DTT. After centrifugation, the supernatant was collected and GST-tagged proteins were purified with Glutathione Superflow Agarose (Pierce) following the manufacturer’s recommended procedures. Elution was performed using a buffer containing 20 mM Tris (pH 7.5), 250 mM NaCl, 10% glycerol, 2 mM DTT and 20 mM reduced glutathione. The GST-tagged protein was digested with 3C protease overnight to remove the GST tag. The tag-free protein was further loaded onto the heparin column (GE Healthcare) and eluted with a NaCl gradient (0.25–1.0 M) in a buffer containing 20 mM Tris (pH 7.5), 10% glycerol and 2 mM DTT. The fractions containing the target protein were pooled, concentrated and further purified by size-exclusion chromatography on a HiLoad 16/600 Superdex 200 pg column (GE Healthcare). The proteins were quantified using Bradford Protein Assay Kit (Bio-Rad), and their purities verified by SDS–PAGE analysis and A260/280 measurement (Supplementary Fig. 5 and Extended Data Fig. 8). The analytical gel-filtration chromatography was performed using a Superdex 200 10/300 column (GE Healthcare) in a buffer containing 20 mM Tris–HCl (pH 7.5), 250 mM KCl, 10% glycerol and 5 mM DTT.

Fluorescence anisotropy

Fluorescence anisotropy measurements were conducted as previously described11. Briefly, 5-carboxytetramethylrhodamine (TAMRA)-labeled DNA probes were annealed in a buffer containing 10 mM Tris–HCl (pH 7.5), 100 mM KCl and 0.1 mM EDTA to form G4 structures. The binding assays were performed with 10 nM labeled DNA and the indicated concentrations of recombinant YY1 protein in a binding buffer containing 10 mM Tris–HCl (pH 7.5), 100 mM KCl, 10 μM ZnCl2 and 1 mM DTT. After a 30-min incubation on ice, fluorescence anisotropy was measured on a Horiba QuantaMaster-400 spectrofluorometer (Photon Technology International), with the excitation and emission wavelengths being 550 and 580 nm, respectively. The instrument G factor was determined before anisotropy measurements. The data were fitted according to the following equation:

$$A_{\mathrm{obs}} = A_0 + \Delta A \times \frac{[\mathrm{DNA}] + [\mathrm{Protein}] + K_{\mathrm{d}} - \sqrt{([\mathrm{DNA}] + [\mathrm{Protein}] + K_{\mathrm{d}})^2 - 4 \times [\mathrm{DNA}][\mathrm{Protein}]}}{2 \times [\mathrm{DNA}]}$$

In the equation, [DNA] and [Protein] denote the DNA and protein concentrations, respectively. Aobs is the observed anisotropy value. A0 is the anisotropy value in the absence of protein. ΔA represents the total change in anisotropy between free and fully bound DNA, and Kd is the equilibrium dissociation constant36.
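As an illustration of how the binding equation above behaves, the following minimal Python sketch evaluates it directly. The function names and the numerical values (A0, ΔA, Kd) are hypothetical and chosen only for demonstration; they are not data or fitting code from this study.

```python
import math

def bound_fraction(dna, protein, kd):
    """Fraction of DNA bound under 1:1 binding (all concentrations in the same unit)."""
    total = dna + protein + kd
    return (total - math.sqrt(total ** 2 - 4.0 * dna * protein)) / (2.0 * dna)

def observed_anisotropy(a0, delta_a, dna, protein, kd):
    """A_obs = A0 + dA * (fraction of DNA bound), as in the equation above."""
    return a0 + delta_a * bound_fraction(dna, protein, kd)

# Hypothetical values for illustration: 10 nM DNA, Kd = 50 nM
for protein_nm in (0.0, 50.0, 500.0):
    a_obs = observed_anisotropy(0.10, 0.08, 10.0, protein_nm, 50.0)
    print(protein_nm, round(a_obs, 4))
```

In practice, a function of this form would be passed to a nonlinear least-squares routine so that A0, ΔA and Kd are estimated from the titration data.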

EMSA

For protein–DNA binding, 25 nM TAMRA-labeled DNA was incubated with different concentrations of full-length YY1 protein, or its truncated or mutated counterparts, in a binding buffer containing 10 mM Tris–HCl (pH 7.5), 100 mM KCl, 10 μM ZnCl2, 1 mM DTT and 3% glycerol on ice for 30 min. The samples were then loaded onto a 6% polyacrylamide gel in TBE buffer (40 mM Tris–HCl, pH 8.3, 45 mM boric acid and 1 mM EDTA) and run at 120 V at 4 °C for 30 min. The gels were imaged with Odyssey Imaging Systems (LI-COR Biosciences).

CD spectroscopy

The DNA probes (2.5 µM) were annealed in a buffer containing 10 mM Tris–HCl (pH 7.5), 100 mM KCl and 0.1 mM EDTA to form G4 structure. The CD spectra for the DNA probes were recorded at room temperature on a Jasco-815 spectrometer in the wavelength range of 200–320 nm, and the scan rate was 1 nm s−1. For assessing how YY1 binding modulates G4 folding, the annealed DNA probes were incubated with an equal concentration of the purified YY1 protein at room temperature for 30 min, and the CD spectra were subsequently acquired as noted above. The CD spectrum for the YY1 protein itself was then subtracted from the composite CD spectra of the YY1–G4 DNA complexes.

In vitro proximity ligation assay

For proximity ligation assay, 20 nM DNA fragments containing G4 or YY1 consensus motif were incubated with 100 nM YY1 protein in a binding buffer containing 10 mM Tris–HCl (pH 7.5), 100 mM KCl, 10 μM ZnCl2 and 1 mM DTT at 20 °C for 30 min. The mixture was subsequently diluted and the DNA fragments were ligated in a mixture containing 2 U µl−1 T4 DNA ligase, 2 U µl−1 T4 polynucleotide kinase and 1× T4 DNA ligase buffer at 30 °C for 10 min. The reactions were terminated immediately afterwards by heating at 65 °C for 5 min. The resulting mixture was diluted, and real-time quantitative PCR was employed to quantify the ligation products. The sequences for the DNA fragments and primers used in the quantitative PCR are listed in Supplementary Table 2.

Chromatin fractionation and western blot

Chromatin fractionation was performed as described37. The cells were grown to 60–70% confluence, incubated with 20 µM MG132 for 4 h and then treated with 20 µM PDS or 5 µM TMPyP4 at 37 °C for 12 h. The chromatin fraction was isolated using a stepwise procedure with a cytoplasmic lysis buffer (10 mM Tris–HCl, pH 8.0, 0.34 M sucrose, 3 mM CaCl2, 2 mM MgCl2, 0.1 mM EDTA, 1 mM DTT, 0.5% NP-40 and protease inhibitor cocktail), a nuclear lysis buffer (20 mM HEPES, pH 7.9, 1.5 mM MgCl2, 1 mM EDTA, 150 mM KCl, 0.1% NP-40, 1 mM DTT, 10% glycerol and protease inhibitor cocktail) and a chromatin isolation buffer (20 mM HEPES, pH 7.9, 1.5 mM MgCl2, 150 mM KCl, 10% glycerol, protease inhibitor cocktail and 0.15 unit μl−1 benzonase). The proteins were quantified by Bradford assay. After separation by 10% SDS–PAGE, the proteins were transferred to a nitrocellulose membrane (Bio-Rad). After blocking with blotting-grade blocker (Bio-Rad), the membrane was incubated in a solution containing primary antibody and 5% BSA for 2 h, and then incubated in 5% blotting-grade blocker containing the horseradish peroxidase (HRP)-conjugated secondary antibody. The western blot signal was detected using ECL western blotting detection reagent. Antibodies used in this study included the primary antibodies histone H3 (Cell Signaling Technology, 9715S; 1:10,000), YY1 (Santa Cruz Biotechnology, SC-7341; 1:2,000) and β-actin (Cell Signaling Technology, 4967S; 1:5,000), and the secondary antibodies anti-rabbit IgG peroxidase (Sigma, A0545; 1:10,000) and anti-mouse IgG HRP (Santa Cruz Biotechnology, SC-516102; 1:10,000).

RT–qPCR

RT–qPCR was conducted as previously described37. Total RNA was extracted with Omega Total RNA Kit I (Omega) and quantified. Reverse transcription was performed using M-MLV Reverse Transcriptase (Promega) for complementary DNA synthesis. RT–qPCR was carried out using iQ SYBR Green Supermix (Bio-Rad) on the CFX96 RT–qPCR detection system (Bio-Rad). Primers used for RT–qPCR are listed in Supplementary Table 2.

Targeted integration and mutation with CRISPR–Cas9

The YY1-TAPTAG cell line was generated in the HEK293T background using CRISPR–Cas9, following previously reported procedures11. The guide sequence for YY1 or the G4 site was inserted into the pX330 plasmid (Addgene), which expresses hSpCas9. The guide sequence for YY1 was 5′-GAGAAGACCCTTCTCGACCA-3′. The donor plasmid for tagging YY1 (3× Flag–2× Strep tag) was synthesized (gBlock, Integrated DNA Technologies) and inserted into pUC19. For CRISPR-mediated mutation of an endogenous G4 site, the donor sequence carrying the mutated G4 sequences was inserted into pUC19. After cotransfection with the two plasmids, single-cell colonies were isolated and western blot was performed to screen for the insertion of the tandem tag using anti-YY1 antibody (SC-7341, Santa Cruz Biotechnology). Sanger sequencing was performed to confirm the mutation of the sequence for the endogenous G4 site.

ChIP–seq

ChIP experiments were conducted as previously described with a few modifications15,37. YY1-TAPTAG cells were cultured in DMEM with or without 20 µM PDS or 5 µM TMPyP4 for 12 h before cross-linking. Approximately 2 × 10^6 cells were cross-linked with 1/10 volume of freshly prepared 11% formaldehyde solution at room temperature for 10 min, and quenched with 125 mM glycine for 5 min. After washing with 1× PBS three times, the cells were resuspended in lysis buffer I (50 mM HEPES–KOH, pH 7.5, 140 mM NaCl, 1 mM EDTA, 10% glycerol, 0.5% NP-40, 0.25% Triton X-100 and protease inhibitor cocktail) at 4 °C on a rotator for 10 min.

After centrifugation, the pellet was resuspended in lysis buffer II (10 mM Tris–HCl, pH 8.0, 200 mM NaCl, 1 mM EDTA, 0.5 mM EGTA and protease inhibitor cocktail) at 4 °C for 10 min with rotation. The pellet was subsequently resuspended in the sonication buffer (20 mM Tris–HCl, pH 8.0, 150 mM NaCl, 2 mM EDTA, 0.1% SDS, 1% Triton X-100 and protease inhibitor cocktail). Sonication was conducted using a Covaris S220 sonicator for 6 min with a peak incident power of 140 Watts, a duty cycle of 5% and 200 cycles per burst at 4 °C. After centrifugation at 16,000g for 15 min, the supernatant was incubated with anti-Flag antibody (Cell Signaling Technology, catalog no. 2368S) for 2 h. Protein A/G Plus-Agarose (Santa Cruz) was then added and the mixture was incubated at 4 °C overnight. After washing, DNA was eluted from the beads with 100 mM NaHCO3 and 1% SDS at 68 °C for 2 h. Cross-links were subsequently reversed and proteins digested with proteinase K at 65 °C overnight. After purification, RNA in the resulting DNA samples was removed with RNase A. Finally, the DNA was purified using QIAquick PCR Purification Kit (Qiagen).

The DNA-sequencing library was prepared using NEBNext Ultra DNA Library Prep Kit for Illumina (NEB) following the manufacturer’s instructions. The purified DNA libraries were subsequently quantified using an Agilent 2100 Bioanalyzer and multiplexed for sequencing on a NextSeq500 Sequencing System (Illumina).

The sequencing reads of ChIP–seq were checked with FastQC (v.0.68) and aligned to the human hg19 reference genome using Bowtie2 (v.2.3.4.1) with the configuration of bowtie2 -q -N 0 -L 22 (ref. 37). Peak calling was performed using the model-based analysis of ChIP–seq (MACS2 v.2.1.1) with the following configuration: macs2 callpeak -f BAM -g 2.7e+9 -n YY1-ChIP (ref. 38). The UCSC Genome Browser (Human GRCh37/hg19) was used to visualize the mapping results and the ChIP–seq peaks39. The overlap of ChIP–seq peaks was analyzed using a custom script. In detail, each peak in the two BED files was compared, and those peaks that overlapped in the YY1 and BG4 ChIP–seq datasets by at least 1 bp were considered overlapping peaks, following previously published procedures3,40. Parallel analyses were also conducted by employing the criterion that the BG4 and YY1 ChIP–seq peaks overlap by at least 8 or 30 bp (Supplementary Table 1). For motif analysis, the genomic sequence of each peak was extracted, and occurrences of the consensus motif ‘AANATGGC’ and its reverse complement ‘GCCATNTT’ were searched and counted.
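The overlap criterion and motif search described above can be sketched in a few lines of Python. This is not the custom script used in this study; the function names, the tuple representation of peaks (chromosome, start, end, with half-open coordinates as in BED) and the choice to count overlapping motif occurrences are assumptions made for illustration.

```python
import re

def overlap_bp(a, b):
    """Number of overlapping bases between two peaks given as (chrom, start, end)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))

def overlapping_peaks(peaks_a, peaks_b, min_bp=1):
    """Peaks in peaks_a that overlap at least one peak in peaks_b by >= min_bp."""
    return [a for a in peaks_a
            if any(overlap_bp(a, b) >= min_bp for b in peaks_b)]

# N is expanded to any base; the lookahead lets overlapping occurrences be counted.
MOTIF = re.compile(r"(?=(AA[ACGT]ATGGC|GCCAT[ACGT]TT))")

def count_motifs(seq):
    """Count occurrences of AANATGGC or GCCATNTT in a sequence (case-insensitive)."""
    return sum(1 for _ in MOTIF.finditer(seq.upper()))

# Hypothetical peaks, for illustration only
yy1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
bg4 = [("chr1", 195, 300), ("chr2", 300, 400)]
for min_bp in (1, 8, 30):
    print(min_bp, overlapping_peaks(yy1, bg4, min_bp))
```

The pairwise comparison is quadratic in the number of peaks; for genome-scale peak sets, sorting the peaks or using an interval tree would be a more practical design.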

For the BG4 ChIP–qPCR experiment, chromatin was immunoprecipitated using BG4 antibody, which was purified as described7, following the aforementioned procedures. After purification of the BG4 antibody-enriched DNA fragments, quantitative PCR was performed using the primers listed in Supplementary Table 2.

HiChIP–seq and HiChIP–qPCR

HiChIP was conducted following the previously described procedures15. Before cross-linking, the YY1-TAPTAG cells were mock-treated, or treated with 20 µM PDS or 5 µM TMPyP4 for 12 h. Approximately 5 × 10^6 cells were cross-linked using freshly prepared 1% formaldehyde solution at room temperature for 10 min, and quenched with 125 mM glycine for 5 min. After washing three times, the cells were incubated in HiChIP lysis buffer (10 mM Tris–HCl, pH 8.0, 10 mM NaCl and 0.2% IGEPAL CA-630 with protease inhibitor cocktail) at 4 °C for 30 min with rotation. After centrifugation, the cells were washed again with HiChIP lysis buffer and centrifuged. The resulting pellet was resuspended in 0.5% SDS and incubated at 60 °C for 10 min. Triton X-100 (1.5%) was then added to the mixture to quench the SDS. After incubation at 37 °C for 15 min, the chromatin was digested with HindIII (NEB) in NEB CutSmart Buffer (NEB) at 37 °C for 6 h. The restriction enzyme was subsequently inactivated by heating at 65 °C for 20 min. Biotin-labeled dCTP was incorporated into the restriction fragment overhangs in the fill-in master mix (0.25 mM biotin-14-dCTP, 0.25 mM dCTP, 0.25 mM dGTP, 0.25 mM dTTP and 50 U Klenow fragment of DNA Polymerase I in NEB CutSmart Buffer). The reaction was continued at 37 °C for 2 h with rotation. Labeled DNA was ligated in a ligation master mix (1% Triton X-100, 0.1 mg ml−1 BSA and 250 U T4 DNA ligase in T4 Ligation Buffer) at 16 °C for 4 h with rotation.

The ligated chromatin was collected by centrifugation at 2,500g for 5 min and resuspended in ChIP sonication buffer (50 mM HEPES–KOH, pH 7.5, 140 mM NaCl, 1 mM EDTA, 1 mM EGTA, 1% Triton X-100, 0.1% sodium deoxycholate, 0.1% SDS and protease inhibitor cocktail). Chromatin immunoprecipitation was conducted under the same conditions as described above. After elution, RNA and proteins in the samples were removed by digestion with RNase A and proteinase K, respectively. The samples were subsequently incubated at 65 °C overnight to reverse the cross-links, and the DNA was purified from the mixture using QIAquick PCR Purification Kit (Qiagen).

After purification, the fragmented DNA was enriched using the streptavidin C1 magnetic beads (Invitrogen, catalog no. 65001). The beads were washed twice with Tween washing buffer (5 mM Tris–HCl, pH 7.5, 0.5 mM EDTA, 1 M NaCl, 0.05% Tween-20) and subsequently resuspended in 2× biotin binding buffer (10 mM Tris–HCl, pH 7.5, 1 mM EDTA, 2 M NaCl). An equal volume of purified DNA was added, and incubated at room temperature for 15 min. The magnetic beads were then washed twice with Tween washing buffer, twice with binding buffer and then once with TE buffer (10 mM Tris–HCl, pH 8.0, 0.1 mM EDTA). The magnetic beads were collected and the DNA-sequencing libraries were constructed using NEBNext Ultra DNA Library Prep Kit for Illumina (NEB) following the manufacturer’s instructions. The purified DNA libraries were subsequently quantified using an Agilent 2100 Bioanalyzer and multiplexed for sequencing on a NextSeq500 Sequencing System (Illumina).

For HiChIP–seq data analysis, the reads were aligned to the human hg19 reference genome using the Map with BWA-MEM (v.0.7.15.1) tool with a gap extension penalty setting of 50 (bwa mem -A1 -B4 -E50 -L0)41. The interaction matrix was subsequently built with the hicBuildMatrix (v.2.1.0) tool with the restriction site being set to HindIII (hicBuildMatrix --binSize 5000 --restrictionSequence AGCT)42. The visualization of the HiChIP data was performed with the hicPlotMatrix (v.2.1.0) tool42.
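To illustrate what a binned interaction matrix at 5-kb resolution represents, the sketch below counts read pairs per pair of genomic bins. It is a simplified stand-in and not the hicBuildMatrix implementation, which additionally handles restriction sites, read filtering and duplicates; the function name and the input format are assumptions.

```python
from collections import Counter

def bin_contacts(read_pairs, bin_size=5000):
    """Count read pairs per pair of genomic bins.

    read_pairs: iterable of ((chrom1, pos1), (chrom2, pos2)) with 0-based positions.
    Returns a Counter keyed by a sorted pair of (chrom, bin_index) tuples.
    """
    counts = Counter()
    for (c1, p1), (c2, p2) in read_pairs:
        b1 = (c1, p1 // bin_size)
        b2 = (c2, p2 // bin_size)
        # The matrix is symmetric, so the two bins are stored in sorted order.
        counts[tuple(sorted((b1, b2)))] += 1
    return counts

# Hypothetical read pairs, for illustration only
pairs = [
    (("chr1", 1200), ("chr1", 12000)),
    (("chr1", 14999), ("chr1", 4999)),
    (("chr1", 100), ("chr1", 200)),
]
for key, n in sorted(bin_contacts(pairs).items()):
    print(key, n)
```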

For the HiChIP–qPCR experiment, the DNA was purified after chromatin immunoprecipitation and was used for quantitative PCR with the primers listed in Supplementary Table 2.

RNA-seq and data analysis

Total RNA was extracted from cells with the Omega Total RNA Kit I. The NEBNext Ultra II RNA Library Prep Kit for Illumina was used to produce the RNA-sequencing library. Ribosomal RNAs were removed using the NEBNext rRNA Depletion Kit, and the resulting RNA samples were treated with DNase I to remove residual DNA. The rRNA-depleted RNA was fragmented by heating at 94 °C for 15 min in NEBNext First Strand Synthesis Reaction Buffer on a thermal cycler. The fragmented RNA was reverse-transcribed using First Strand Synthesis Enzyme Mix (NEB) with random primers. The NEBNext Second Strand Synthesis Enzyme Mix (NEB) was used to perform the second-strand synthesis. After purification with AMPure XP Beads, the sequencing library was prepared as described above. The DNA was barcoded with NEBNext Multiplex Oligos for Illumina (NEB), and the resulting library was sequenced on an Illumina HiSeq4000 instrument.

The sequencing reads of RNA-seq were aligned to human hg38 reference genome using HISAT2 (v.2.1.0) with the configuration of hisat2 -q -x -1 -2 -S. Transcript abundance was determined using featureCounts (v.1.6.4) with the configuration of featureCounts -T 4 -t exon -g gene_id. Differential gene expression was analyzed with DESeq2 (v.2.11.40.6).

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.