Abstract
The DNA guanine quadruplexes (G4) play important roles in multiple cellular processes, including DNA replication, transcription and maintenance of genome stability. Here, we showed that Yin and Yang 1 (YY1) can bind directly to G4 structures. ChIP–seq results revealed that YY1-binding sites overlap extensively with G4 structure loci in chromatin. We also observed that the dimerization of YY1 and its binding with G4 structures contribute to YY1-mediated long-range DNA looping. Displacement of YY1 from G4 structure sites disrupts substantially the YY1-mediated DNA looping. Moreover, treatment with G4-stabilizing ligands modulates the expression of not only those genes with G4 structures in their promoters, but also those associated with distal G4 structures that are brought to close proximity via YY1-mediated DNA looping. Together, we identified YY1 as a DNA G4-binding protein, and revealed that YY1-mediated long-range DNA looping requires its dimerization and occurs, in part, through its recognition of G4 structure.

Main
The DNA guanine quadruplexes (G4) are stable secondary structures of DNA that form in guanine-rich regions of the genome1. Although the in vitro formation of G4 has been known for decades, only recently have the formation and biological functions of G4 in cells been demonstrated1,2,3. Computational analysis of potential G4-forming sequences uncovered more than 376,000 G4 sequence motifs in the human genome4. Additionally, ChIP–seq analysis with the use of G4 structure-specific antibodies revealed approximately 10,000 potential G4 sites in chromatin of cultured human cells5,6,7. Moreover, several DNA G4-recognition proteins have been identified, including DNMT1, hnRNP A1, nucleolin, PARP1, SLIRP, etc.3,8,9,10,11. The large number of G4 sites detected in chromatin and the existence of multiple cellular proteins for G4 structure recognition suggest the crucial functions of G4 structures in cells. Indeed, G4 structures have been shown to assume important roles in various DNA metabolic processes, including replication, transcription and repair1,2,3.
YY1, a zinc-finger-containing transcription factor12, is ubiquitously expressed and modulates many crucial biological and cellular processes, including DNA replication/transcription/repair, cell proliferation/differentiation and embryogenesis12,13,14. Recently, YY1 was found to promote the formation of enhancer–promoter loops, which involves interaction of YY1 with duplex DNA harboring a consensus YY1-binding motif15.
Here, by using an unbiased quantitative proteomic method, we identified YY1 as a putative G4-binding protein. We also demonstrated that YY1 is capable of interacting directly with G4 DNA in vitro and in cells, and this interaction contributes to YY1-mediated long-range DNA looping and gene expression regulation.
Results
Identification of YY1 as a candidate G4-binding protein
We recently reported a stable isotope labeling by amino acid in cell culture (SILAC)-based quantitative proteomic workflow for the discovery of G4-binding proteins, which involved the use of biotin-conjugated G4 DNA probes derived from the promoters of KIT and MYC genes and the human telomere (hTEL), along with the corresponding mutated sequences incapable of folding into G4 structures (M4)11. The proteomic data also revealed YY1 as a candidate G4-binding protein, with the SILAC protein ratios (G4/M4 probes) being 1.92 ± 0.11, 1.77 ± 0.11 and 2.02 ± 0.21 for the cKIT, cMYC and hTEL probes, respectively (Extended Data Fig. 1). Representative electrospray ionization–mass spectrometry (ESI–MS) and MS/MS for a tryptic peptide derived from YY1, FAQSTNLK, are shown in Supplementary Figs. 1 and 2.
YY1 binds directly with G4 structures in vitro
We next asked whether the interaction between YY1 and G4 structures is direct, by measuring the binding affinities between recombinant human YY1 protein and G4 or M4 probes using fluorescence anisotropy (Fig. 1). To this end, we first purified GST-tagged YY1 protein and, because GST is known to form a homodimer16, we removed the GST tag and further purified the tag-free YY1 protein (see Methods). The results showed that recombinant YY1 binds strongly with all three G4 structures, with the Kd values being 26.7 ± 2.2, 26.3 ± 2.1 and 82.0 ± 5.3 nM for G4 structures derived from cKIT, cMYC and hTEL, respectively (Fig. 1a–c). The binding affinities toward M4 probes are at least tenfold lower than those toward G4 DNA (Fig. 1a–c). These results demonstrate that YY1 binds directly and strongly with G4 structures in vitro.
a–c, Fluorescence anisotropy for monitoring the binding of YY1 protein with G4 structures and the corresponding mutated sequences derived from the promoters of KIT (a) and MYC (b) genes, or the human telomere (c). The data in a–c represent mean ± s.e.m. from three independent experiments. d–f, EMSA results showing the binding of YY1 toward G4 structure without treatment (d) or upon treatment with different concentrations of PDS (e) or TMPyP4 (f). The EMSA experiments were performed independently three times with similar results.
We also examined whether the binding between YY1 and G4 alters the folding of G4 structures. The circular dichroism (CD) spectra for the G4 DNA probes in YY1–DNA complexes are very similar to those of the free G4 DNA probes (Extended Data Fig. 2), underscoring that the interaction with YY1 does not alter the folding of G4 structures.
Pyridostatin (PDS) and 5,10,15,20-tetra-(N-methyl-4-pyridyl)porphyrin (TMPyP4) are small-molecule ligands that bind specifically to and stabilize G4 structures17,18. Our results from electrophoretic mobility-shift assay (EMSA) showed that both PDS and TMPyP4 could displace YY1 from G4 structures in vitro (Fig. 1d–f), further substantiating the capability of YY1 for binding directly with G4 structures.
Duplex DNA with a sequence motif of 5′-AANATGGC-3′ was previously identified as the consensus YY1-binding site19,20. For comparison, we also measured the Kd value for YY1 in binding with a consensus motif-carrying duplex DNA, and it turned out that the binding affinity (Kd = 11.6 ± 0.8 nM, Supplementary Fig. 3) was only slightly higher than those observed for its binding with G4 structures.
The role of YY1 zinc-finger domain in binding with G4
YY1 is highly conserved in vertebrates, where its homologs in human, chimpanzee, dog, mouse, rat and chicken share >80% amino acid sequence identity (Supplementary Fig. 4). YY1 contains a polyhistidine cluster, a REPO domain and four zinc-fingers (Extended Data Fig. 3a). The zinc-fingers, located in the C-terminal portion of YY1 and involved in its binding with duplex DNA, are identical among these species (Extended Data Fig. 3 and Supplementary Fig. 4), suggesting that the functions of the zinc-finger domain in YY1 are highly conserved.
To assess which segment of YY1 protein is involved in its binding with the G4 structure, we generated several truncated or mutated fragments of YY1 protein and measured their binding affinities towards G4 DNA (Extended Data Fig. 3 and Supplementary Fig. 5). Our results showed that YY1 fragments lacking the C-terminal zinc-finger (YY1(1–382)), or with a cysteine in the third zinc-finger being mutated to a serine (YY1(C360S)), exhibit markedly diminished binding affinities toward the G4 structure (Extended Data Fig. 3b,c). On the other hand, truncated YY1 proteins without the N-terminal 230 (that is, YY1(231–414)) or 292 (that is, YY1(293–414)) amino acids display similar or slightly stronger binding affinities towards the hTEL G4 structure relative to full-length YY1 (Extended Data Fig. 3d,e). Neither YY1(231–414) nor YY1(293–414), however, exhibits selective binding toward G4 over M4 (Extended Data Fig. 3d,e). Together, the above results support that the C-terminal zinc-finger domain of YY1 is required for robust binding to G4 DNA, but not sufficient for the protein to discriminate G4 structure from single-stranded M4.
The capability of YY1 in binding to both consensus motif-containing duplex DNA and G4 DNA, and the requirement of the zinc-finger domain for both binding modes, suggest that these two types of bindings may compete with each other. To test this, we assessed the binding of YY1 with the G4 structure or duplex DNA harboring the YY1 consensus motif in the presence or absence of different concentrations of unlabeled G4 DNA or the consensus motif-bearing duplex DNA (Supplementary Fig. 3). It turned out that both competitors could repress, in a concentration-dependent manner, binding of YY1 with G4 or YY1 consensus motif-containing duplex DNA (Supplementary Fig. 3), underscoring the competitive binding of YY1 toward G4 structures and the consensus motif-containing duplex DNA.
YY1 binds with G4 structures in cells
We next investigated the interactions between YY1 and G4 structures in cells by using ChIP–seq analysis. To this end, we employed CRISPR–Cas9 to incorporate a tandem affinity tag (3× Flag–2× Strep) to the C terminus of endogenous YY1 protein in HEK293T cells (YY1-TAPTAG, Supplementary Fig. 6). ChIP–seq analysis with the use of anti-Flag antibody and the YY1-TAPTAG cells led to the detection of 12,063 significant peaks (Fig. 2). An analysis of the overlap between YY1 ChIP–seq peaks and G4 ChIP–seq peaks (a BG4 ChIP–seq dataset from the NCBI GEO database) showed that approximately 39% (4,721 out of 12,063) of YY1 ChIP–seq peaks in our dataset overlap by at least one base pair (bp) with G4 ChIP–seq peaks, with a hypergeometric P value of 1.184 × 10^−126 (Fig. 2a,b). Substantial overlaps (3,965 and 3,648 out of 12,063 peaks, or 33% and 30%, respectively) were also observed after expanding the size windows of the overlap analysis to 8 and 30 bp (Supplementary Table 1), which correspond to the lengths of the YY1-binding motif and G4 DNA, respectively.
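The overlap analysis described above can be sketched in a few lines. This is a minimal illustration only, not the pipeline used in the study: the function names, the half-open coordinate convention, the toy peaks and the definition of the "universe" for the hypergeometric test are all assumptions, since the text does not specify them.

```python
from math import comb

def overlap_bp(a, b):
    """Shared base pairs between two half-open intervals given as (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def count_overlapping(query, reference, min_bp=1):
    """Count query peaks sharing at least min_bp bases with any reference peak.

    Peaks are (chrom, start, end) tuples; this layout is an assumption.
    """
    by_chrom = {}
    for chrom, start, end in reference:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = 0
    for chrom, start, end in query:
        refs = by_chrom.get(chrom, [])
        if any(overlap_bp((start, end), ref) >= min_bp for ref in refs):
            hits += 1
    return hits

def hypergeom_upper_tail(k, universe, successes, draws):
    """P(X >= k) for a hypergeometric draw; the universe size is user-supplied."""
    total = comb(universe, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(universe - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total

# Hypothetical toy peaks, not data from the study.
yy1_peaks = [("chr1", 100, 300), ("chr1", 500, 700), ("chr2", 50, 150)]
bg4_peaks = [("chr1", 290, 400), ("chr1", 650, 900), ("chr3", 50, 150)]

for window in (1, 8, 30):  # the three overlap thresholds named in the text
    print(window, count_overlapping(yy1_peaks, bg4_peaks, min_bp=window))
```

With real data, the quoted percentages follow directly from the counts (for example, 4,721 / 12,063 ≈ 39%). The P value depends on how the background set of candidate regions is defined, which is why it is left as an explicit argument here.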
a, Comparisons of YY1 ChIP–seq data with BG4 ChIP–seq data. b, A Venn diagram showing the overlap between YY1 ChIP–seq and BG4 ChIP–seq peaks. c, Comparison of YY1 ChIP–seq data acquired for cells with or without (Ctrl) PDS/TMPyP4 treatment. The P values were calculated by using the hypergeometric test. d,e, Enrichment ratios of YY1 ChIP–seq peaks containing different motifs after treatment with PDS (d) or TMPyP4 (e). BG4, peaks overlapped with BG4 ChIP–seq peaks; Y, peaks containing YY1 consensus motif; BG4Y, peaks containing both YY1 consensus motif and overlapped with BG4 ChIP–seq. The data in d and e represent mean ± s.e.m. In d, n = 370, 1,194 and 638 for BG4, Y and BG4Y, respectively; in e, n = 378, 1,198 and 639 for BG4, Y and BG4Y, respectively. In d and e the two-tailed Student’s t-test was used to calculate the P values. ***P < 0.001; P = 3.1 × 10^−15 for Y and P = 1.6 × 10^−20 for BG4Y in d; P = 2.0 × 10^−6 for Y and P = 5.0 × 10^−11 for BG4Y in e. f,g, Chromatin localization of YY1 after PDS (f) or TMPyP4 (g) treatment, where the experiments were performed independently three times with similar results.
It is worth noting that 3,086 out of the 8,955 BG4 ChIP–seq peaks correspond to potential quadruplex-forming sequences (GGGN1–7GGGN1–7GGGN1–7GGG, with N being any of the four canonical nucleotides)4. This parallels the previous G4-sequencing results and shows that ~70% of observed G4 sequences could not be assigned to the computationally predicted quadruplex-forming sequences21. This was attributed to the formation of G4 structures with long loop regions (>7 bases) and/or the involvement of noncontiguous guanine(s) in G4 formation21. In this vein, it is of note that the BG4 antibody employed for the ChIP–seq analysis recognizes G4 structures in general, whereas it cannot differentiate parallel, antiparallel and mixed parallel/antiparallel configurations of G4 structures5.
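The potential quadruplex-forming sequence pattern quoted above translates directly into a regular expression. The sketch below is illustrative: the helper names are hypothetical, and scanning the reverse complement (to catch C-rich matches on the opposite strand) is an added assumption rather than something stated in the text.

```python
import re

# Motif as written in the text: four GGG runs separated by loops of 1-7 nt.
PQS = re.compile(r"GGG[ACGT]{1,7}GGG[ACGT]{1,7}GGG[ACGT]{1,7}GGG")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_pqs(seq):
    """Return sorted (start, end, strand) tuples for non-overlapping matches.

    Coordinates are 0-based, half-open, and always given on the input strand.
    """
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(), m.end(), "+") for m in PQS.finditer(seq)]
    for m in PQS.finditer(reverse_complement(seq)):
        hits.append((n - m.end(), n - m.start(), "-"))
    return sorted(hits)

# Human telomeric repeat versus a G-to-A mutant that cannot match the motif.
print(find_pqs("TTAGGGTTAGGGTTAGGGTTAGGGTTA"))
print(find_pqs("TTAGAGTTAGAGTTAGAGTTAGAGTTA"))
```

Because the pattern allows loops of at most 7 nt and requires contiguous GGG runs, it misses long-loop and noncontiguous-guanine G4 structures by design. This is consistent with the point above that many observed G4 sites fall outside the computational prediction.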
We also statistically analyzed the peak width and distribution of signal enrichment for the overlapping peaks identified from the YY1 ChIP–seq and BG4 ChIP–seq results (Extended Data Fig. 4). We found that the mean widths for YY1 and BG4 ChIP–seq peaks are very similar, and the enrichment signal summits for YY1 and BG4 ChIP–seq peaks overlap with each other (Extended Data Fig. 4). Among the 4,721 overlapping peaks in BG4 ChIP–seq and our YY1 ChIP–seq datasets, 2,603 are located in gene promoters and 925 harbor the YY1 consensus motif (Supplementary Table 1). High levels of overlap were also found for the BG4 ChIP–seq dataset and two other YY1 ChIP–seq datasets from the NCBI GEO database (Supplementary Fig. 6c,d and Supplementary Table 1). Among the 4,603 and 3,186 overlapped peaks found between BG4 ChIP–seq and the two YY1 ChIP–seq datasets (GSM1010753 and GSM110897), 2,504 and 1,684 are located in gene promoters, and 1,195 and 1,120 contain the YY1 consensus motif, respectively (Supplementary Table 1). These results revealed that YY1 binds with G4 structures in cells, and more than half of the overlapped peaks are located in gene promoters.
We also assessed how treatment with PDS and TMPyP4, which could bind to G4 structures in cells22,23, impacts the interaction between YY1 and G4 structures in cells. The results from ChIP–seq experiments with the YY1-TAPTAG cells showed that treatment with PDS or TMPyP4 led to significantly diminished interaction of YY1 with its binding regions (Fig. 2c and Supplementary Fig. 7), with the most pronounced diminutions being observed for those YY1 ChIP–seq peaks that overlap with BG4 ChIP–seq peaks (Fig. 2d,e). Moreover, treatment of cells with PDS or TMPyP4 resulted in a progressive decrease in the chromatin localization of YY1 (Fig. 2f,g), where we co-treated cells with MG132, a proteasome inhibitor, to minimize protein degradation during the treatment. Together, these results lent further evidence that YY1 binds with G4 structures in cells, and this binding could be perturbed by G4-stabilizing ligands.
YY1–G4 binding modulates YY1-mediated DNA looping
YY1 was recently shown to be an important regulator for long-range DNA looping15,24,25. We next assessed whether YY1–G4 binding functions in this process by employing HiChIP–seq, a method developed for monitoring protein-directed chromatin conformation26. The results from HiChIP–seq experiments with the use of YY1-TAPTAG cells showed that treatment with PDS or TMPyP4 led to substantial reductions in YY1-mediated DNA looping (Fig. 3a and Supplementary Fig. 8). Statistical analysis revealed that the percentage of DNA looping modulated by YY1 in a distance range of 5–200 kilobase (kb) is markedly attenuated upon treatment with PDS or TMPyP4 (35.6%, 9.4% and 9.9% for control, PDS- and TMPyP4-treated cells, respectively; Fig. 3b). Likewise, treatment with PDS or TMPyP4 resulted in weaker interactions between the genomic regions occupied by YY1 (ChIP–seq peaks) and their surrounding regions (Fig. 3c and Supplementary Fig. 9). These results demonstrated that the role of YY1 in DNA looping involves its interaction with G4 structures.
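One plausible way to obtain a percentage like those quoted above is to count the loops whose anchors lie 5–200 kb apart and divide by all loops called. The sketch below assumes exactly that. The loop representation, the choice of denominator and the toy coordinates are hypothetical, and the study's actual calculation may differ.

```python
def fraction_in_range(loops, lo=5_000, hi=200_000):
    """Fraction of loops whose two anchors lie lo..hi bp apart (inclusive).

    Each loop is (chrom1, pos1, chrom2, pos2). Interchromosomal pairs stay in
    the denominator but can never fall inside the distance window.
    """
    if not loops:
        raise ValueError("no loops supplied")
    in_range = sum(
        1
        for c1, p1, c2, p2 in loops
        if c1 == c2 and lo <= abs(p2 - p1) <= hi
    )
    return in_range / len(loops)

# Hypothetical anchor pairs on chromosome 17, not data from the study.
toy_loops = [
    ("chr17", 1_000_000, "chr17", 1_050_000),  # 50 kb, inside the window
    ("chr17", 2_000_000, "chr17", 2_001_000),  # 1 kb, too short
    ("chr17", 3_000_000, "chr17", 3_500_000),  # 500 kb, too long
    ("chr17", 4_000_000, "chr17", 4_150_000),  # 150 kb, inside the window
]
print(f"{100 * fraction_in_range(toy_loops):.1f}% of loops span 5-200 kb")
```

Computing this fraction separately for control, PDS-treated and TMPyP4-treated samples would give three percentages that can be compared in the manner of Fig. 3b.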
a, HiChIP interaction map of chromosome 17 in 293T cells that were untreated (left) or treated with PDS (middle) or TMPyP4 (right). b, Statistical analysis of HiChIP interaction distance of 5–200 kb after PDS or TMPyP4 treatment. c, A heat map depicting the interactions between YY1 ChIP–seq peaks and the surrounding region (−200 kb to 200 kb) after PDS or TMPyP4 treatment.
To further assess the role of G4 structure in YY1-mediated DNA looping, we asked how unwinding of G4 structure by ectopic overexpression of a G4 resolvase (that is, BLM helicase27) modulates the YY1-mediated DNA–DNA interaction (Extended Data Fig. 5). In this vein, we confirmed that overexpression of BLM led to substantially diminished signal for the G4 structure and YY1 binding at the selected loci (Extended Data Fig. 5a–d). In addition, HiChIP followed by a real-time quantitative PCR (HiChIP–qPCR) experiment showed that the BLM helicase-mediated unwinding of G4 structure abrogated the YY1-mediated DNA looping at these sites (Extended Data Fig. 5d,e).
We also mutated the G4-forming sequence at a specific genomic locus by using CRISPR–Cas9-based genome editing to disrupt its capability to fold into G4 structures (Fig. 4 and Supplementary Fig. 10). We found that the mutation indeed attenuated the formation of G4 structure markedly and pronouncedly diminished the binding of YY1 to the site, as revealed by BG4 ChIP–qPCR and YY1 ChIP–qPCR analyses, respectively (Fig. 4b,c). Moreover, HiChIP–qPCR results revealed that the mutation markedly reduced the YY1-mediated DNA looping involving the site (Fig. 4d). This result substantiated the involvement of YY1–G4 structure interaction in YY1-mediated DNA looping.
a, A schematic diagram showing the design of CRISPR–Cas9-based editing of an endogenous G4-forming sequence. PAM, protospacer adjacent motif. HA, homology arm. b,c, BG4 ChIP (b) and YY1 ChIP (c) enrichments of the site were substantially attenuated after mutation of the G4 sequence. d, YY1-mediated DNA looping was disrupted by mutation of the G4 sequence. Shown are the HiChIP–qPCR quantification results of YY1-mediated DNA looping in HEK293T cells with or without mutation of G4 sequence. e, A schematic diagram depicting the truncated YY1 protein that is defective in dimerization. aa, amino acid. f, ChIP enrichments for YY1, YY1∆231–290 and GST-YY1∆231–290 at sites 1 and 2. Sites 1 and 2 are the same sites as described in Extended Data Fig. 5. g, HiChIP–qPCR quantification results of YY1, YY1∆231–290- and GST-YY1∆231–290-mediated DNA looping. The data represent mean ± s.e.m. of results obtained from three independent experiments. The P values were calculated by using two-tailed Student’s t-test: **P < 0.01; ***P < 0.001; P = 0.0012 in b; P = 0.0015 in c; P = 0.0011 in d; P = 0.000031 and 0.0013 for YY1 and GST-YY1∆231–290, respectively, for site 1, and P = 0.0000034 and P = 0.0047 for YY1 and GST-YY1∆231–290, respectively, for site 2 in g.
Dimerization of YY1 promotes DNA–DNA interactions
YY1 protein is known to undergo dimerization, which promotes DNA–DNA interactions through binding of the protein with its consensus sequence motif in duplex DNA15,28. We also investigated whether YY1 dimerization enhances YY1-mediated DNA–DNA interactions in vitro. First, we confirmed, by using gel-filtration chromatography, that tag-free YY1 exists primarily as a dimer in solution (Extended Data Fig. 6). Second, our quantitative EMSA experiment revealed a 1:1 binding stoichiometry between YY1 and G4 DNA, indicating that each YY1 dimer can bind to two DNA G4 structures (Extended Data Fig. 6c,d).
Next, by employing a proximity ligation assay, we examined the role of YY1 in facilitating the interactions between different DNA elements, including duplex DNA harboring a YY1 consensus motif (motif), G4 DNA (G4) and its mutated counterpart that is unable to fold into G4 structure (M4) (Extended Data Fig. 7a). The results showed that the presence of YY1 led to augmented ligation efficiencies between G4 and G4, and between G4 and motif (Extended Data Fig. 7b). Moreover, addition of a YY1 motif-containing duplex competitor to the ligation mixture led to pronounced diminutions in the efficiency of ligation of DNA elements containing G4 and/or motif (Extended Data Fig. 7b). Parallel control experiments showed that YY1 promotes the ligation between motif and motif, but not between M4 and M4, motif and M4, or G4 and M4 (Extended Data Fig. 7c). Additionally, the inclusion of a G4-stabilizing agent, PDS or TMPyP4, in the ligation mixture led to markedly diminished ligation between G4 and G4 (Extended Data Fig. 7d). These results support that the dimerization of YY1 and its binding with G4 DNA enable long-range interactions between G4 DNA and another G4 DNA or a consensus motif-harboring duplex DNA.
To further examine the role of dimerization of YY1 in DNA looping, we generated a separation-of-function mutant of YY1, which maintains its capability for recognition of G4 structures, but loses its ability to dimerize. Considering that YY1(231–414) and YY1(293–414) are present as a dimer and a monomer, respectively (Extended Data Fig. 6a,b), we generated a truncated variant of YY1 with the linker region between the REPO and zinc-finger domains being deleted (YY1∆231–290) (Fig. 4e). We found that this truncated form of YY1 exists as a monomer in solution, while maintaining an approximately tenfold selectivity in binding toward G4 over M4 DNA (Extended Data Fig. 8). Owing to the inherent dimerization ability of GST, GST-YY1∆231–290 is present as a dimer in solution (Extended Data Fig. 8). Our proximity ligation assay results showed that the GST-tagged YY1∆231–290, but not its tag-free counterpart, enhanced the ligation between two G4 DNA probes (Extended Data Fig. 8e), supporting that dimerization of YY1 is crucial for DNA–DNA interaction in vitro.
We next asked how loss of dimerization modulates the ability of YY1 in promoting DNA looping in cells. Our ChIP–qPCR results revealed that both YY1∆231–290 and GST-YY1∆231–290 are capable of binding toward G4 structures in cells (Fig. 4f); YY1∆231–290, however, fails to promote DNA looping in cells, and fusion with the GST tag rescues the ability of the truncated protein to enhance DNA looping (Fig. 4g). These results, therefore, demonstrate that dimerization of YY1 is indispensable for its function in promoting DNA looping in cells.
YY1–G4 binding regulates gene expression
YY1 was first identified as a transcription factor for its function in activation or suppression of gene expression15,29, and the above results showed that PDS and TMPyP4 are capable of disrupting the interactions between YY1 and DNA G4 structures (Figs. 1d–f and 2c–g). Therefore, we next asked whether genetic depletion of YY1, or treatment with PDS or TMPyP4, influences those genes that are modulated by G4 structure. For this purpose, we performed RNA-seq analysis to assess, at the transcriptome-wide scale, the alterations in gene expression elicited by genetic depletion of YY1 and/or treatment with PDS (Fig. 5, Extended Data Fig. 9 and Supplementary Figs. 11 and 12). Statistical analysis showed a strong correlation between PDS-regulated and YY1-regulated genes (Pearson r > 0.67), underscoring the role of YY1–G4 structure interaction in YY1-mediated gene regulation (Fig. 5, Extended Data Fig. 9 and Supplementary Fig. 11). Moreover, the RNA-seq results revealed that YY1 can positively or negatively regulate the expression of genes through its interaction with G4 structures.
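The Pearson correlation between two sets of per-gene expression changes can be computed as follows. This is a minimal sketch: the ratios are made up for illustration, and taking log2 of the treated/control ratios before correlating is an assumption, since the text does not say whether raw or log-transformed ratios were used.

```python
from math import log2, sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)

# Hypothetical treated/control expression ratios for five genes.
pds_ratio = [0.5, 0.8, 1.0, 1.6, 2.1]
shyy1_ratio = [0.6, 0.7, 1.1, 1.4, 2.5]

r = pearson_r([log2(v) for v in pds_ratio], [log2(v) for v in shyy1_ratio])
print(f"Pearson r = {r:.2f}")
```

A positive r means that genes moving up or down under PDS treatment tend to move in the same direction upon YY1 knockdown, which is the pattern the correlation plots are meant to convey.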
a–d, Dot plot showing the correlation of gene expression between PDS-treated and shYY1-1-treated (a), shYY1-2-treated (b), shYY1-1 + PDS-treated (c), or shYY1-2 + PDS-treated (d) cells. Shown are average ratios of RNA-seq results from two biological replicates; n = 14,707 genes for Pearson correlation calculation in a–d. e–g, Quantification results of mRNA expression levels of MANF (e), PDHB (f) and PGBD5 (g) genes in HEK293T cells with shRNA-mediated knockdown of YY1 and/or after PDS/TMPyP4 treatment. Top of each panel shows read enrichments from BG4 ChIP–seq and YY1 ChIP–seq. The data represent mean ± s.e.m. of results from three independent experiments.
We also validated the RNA-seq results by assessing the messenger RNA expression levels of six representative genes (that is, MANF, PDHB, MYC, PGBD5, SLC25A28 and TMEM145) by using reverse transcription-quantitative PCR (RT–qPCR). These six genes were chosen on the basis of the presence of G4 structures and the occupancy of YY1 in their promoter regions (Fig. 5 and Extended Data Fig. 9). The results showed that the expression levels of MANF, PDHB and MYC genes were markedly attenuated, whereas those of PGBD5, SLC25A28 and TMEM145 genes were substantially elevated in HEK293T cells upon treatment with PDS or TMPyP4, or upon short hairpin RNA (shRNA)-mediated knockdown of YY1 (Fig. 5, Extended Data Fig. 9 and Supplementary Fig. 12).
We also assessed whether the long-range DNA looping elicited by YY1–G4 binding plays a role in gene regulation. To this end, we monitored the mRNA expression levels of two genes, TRMT12 and EHD3, which do not carry G4 structures in their promoters but are associated with G4 structures at a remote site via YY1-mediated DNA looping (Fig. 6a,b and Extended Data Fig. 10a,b). Our results showed that the mRNA expression levels of these two genes in HEK293T cells were markedly diminished upon shRNA-mediated knockdown of YY1, or upon treatment with PDS or TMPyP4 (Fig. 6c and Extended Data Fig. 10c).
a, Read enrichments obtained from BG4 ChIP–seq and YY1 ChIP–seq results. b, Normalized interaction frequency involving the sites linked with a red arch in a in HEK293T cells with or without PDS or TMPyP4 treatment, as captured by YY1 HiChIP. c, Quantification of mRNA expression levels of TRMT12 in HEK293T cells after knockdown of YY1 and/or with PDS/TMPyP4 treatment. The data represent mean ± s.e.m. of results from three independent experiments.
Next, we examined whether this observation can be extended to other genes at the genome-wide scale by integrating the results obtained from YY1 ChIP–seq, BG4 ChIP–seq, HiChIP–seq and RNA-seq data. To this end, we focused on those genes whose promoters are occupied by YY1 (based on YY1 ChIP–seq data), that do not fold into G4 structures (based on BG4 ChIP–seq data) and that interact with remote sites through YY1-mediated DNA looping (based on YY1 HiChIP–seq data). We divided these genes into two groups based on whether the remote site adopts G4 structure (based on BG4 ChIP–seq) (Supplementary Fig. 13). We found that PDS treatment differentially modulated the expression of these two groups of genes, with more pronounced changes being observed for those genes interacting with a distal site carrying G4 structure than for those without (Supplementary Fig. 13). Moreover, shRNA-mediated knockdown of YY1 abrogated the differences elicited by PDS treatment, and this is true for both the up- and downregulated genes (Supplementary Fig. 13). These results, therefore, support that the role of YY1 in transcriptional regulation involves, in part, the long-range DNA looping forged by its interaction with G4 structures.
Discussion
YY1 is an extensively studied transcription factor and it functions in many important biological processes12,13,15, where duplex DNA sequences with a consensus motif were previously identified as YY1-binding sites15,20. Here, we demonstrated that YY1, apart from binding to its consensus motif sequences in duplex DNA, is capable of interacting directly with G4 structures in vitro and in cells (Figs. 1 and 2). Further analysis showed that the C-terminal zinc-finger domain of YY1 is indispensable for its interaction with G4 structure. Our results from the YY1Δ231–290 truncation mutant showed that recognition of G4 structures by YY1 also entails the N-terminal portion of the protein, although the linker region between the REPO and zinc-finger domains of the protein is dispensable for this recognition (Extended Data Figs. 3 and 8). It will be important to solve, in the future, the structure of YY1 in complex with G4 DNA so as to gain insights into the similarities and differences in YY1’s recognition of G4 structure versus duplex DNA.
Spatial organization is a fundamental element in three-dimensional (3D) genome architecture and transcription regulation; however, not much is known about how 3D genome architecture is regulated and which proteins are involved15,30,31,32. In this vein, CTCF and cohesin were shown to contribute to 3D genome organization30,32. Recently, YY1 was found to function in this process by enabling enhancer–promoter interactions15. Our results unveiled a molecular determinant for the function of YY1 in this process, that is, through its binding with G4 structures, where displacement of YY1 from G4 structures with small-molecule G4 ligands, unwinding of G4 structures by overexpression of BLM helicase and mutation of a G4-forming sequence using CRISPR–Cas9-mediated genome editing all disrupt YY1-mediated DNA looping (Fig. 3, Extended Data Figs. 5 and 7 and Supplementary Fig. 10).
On the basis of our results, we propose a mechanistic model, where the dimerization of YY1 allows for its concurrent binding of two consensus sequence motifs, two G4 structures or one consensus sequence motif and one G4 structure, thereby enabling DNA looping. Our model is supported by results from proximity ligation assays showing that YY1 can promote the ligation between these DNA elements (Extended Data Fig. 7), from analytical gel-filtration assays revealing that YY1 forms a dimer in vitro (Extended Data Fig. 6) and from quantitative EMSA assays demonstrating that YY1 binds to G4 DNA in a 1:1 stoichiometry (Extended Data Fig. 6). Moreover, our identification of a separation-of-function mutant of YY1, which is defective in dimerization, but proficient in discriminating G4 structure from single-stranded DNA, and demonstration that the mutation diminishes markedly the YY1-mediated DNA looping in cells lends further support to our mechanistic proposal (Fig. 4).
Disruption of binding between YY1 and G4 structure with the use of G4-stabilizing ligands, PDS or TMPyP4, and genetic depletion of YY1 led to altered expression of not only those genes harboring G4 structures in their promoters (Fig. 5 and Extended Data Fig. 9), but also those genes that do not carry G4 structures in their promoters but that are linked with G4 structure at a remote site through YY1-mediated DNA looping (Fig. 6 and Extended Data Fig. 10). These results support that YY1–G4 interaction not only regulates the expression of proximal genes, but also that of distal genes through long-range DNA looping. This represents an additional mechanism by which YY1 regulates gene expression.
Numerous studies have documented the association between YY1 and human cancer, and YY1 is also considered as a potential prognostic biomarker and therapeutic target33,34. In this vein, YY1 is overexpressed in multiple types of cancer, and its overexpression is also correlated with poor therapeutic outcomes33,34. In addition, YY1 is known to regulate a number of cancer-related genes33,34. Furthermore, G4 structures are over-represented in many cancer-related genes, and multiple studies revealed markedly increased presence of G4 structures in cancer cells35, indicating the involvement of G4 in cancer development. In this study, we reveal that the functions of YY1 in modulating gene expression and DNA looping depend, at least in part, on its binding with G4 structure. Hence, perhaps the interaction between YY1 and G4 can be exploited for therapeutic interventions for human cancer in the future.
Together, our work unveils a previously unrecognized mode of DNA recognition for a well-studied transcription factor, YY1 (that is, in its binding with G4 DNA), and reveals the role of this type of molecular recognition in modulating long-range DNA looping and gene expression. Our study also uncovers that the G4 structure-mediated gene regulation can occur not only through G4 structures in promoter regions of genes, but also via distal G4 structures that are brought into close proximity through long-range DNA looping. Therefore, the present work expands molecular determinants for DNA looping, and offers insight into the functions of YY1 and G4 structures in gene regulation.
Methods
Cell lines
HEK293T (293T) cells were purchased from ATCC. Cells were maintained in DMEM (Life Technologies) supplemented with 10% FBS (Invitrogen) and 100 unit ml−1 penicillin/streptomycin at 37 °C in a humidified incubator with 5% CO2.
Generation of recombinant YY1 proteins
The coding sequence (CDS) of human YY1 gene was amplified by PCR and inserted into the pGEX plasmid linearized with the BamHI and XhoI restriction enzymes. For truncated proteins, the corresponding CDS was amplified by PCR and inserted into the pGEX plasmid using the same method. The plasmids were transformed into Rosetta (DE3) pLysS Escherichia coli cells and the cells were cultured in LB medium at 37 °C. After the optical density (OD600) reached approximately 0.6, the culture was cooled to 20 °C and the protein expression was induced with 1 mM IPTG (Sigma) at 20 °C for approximately 12 h. The cells were then collected and lysed in a buffer containing 20 mM Tris (pH 7.5), 500 mM NaCl, 10% glycerol and 2 mM DTT. After centrifugation, the supernatant was collected and GST-tagged proteins were purified with Glutathione Superflow Agarose (Pierce) following the manufacturer’s recommended procedures. Elution was performed using a buffer containing 20 mM Tris (pH 7.5), 250 mM NaCl, 10% glycerol, 2 mM DTT and 20 mM reduced glutathione. The GST-tagged protein was digested with 3C protease overnight to remove the GST tag. The tag-free protein was further loaded onto the heparin column (GE Healthcare) and eluted with a NaCl gradient (0.25–1.0 M) in a buffer containing 20 mM Tris (pH 7.5), 10% glycerol and 2 mM DTT. The fractions containing the target protein were pooled, concentrated and further purified by size-exclusion chromatography on a HiLoad 16/600 Superdex 200 pg column (GE Healthcare). The proteins were quantified using Bradford Protein Assay Kit (Bio-Rad), and their purities verified by SDS–PAGE analysis and A260/280 measurement (Supplementary Fig. 5 and Extended Data Fig. 8). The analytical gel-filtration chromatography was performed using a Superdex 200 10/300 column (GE Healthcare) in a buffer containing 20 mM Tris–HCl (pH 7.5), 250 mM KCl, 10% glycerol and 5 mM DTT.
Fluorescence anisotropy
Fluorescence anisotropy measurements were conducted as previously described11. Briefly, 5-carboxytetramethylrhodamine (TAMRA)-labeled DNA probes were annealed in a buffer containing 10 mM Tris–HCl (pH 7.5), 100 mM KCl and 0.1 mM EDTA to form the G4 structure. The binding assays were performed with 10 nM labeled DNA and the indicated concentrations of recombinant YY1 protein in a binding buffer containing 10 mM Tris–HCl (pH 7.5), 100 mM KCl, 10 μM ZnCl2 and 1 mM DTT. After a 30-min incubation on ice, fluorescence anisotropy was measured on a Horiba QuantaMaster-400 spectrofluorometer (Photon Technology International), with excitation and emission wavelengths of 550 and 580 nm, respectively. The instrument G factor was determined before anisotropy measurements. The data were fitted according to the following equation:
In the equation, [DNA] and [Protein] denote the DNA and protein concentrations, respectively. Aobs is the observed anisotropy value. A0 is the anisotropy value in the absence of protein. ΔA represents the total change in anisotropy between free and fully bound DNA, and Kd is the equilibrium dissociation constant36.
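The equation itself is not reproduced in this text. A form commonly used for 1:1 binding with ligand depletion, and one that uses exactly the symbols defined above, is Aobs = A0 + ΔA × {([DNA] + [Protein] + Kd) − sqrt(([DNA] + [Protein] + Kd)^2 − 4[DNA][Protein])} / (2[DNA]). The Python sketch below assumes this form; the function names, the grid search (a coarse stand-in for nonlinear regression) and all numerical values other than the 10 nM DNA concentration are illustrative and are not taken from the study.

```python
import math


def anisotropy(protein_nM, dna_nM, a0, delta_a, kd_nM):
    """Observed anisotropy for 1:1 binding (quadratic form, assumed model)."""
    total = dna_nM + protein_nM + kd_nM
    # Concentration of the protein-DNA complex from the mass-action quadratic.
    complex_nM = (total - math.sqrt(total * total - 4.0 * dna_nM * protein_nM)) / 2.0
    # Fraction of DNA bound scales the total anisotropy change.
    return a0 + delta_a * complex_nM / dna_nM


def fit_kd(proteins_nM, observed, dna_nM, a0, delta_a, kd_grid_nM):
    """Return the Kd on the grid that minimizes the sum of squared residuals."""
    def sse(kd):
        return sum((anisotropy(p, dna_nM, a0, delta_a, kd) - y) ** 2
                   for p, y in zip(proteins_nM, observed))
    return min(kd_grid_nM, key=sse)


# Synthetic titration: 10 nM DNA as in the assay; A0, delta A and Kd are made up.
proteins = [0, 5, 10, 20, 50, 100, 200, 500]
simulated = [anisotropy(p, 10.0, 0.10, 0.15, 40.0) for p in proteins]
best_kd = fit_kd(proteins, simulated, 10.0, 0.10, 0.15, range(1, 201))
```

In practice A0 and ΔA would usually be fitted together with Kd; they are held fixed here only to keep the sketch short.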
EMSA
For protein–DNA binding, 25 nM TAMRA-labeled DNA was incubated with different concentrations of full-length YY1 protein, or its truncated or mutated counterparts, in a binding buffer containing 10 mM Tris–HCl (pH 7.5), 100 mM KCl, 10 μM ZnCl2, 1 mM DTT and 3% glycerol on ice for 30 min. The samples were then loaded onto a 6% polyacrylamide gel in TBE buffer (40 mM Tris–HCl, pH 8.3, 45 mM boric acid and 1 mM EDTA) and run at 120 V at 4 °C for 30 min. The gels were imaged with Odyssey Imaging Systems (LI-COR Biosciences).
CD spectroscopy
The DNA probes (2.5 µM) were annealed in a buffer containing 10 mM Tris–HCl (pH 7.5), 100 mM KCl and 0.1 mM EDTA to form G4 structure. The CD spectra for the DNA probes were recorded at room temperature on a Jasco-815 spectrometer in the wavelength range of 200–320 nm, and the scan rate was 1 nm s−1. For assessing how YY1 binding modulates G4 folding, the annealed DNA probes were incubated with an equal concentration of the purified YY1 protein at room temperature for 30 min, and the CD spectra were subsequently acquired as noted above. The CD spectrum for the YY1 protein itself was then subtracted from the composite CD spectra of the YY1–G4 DNA complexes.
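The subtraction step can be illustrated with a short Python sketch. It assumes that both spectra were sampled at the same wavelengths and are stored as wavelength-to-ellipticity mappings; the function name and the numerical values are hypothetical.

```python
def subtract_spectrum(composite, protein_only):
    """Return composite minus protein-only ellipticity at each shared wavelength."""
    if composite.keys() != protein_only.keys():
        raise ValueError("spectra must be sampled at the same wavelengths")
    return {wl: composite[wl] - protein_only[wl] for wl in sorted(composite)}


# Made-up ellipticities (mdeg) at three wavelengths (nm).
complex_cd = {240: -6.0, 264: 9.5, 290: 1.0}
yy1_cd = {240: -2.0, 264: 0.5, 290: 0.25}
dna_in_complex = subtract_spectrum(complex_cd, yy1_cd)
```

The result approximates the DNA contribution only to the extent that the protein spectrum is unchanged on binding.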
In vitro proximity ligation assay
For proximity ligation assay, 20 nM DNA fragments containing G4 or YY1 consensus motif were incubated with 100 nM YY1 protein in a binding buffer containing 10 mM Tris–HCl (pH 7.5), 100 mM KCl, 10 μM ZnCl2 and 1 mM DTT at 20 °C for 30 min. The mixture was subsequently diluted and the DNA fragments were ligated in a mixture containing 2 U µl−1 T4 DNA ligase, 2 U µl−1 T4 polynucleotide kinase and 1× T4 DNA ligase buffer at 30 °C for 10 min. The reactions were terminated immediately afterwards by heating at 65 °C for 5 min. The resulting mixture was diluted, and real-time quantitative PCR was employed to quantify the ligation products. The sequences for the DNA fragments and primers used in the quantitative PCR are listed in Supplementary Table 2.
Chromatin fractionation and western blot
Chromatin fractionation was performed as described37. The cells were grown to 60–70% confluence, incubated with 20 µM MG132 for 4 h and then treated with 20 µM PDS or 5 µM TMPyP4 at 37 °C for 12 h. The chromatin fraction was isolated using a stepwise procedure with a cytoplasmic lysis buffer (10 mM Tris–HCl, pH 8.0, 0.34 M sucrose, 3 mM CaCl2, 2 mM MgCl2, 0.1 mM EDTA, 1 mM DTT, 0.5% NP-40 and protease inhibitor cocktail), a nuclear lysis buffer (20 mM HEPES, pH 7.9, 1.5 mM MgCl2, 1 mM EDTA, 150 mM KCl, 0.1% NP-40, 1 mM DTT, 10% glycerol and protease inhibitor cocktail) and a chromatin isolation buffer (20 mM HEPES, pH 7.9, 1.5 mM MgCl2, 150 mM KCl, 10% glycerol, protease inhibitor cocktail and 0.15 unit μl−1 benzonase). The proteins were quantified by Bradford assay. After separation by 10% SDS–PAGE, the proteins were transferred to a nitrocellulose membrane (Bio-Rad). After blocking with blotting-grade blocker (Bio-Rad), the membrane was incubated in a solution containing primary antibody and 5% BSA for 2 h, and then incubated in a 5% blotting-grade blocker containing the horseradish peroxidase (HRP)-conjugated secondary antibody. The western blot signal was detected using ECL western blotting detection reagent. Primary antibodies used in this study were against histone H3 (Cell Signaling Technology, 9715S; 1:10,000), YY1 (Santa Cruz Biotechnology, SC-7341; 1:2,000) and β-actin (Cell Signaling Technology, 4967S; 1:5,000). Secondary antibodies were anti-rabbit IgG peroxidase (Sigma, A0545; 1:10,000) and anti-mouse IgG HRP (Santa Cruz Biotechnology, SC-516102; 1:10,000).
RT–qPCR
RT–qPCR was conducted as previously described37. Total RNA was extracted with Omega Total RNA Kit I (Omega) and quantified. Reverse transcription was performed using M-MLV Reverse Transcriptase (Promega) for complementary DNA synthesis. RT–qPCR was carried out using iQ SYBR Green Supermix (Bio-Rad) on the CFX96 RT–qPCR detection system (Bio-Rad). Primers used for RT–qPCR are listed in Supplementary Table 2.
Targeted integration and mutation with CRISPR–Cas9
The YY1-TAPTAG cell line was generated in the HEK293T background using CRISPR–Cas9, following previously reported procedures11. The guide sequence for YY1 or the G4 site was inserted into the pX330 plasmid (Addgene), which expresses hSpCas9. The guide sequence for YY1 was 5′-GAGAAGACCCTTCTCGACCA-3′. The donor plasmid for tagging YY1 (3× Flag–2× Strep tag) was synthesized (gBlock, Integrated DNA Technologies) and inserted into pUC19. For CRISPR-mediated mutation of an endogenous G4 site, the donor sequence carrying the mutated G4 sequences was inserted into pUC19. After cotransfection with the two plasmids, single-cell colonies were isolated and western blot was performed to screen for the insertion of the tandem tag using anti-YY1 antibody (SC-7341, Santa Cruz Biotechnology). Sanger sequencing was performed to confirm the mutation of the sequence for the endogenous G4 site.
ChIP–seq
ChIP experiments were conducted as previously described with a few modifications15,37. YY1-TAPTAG cells were cultured in DMEM with or without 20 µM PDS or 5 µM TMPyP4 for 12 h before cross-linking. Approximately 2 × 10⁶ cells were cross-linked with 1/10 volume of freshly prepared 11% formaldehyde solution at room temperature for 10 min, and quenched with 125 mM glycine for 5 min. After washing with 1× PBS three times, the cells were resuspended in lysis buffer I (50 mM HEPES–KOH, pH 7.5, 140 mM NaCl, 1 mM EDTA, 10% glycerol, 0.5% NP-40, 0.25% Triton X-100 and protease inhibitor cocktail) at 4 °C on a rotator for 10 min.
After centrifugation, the pellet was resuspended in lysis buffer II (10 mM Tris–HCl, pH 8.0, 200 mM NaCl, 1 mM EDTA, 0.5 mM EGTA and protease inhibitor cocktail) at 4 °C for 10 min with rotation. The pellet was subsequently resuspended in the sonication buffer (20 mM Tris–HCl, pH 8.0, 150 mM NaCl, 2 mM EDTA, 0.1% SDS, 1% Triton X-100 and protease inhibitor cocktail). Sonication was conducted using a Covaris S220 sonicator for 6 min with a peak incident power of 140 Watts, a duty cycle of 5% and 200 cycles per burst at 4 °C. After centrifugation at 16,000g for 15 min, the supernatant was incubated with anti-Flag antibody (Cell Signaling Technology, catalog no. 2368S) for 2 h. Protein A/G Plus-Agarose (Santa Cruz) was then added and the mixture was incubated at 4 °C overnight. After washing, DNA was eluted from the beads with 100 mM NaHCO3 and 1% SDS at 68 °C for 2 h. Cross-links were subsequently reversed and proteins digested with proteinase K at 65 °C overnight. After purification, RNA in the resulting DNA samples was removed with RNase A. Finally, the DNA was purified using QIAquick PCR Purification Kit (Qiagen).
The DNA-sequencing library was prepared using NEBNext Ultra DNA Library Prep Kit for Illumina (NEB) following the manufacturer’s instructions. The purified DNA libraries were subsequently quantified using an Agilent 2100 Bioanalyzer and multiplexed for sequencing on a NextSeq500 Sequencing System (Illumina).
The sequencing reads of ChIP–seq were checked with FastQC (v.0.68) and aligned to the human hg19 reference genome using Bowtie2 (v.2.3.4.1) with the configuration of bowtie2 -q -N 0 -L 22 (ref. 37). Peak calling was performed using the model-based analysis of ChIP–seq (MACS2 v.2.1.1) with the following configuration: macs2 callpeak -f BAM -g 2.7e+9 -n YY1-ChIP (ref. 38). The UCSC Genome Browser (Human GRCh37/hg19) was used to visualize the mapping results and the ChIP–seq peaks39. The overlap of ChIP–seq peaks was analyzed using a custom script. In detail, each peak in the two BED files was compared, and those peaks that overlapped in the YY1 and BG4 ChIP–seq datasets by at least 1 bp were considered overlapping peaks, following previously published procedures3,40. Parallel analyses were also conducted by employing the criterion that the BG4 and YY1 ChIP–seq peaks overlap by at least 8 or 30 bp (Supplementary Table 1). For motif analysis, the genomic sequence of each peak was extracted, and occurrences of the consensus motif ‘AANATGGC’ and its complementary strand ‘GCCATNTT’ were searched and counted.
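The overlap criterion and the motif search described above can be sketched in Python as follows. This is not the authors' custom script: the function names, the tuple representation of BED intervals (0-based, half-open) and the example coordinates are all assumptions made for illustration, and the all-against-all comparison is deliberately naive.

```python
import re


def overlap_bp(a, b):
    """Number of shared base pairs between two (chrom, start, end) intervals."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def overlapping_peaks(peaks_a, peaks_b, min_bp=1):
    """Peaks in peaks_a that share at least min_bp with any peak in peaks_b."""
    return [a for a in peaks_a
            if any(overlap_bp(a, b) >= min_bp for b in peaks_b)]


# N is expanded to any base; the lookahead also counts overlapping occurrences.
MOTIF = re.compile(r"(?=(AA[ACGT]ATGGC|GCCAT[ACGT]TT))")


def count_motif(sequence):
    """Count AANATGGC and GCCATNTT occurrences in one peak sequence."""
    return len(MOTIF.findall(sequence.upper()))


# Hypothetical peaks: overlaps of 5 bp, 10 bp and 0 bp, respectively.
yy1_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
bg4_peaks = [("chr1", 195, 300), ("chr1", 590, 700), ("chr2", 300, 400)]
shared_1bp = overlapping_peaks(yy1_peaks, bg4_peaks, min_bp=1)
shared_8bp = overlapping_peaks(yy1_peaks, bg4_peaks, min_bp=8)
shared_30bp = overlapping_peaks(yy1_peaks, bg4_peaks, min_bp=30)
```

Raising `min_bp` from 1 to 8 or 30 mirrors the parallel analyses with stricter overlap thresholds; for genome-scale peak sets, a sorted sweep or an interval tree would replace the nested comparison.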
For the BG4 ChIP–qPCR experiment, chromatin was immunoprecipitated using BG4 antibody, which was purified as described7, following the aforementioned procedures. After purification of the BG4 antibody-enriched DNA fragments, quantitative PCR was performed using the primers listed in Supplementary Table 2.
HiChIP–seq and HiChIP–qPCR
HiChIP was conducted following the previously described procedures15. Before cross-linking, the YY1-TAPTAG cells were mock-treated, or treated with 20 µM PDS or 5 µM TMPyP4 for 12 h. Approximately 5 × 10⁶ cells were cross-linked using freshly prepared 1% formaldehyde solution at room temperature for 10 min, and quenched with 125 mM glycine for 5 min. After washing three times, the cells were incubated in HiChIP lysis buffer (10 mM Tris–HCl, pH 8.0, 10 mM NaCl and 0.2% IGEPAL CA-630 with protease inhibitor cocktail) at 4 °C for 30 min with rotation. After centrifugation, the cells were washed again with HiChIP lysis buffer and centrifuged. The resulting pellet was resuspended in 0.5% SDS and incubated at 60 °C for 10 min. Triton X-100 (1.5%) was then added to the mixture to quench the SDS. After incubation at 37 °C for 15 min, the chromatin was digested with HindIII (NEB) in NEB CutSmart Buffer (NEB) at 37 °C for 6 h. The restriction enzyme was subsequently inactivated by heating at 65 °C for 20 min. Biotin-labeled dCTP was incorporated into the restriction fragment overhangs in the Fill-in master mix (0.25 mM biotin-14-dCTP, 0.25 mM dCTP, 0.25 mM dGTP, 0.25 mM dTTP and 50 U Klenow fragment of DNA Polymerase I in NEB CutSmart Buffer). The reaction was continued at 37 °C for 2 h with rotation. Labeled DNA was ligated in a ligation master mix (1% Triton X-100, 0.1 mg ml−1 BSA and 250 U T4 DNA ligase in T4 Ligation Buffer) at 16 °C for 4 h with rotation.
The ligated chromatin was collected by centrifugation at 2,500g for 5 min and resuspended in ChIP sonication buffer (50 mM HEPES–KOH, pH 7.5, 140 mM NaCl, 1 mM EDTA, 1 mM EGTA, 1% Triton X-100, 0.1% sodium deoxycholate, 0.1% SDS and protease inhibitor cocktail). Chromatin immunoprecipitation was conducted under the same conditions as described above. After elution, RNA and proteins in the samples were removed by digestion with RNase A and proteinase K, respectively. The samples were subsequently incubated at 65 °C overnight to reverse the cross-links, and the DNA was purified from the mixture using QIAquick PCR Purification Kit (Qiagen).
After purification, the fragmented DNA was enriched using the streptavidin C1 magnetic beads (Invitrogen, catalog no. 65001). The beads were washed twice with Tween washing buffer (5 mM Tris–HCl, pH 7.5, 0.5 mM EDTA, 1 M NaCl, 0.05% Tween-20) and subsequently resuspended in 2× biotin binding buffer (10 mM Tris–HCl, pH 7.5, 1 mM EDTA, 2 M NaCl). An equal volume of purified DNA was added, and incubated at room temperature for 15 min. The magnetic beads were then washed twice with Tween washing buffer, twice with binding buffer and then once with TE buffer (10 mM Tris–HCl, pH 8.0, 0.1 mM EDTA). The magnetic beads were collected and the DNA-sequencing libraries were constructed using NEBNext Ultra DNA Library Prep Kit for Illumina (NEB) following the manufacturer’s instructions. The purified DNA libraries were subsequently quantified using an Agilent 2100 Bioanalyzer and multiplexed for sequencing on a NextSeq500 Sequencing System (Illumina).
For HiChIP–seq data analysis, the reads were aligned to the human hg19 reference genome using the Map with BWA-MEM (v.0.7.15.1) tool with a gap extension penalty setting of 50 (bwa mem -A1 -B4 -E50 -L0; ref. 41). The interaction matrix was subsequently built with the hicBuildMatrix (v.2.1.0) tool with the restriction site being set to HindIII (hicBuildMatrix --binSize 5000 --restrictionSequence AGCT; ref. 42). The visualization of the HiChIP data was performed with the hicPlotMatrix (v.2.1.0) tool42.
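The idea behind a 5-kb interaction matrix can be shown with a simplified Python sketch. It is not the hicBuildMatrix implementation, which also handles read filtering and restriction-site information; it only bins the two ends of each read pair on a single chromosome and counts pairs per bin combination. The function names and coordinates are hypothetical.

```python
from collections import Counter

BIN_SIZE = 5000  # matches the 5-kb bin size quoted above


def bin_index(position, bin_size=BIN_SIZE):
    """Index of the fixed-width genomic bin containing a 0-based position."""
    return position // bin_size


def count_contacts(read_pairs, bin_size=BIN_SIZE):
    """Count read pairs per (bin_i, bin_j), stored with bin_i <= bin_j."""
    contacts = Counter()
    for pos1, pos2 in read_pairs:
        i, j = sorted((bin_index(pos1, bin_size), bin_index(pos2, bin_size)))
        contacts[(i, j)] += 1
    return contacts


# Hypothetical mapped positions of the two ends of four read pairs.
pairs = [(1200, 27400), (26000, 4999), (5000, 5100), (12000, 12500)]
matrix = count_contacts(pairs)
```

Storing only the upper triangle is sufficient because a contact between bins i and j is symmetric.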
For the HiChIP–qPCR experiment, the DNA was purified after chromatin immunoprecipitation and was used for quantitative PCR with the primers listed in Supplementary Table 2.
RNA-seq and data analysis
Total RNA was extracted from cells with the Omega Total RNA Kit I. The NEBNext Ultra II RNA Library Prep Kit for Illumina was used to produce the RNA-sequencing library. Ribosomal RNAs were removed using the NEBNext rRNA Depletion Kit, and the resulting RNA samples were treated with DNase I to remove residual DNA. The rRNA-depleted RNA was fragmented by heating at 94 °C for 15 min in NEBNext First Strand Synthesis Reaction Buffer on a thermal cycler. The fragmented RNA was reverse-transcribed using First Strand Synthesis Enzyme Mix (NEB) with random primers. The NEBNext Second Strand Synthesis Enzyme Mix (NEB) was used to perform the second-strand synthesis. After purification with AMPure XP Beads, the sequencing library was subsequently prepared as described above. The DNA was barcoded with NEBNext Multiplex Oligos for Illumina (NEB), and the resulting library was subjected to sequencing analysis on an Illumina HiSeq4000 instrument.
The sequencing reads of RNA-seq were aligned to human hg38 reference genome using HISAT2 (v.2.1.0) with the configuration of hisat2 -q -x -1 -2 -S. Transcript abundance was determined using featureCounts (v.1.6.4) with the configuration of featureCounts -T 4 -t exon -g gene_id. Differential gene expression was analyzed with DESeq2 (v.2.11.40.6).
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
The ChIP–seq, HiChIP–seq and RNA-seq data generated in this study have been deposited into the NCBI GEO database (for ChIP–seq and HiChIP–seq data: GEO accession number GSE128106; for RNA-seq: GEO accession number GSE142075). The ChIP–seq data for Ishikawa and SK-N-SH cells were obtained from NCBI GEO database with accession numbers of GSM1010753 and GSM1010897, respectively43. The two G4 ChIP–seq datasets were obtained from NCBI GEO database with accession numbers of GSE99205 and GSE107690 (refs. 3,7). The human hg19 reference genome was downloaded from https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips. Source data are provided with this paper.
Code availability
Custom codes used in this work are available from https://github.com/linliucr/UCR_code.
References
1. Bochman, M. L., Paeschke, K. & Zakian, V. A. DNA secondary structures: stability and function of G-quadruplex structures. Nat. Rev. Genet. 13, 770–780 (2012).
2. Hansel-Hertsch, R., Di Antonio, M. & Balasubramanian, S. DNA G-quadruplexes in the human genome: detection, functions and therapeutic potential. Nat. Rev. Mol. Cell Biol. 18, 279–284 (2017).
3. Mao, S. Q. et al. DNA G-quadruplex structures mold the DNA methylome. Nat. Struct. Mol. Biol. 25, 951–957 (2018).
4. Huppert, J. L. & Balasubramanian, S. Prevalence of quadruplexes in the human genome. Nucleic Acids Res. 33, 2908–2916 (2005).
5. Biffi, G., Tannahill, D., McCafferty, J. & Balasubramanian, S. Quantitative visualization of DNA G-quadruplex structures in human cells. Nat. Chem. 5, 182–186 (2013).
6. Henderson, A. et al. Detection of G-quadruplex DNA in mammalian cells. Nucleic Acids Res. 42, 860–869 (2014).
7. Hansel-Hertsch, R., Spiegel, J., Marsico, G., Tannahill, D. & Balasubramanian, S. Genome-wide mapping of endogenous G-quadruplex DNA structures by chromatin immunoprecipitation and high-throughput sequencing. Nat. Protoc. 13, 551–564 (2018).
8. Soldatenkov, V. A., Vetcher, A. A., Duka, T. & Ladame, S. First evidence of a functional interaction between DNA quadruplexes and poly(ADP-ribose) polymerase-1. ACS Chem. Biol. 3, 214–219 (2008).
9. Gonzalez, V., Guo, K., Hurley, L. & Sun, D. Identification and characterization of nucleolin as a c-myc G-quadruplex-binding protein. J. Biol. Chem. 284, 23622–23635 (2009).
10. Paramasivam, M. et al. Protein hnRNP A1 and its derivative Up1 unfold quadruplex DNA in the human KRAS promoter: implications for transcription. Nucleic Acids Res. 37, 2841–2853 (2009).
11. Williams, P., Li, L., Dong, X. & Wang, Y. Identification of SLIRP as a G quadruplex-binding protein. J. Am. Chem. Soc. 139, 12426–12429 (2017).
12. Gordon, S., Akopyan, G., Garban, H. & Bonavida, B. Transcription factor YY1: structure, function, and therapeutic implications in cancer biology. Oncogene 25, 1125–1142 (2006).
13. Deng, Z., Cao, P., Wan, M. M. & Sui, G. Yin Yang 1: a multifaceted protein beyond a transcription factor. Transcription 1, 81–84 (2010).
14. Wu, S. et al. A YY1–INO80 complex regulates genomic stability through homologous recombination-based repair. Nat. Struct. Mol. Biol. 14, 1165–1172 (2007).
15. Weintraub, A. S. et al. YY1 is a structural regulator of enhancer-promoter loops. Cell 171, 1573–1588.e28 (2017).
16. Maru, Y., Afar, D. E., Witte, O. N. & Shibuya, M. The dimerization property of glutathione S-transferase partially reactivates Bcr-Abl lacking the oligomerization domain. J. Biol. Chem. 271, 15353–15357 (1996).
17. Han, F. X., Wheelhouse, R. T. & Hurley, L. H. Interactions of TMPyP4 and TMPyP2 with quadruplex DNA. Structural basis for the differential effects on telomerase inhibition. J. Am. Chem. Soc. 121, 3561–3570 (1999).
18. Rodriguez, R. et al. A novel small molecule that alters shelterin integrity and triggers a DNA-damage response at telomeres. J. Am. Chem. Soc. 130, 15758–15759 (2008).
19. Houbaviy, H. B., Usheva, A., Shenk, T. & Burley, S. K. Cocrystal structure of YY1 bound to the adeno-associated virus P5 initiator. Proc. Natl Acad. Sci. USA 93, 13577–13582 (1996).
20. Whitfield, T. W. et al. Functional analysis of transcription factor binding sites in human promoters. Genome Biol. 13, R50 (2012).
21. Chambers, V. S. et al. High-throughput sequencing of DNA G-quadruplex structures in the human genome. Nat. Biotechnol. 33, 877–881 (2015).
22. Muller, S., Kumari, S., Rodriguez, R. & Balasubramanian, S. Small-molecule-mediated G-quadruplex isolation from human cells. Nat. Chem. 2, 1095–1098 (2010).
23. Siddiqui-Jain, A., Grand, C. L., Bearss, D. J. & Hurley, L. H. Direct evidence for a G-quadruplex in a promoter region and its targeting with a small molecule to repress c-MYC transcription. Proc. Natl Acad. Sci. USA 99, 11593–11598 (2002).
24. Lee, G. R. Role of YY1 in long-range chromosomal interactions regulating Th2 cytokine expression. Transcription 5, e27976 (2014).
25. Liu, H. et al. Yin Yang 1 is a critical regulator of B-cell development. Genes Dev. 21, 1179–1189 (2007).
26. Mumbach, M. R. et al. HiChIP: efficient and sensitive analysis of protein-directed genome architecture. Nat. Methods 13, 919–922 (2016).
27. Mohaghegh, P., Karow, J. K., Brosh, R. M. Jr., Bohr, V. A. & Hickson, I. D. The Bloom’s and Werner’s syndrome proteins are DNA structure-specific helicases. Nucleic Acids Res. 29, 2843–2849 (2001).
28. Lopez-Perrote, A. et al. Structure of Yin Yang 1 oligomers that cooperate with RuvBL1-RuvBL2 ATPases. J. Biol. Chem. 289, 22614–22629 (2014).
29. Shi, Y., Seto, E., Chang, L. S. & Shenk, T. Transcriptional repression by YY1, a human GLI-Kruppel-related protein, and relief of repression by adenovirus E1A protein. Cell 67, 377–388 (1991).
30. Merkenschlager, M. & Nora, E. P. CTCF and cohesin in genome folding and transcriptional gene regulation. Annu. Rev. Genomics Hum. Genet. 17, 17–43 (2016).
31. Tang, Z. et al. CTCF-mediated human 3D genome architecture reveals chromatin topology for transcription. Cell 163, 1611–1627 (2015).
32. Parelho, V. et al. Cohesins functionally associate with CTCF on mammalian chromosome arms. Cell 132, 422–433 (2008).
33. Khachigian, L. M. The Yin and Yang of YY1 in tumor growth and suppression. Int. J. Cancer 143, 460–465 (2018).
34. Shi, J., Hao, A., Zhang, Q. & Sui, G. The role of YY1 in oncogenesis and its potential as a drug target in cancer therapies. Curr. Cancer Drug Targets 15, 145–157 (2015).
35. Rhodes, D. & Lipps, H. J. G-quadruplexes and their regulatory roles in biology. Nucleic Acids Res. 43, 8627–8637 (2015).
36. Favicchio, R., Dragan, A. I., Kneale, G. G. & Read, C. M. Fluorescence spectroscopy and anisotropy in the analysis of DNA–protein interactions. Methods Mol. Biol. 543, 589–611 (2009).
37. Li, L., Miao, W., Huang, M., Williams, P. & Wang, Y. Integrated genomic and proteomic analyses reveal novel mechanisms of the methyltransferase SETD2 in renal cell carcinoma development. Mol. Cell. Proteom. 18, 437–447 (2019).
38. Feng, J., Liu, T., Qin, B., Zhang, Y. & Liu, X. S. Identifying ChIP–seq enrichment using MACS. Nat. Protoc. 7, 1728–1740 (2012).
39. Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).
40. Matsuda, K. et al. ChIP–seq analysis of genomic binding regions of five major transcription factors highlights a central role for ZIC2 in the mouse epiblast stem cell gene regulatory network. Development 144, 1948–1958 (2017).
41. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997v2 (2013).
42. Wolff, J. et al. Galaxy HiCExplorer: a web server for reproducible Hi-C data analysis, quality control and visualization. Nucleic Acids Res. 46, W11–W16 (2018).
43. Gertz, J. et al. Distinct properties of cell-type-specific and shared transcription factor binding sites. Mol. Cell 52, 25–36 (2013).
Acknowledgements
This work was supported by the National Institutes of Health (R35 ES031707 to Y.W. and R35 GM119721 to J.S.). M.H. was supported in part by an NRSA T32 Institutional Research Training Grant (ES018827).
Author information
Contributions
L.L. and Y.W. conceived the project. P.W., W.M. and M.H. performed the G-quadruplex pull-down experiments and analyzed the mass spectrometry data. L.L., M.Y.W., Z.G. and W.M. performed the plasmid construction and cell culture experiments. L.L. and Z.G. performed the in vitro binding assay. L.L. performed the ChIP–seq, HiChIP–seq, RNA-seq and relevant data analysis. W.R. and J.S. assisted with the protein expression and purification. L.L. and P.W. analyzed the data. L.L. and Y.W. wrote the manuscript, which was reviewed and commented on by all co-authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Proteome-wide identification of G quadruplex-binding proteins.
Volcano plots showing the quantification results of G quadruplex-binding proteins identified from SILAC-based interaction screening. The −Log10(P value) was plotted against the log2(ratio G4/M4). YY1 is labeled in red.
Extended Data Fig. 2 Circular dichroism (CD) spectra of the three G4 sequences in the absence or presence of tag-free YY1 protein.
a-c, CD spectra of cMYC G4 (a), YY1 (b) and cMYC G4-YY1 complex (c). d-f, Comparison of CD spectra for G4 probes in the presence or absence of YY1 protein. The CD spectra for the G4 probes in the presence of YY1 protein were obtained by subtracting the CD spectrum of the YY1 protein from the composite spectra of the protein-G4 DNA complexes.
Extended Data Fig. 3 Zinc finger domain of YY1 is essential, but not sufficient for the specific binding toward G4 structure.
a, A schematic diagram depicting the domain structure of YY1 protein. b-e, Fluorescence anisotropy for monitoring the binding of the mutant YY1(C360S) and the truncated YY1(1–382), YY1(231–414) and YY1(293–414) proteins with hTEL G4 and the corresponding M4 DNA probes. The data represent the mean ± S.E.M. of results obtained from 3 independent experiments.
Extended Data Fig. 4 Statistical analysis of YY1 ChIP-Seq and BG4 ChIP-Seq data.
a, The ChIP-seq signal enrichment of YY1 ChIP-Seq and BG4 ChIP-Seq based on the overlapped peaks for the two datasets. The YY1 ChIP-Seq and BG4 ChIP-Seq average signal enrichments are plotted against the BG4 overlapped peaks. b, Analysis of peak width distribution for YY1 ChIP-seq and BG4 ChIP-seq data.
Extended Data Fig. 5 Unwinding of G4 with the overexpression of BLM helicase disrupts the YY1-mediated DNA looping.
a-b, Reads enrichment of two regions from BG4 ChIP-Seq and YY1 ChIP-Seq. The long-range DNA interactions monitored in HiChIP-PCR experiment are labeled with red arches, and the regions monitored in BG4 ChIP PCR experiments are indicated with blue triangles. c-d, BG4 ChIP and YY1 ChIP enrichments at the two sites were markedly diminished after ectopic overexpression of BLM helicase (BLM-O.E.). e, YY1-mediated DNA looping is disrupted by overexpression of BLM helicase. HiChIP-PCR quantification results of YY1-mediated DNA looping in HEK293T cells with and without the overexpression of BLM. Shown in (c-e) are mean ± S.E.M. of results obtained from 3 independent experiments. The p values were calculated by using two-tailed, unpaired Student’s t test: **, p < 0.01; ***, p < 0.001.
Extended Data Fig. 6 Analysis of YY1 dimerization and binding stoichiometry of the YY1-G4 DNA complex.
a-b, Gel filtration chromatography revealed the dimerization of YY1 and truncated YY1(231–414), but not YY1(293–414). c-d, The binding stoichiometry of YY1:G4 DNA was analyzed using EMSA. The quantification results showed the YY1-bound fraction of TAMRA-G4. The stoichiometry of YY1 binding to G4 DNA matches the theoretical curve for a 1:1 binding stoichiometry. The data represent mean ± S.E.M. from 4 independent experiments.
Extended Data Fig. 7 YY1 promotes interactions between DNA elements containing its consensus sequence motifs, G4 DNA, or both.
a, A scheme depicting the in vitro proximity ligation assay for assessing the ability of YY1 to enhance DNA-DNA interactions involving G4 structures and/or YY1 consensus motifs. b, qPCR quantification results of the proximity ligation products formed between motifs, between G4 structures, and between motif and G4 structure in the presence or absence of YY1 protein. c, qPCR quantification results revealed the inability of YY1 to promote ligation between M4 (a mutated G4 sequence that can no longer fold into a G4 structure) and G4, motif or M4. d, qPCR quantification results of the proximity ligation products formed between G4 structures in the absence or presence of PDS or TMPyP4. The data represent mean ± S.E.M. of results from three independent experiments.
Extended Data Fig. 8 Dimerization of YY1 promotes long-range DNA looping.
a, SDS-PAGE for monitoring the purified recombinant truncated forms of YY1 that is incapable of dimerization, but able to discriminate G4 structure from single-stranded DNA. b, Gel filtration chromatography revealed that YY1∆231-290 (calculated monomer MW: 38.4 kDa) exists as a monomer, and GST-YY1∆231-290 is present as a dimer (calculated monomer MW: 66.3 kDa). c-d, Fluorescence anisotropy for monitoring the binding of YY1∆231-290 and GST-YY1∆231-290 proteins with G4 structure and the corresponding mutated sequence (M4) derived from the MYC promoter. e, Proximity ligation assay showing that YY1 and GST-YY1∆231-290, but not YY1∆231-290, are capable of promoting ligation between G4 DNA sequences.
Extended Data Fig. 9 Regulation of gene expression by YY1-promoter G4 interactions.
a-c, Quantification of mRNA expression levels of MYC (a), SLC25A28 (b) and TMEM145 (c) genes after shRNA-mediated knockdown of YY1 and/or PDS/TMPyP4 treatment. Top of each panel shows read enrichments from BG4 ChIP-Seq and YY1 ChIP-Seq. The data represent mean ± S.E.M. of results from three independent experiments.
Extended Data Fig. 10 YY1-G4 binding participates in transcription regulation of the EHD3 gene through long-range DNA looping.
a, Read enrichments obtained from BG4 ChIP-seq and YY1 ChIP-Seq experiments. b, Normalized interaction frequency between the two sites linked with a red arch in panel a, in HEK293T cells with or without PDS/TMPyP4 treatment, as captured by YY1 HiChIP. c, Quantification results for the mRNA expression levels of the EHD3 gene in HEK293T cells after knockdown of YY1 and/or with PDS/TMPyP4 treatment. The data represent mean ± S.E.M. of results from three independent experiments.
Supplementary information
Supplementary Information
Supplementary Tables 1 and 2 and Figs. 1–13.
Source data
Source Data Fig. 1: Statistical source data; unprocessed western blots and/or gels.
Source Data Fig. 2: Statistical source data; unprocessed western blots and/or gels.
Source Data Fig. 3: Statistical source data.
Source Data Fig. 4: Statistical source data.
Source Data Fig. 5: Statistical source data.
Source Data Fig. 6: Statistical source data.
Source Data Extended Data Fig. 1: Statistical source data.
Source Data Extended Data Fig. 2: Statistical source data.
Source Data Extended Data Fig. 3: Statistical source data.
Source Data Extended Data Fig. 4: Statistical source data.
Source Data Extended Data Fig. 5: Statistical source data.
Source Data Extended Data Fig. 6: Statistical source data; unprocessed western blots and/or gels.
Source Data Extended Data Fig. 7: Statistical source data.
Source Data Extended Data Fig. 8: Statistical source data; unprocessed western blots and/or gels.
Source Data Extended Data Fig. 9: Statistical source data.
Source Data Extended Data Fig. 10: Statistical source data.
Cite this article
Li, L., Williams, P., Ren, W. et al. YY1 interacts with guanine quadruplexes to regulate DNA looping and gene expression. Nat Chem Biol 17, 161–168 (2021). https://doi.org/10.1038/s41589-020-00695-1