Abstract
The proximity ligation-based Hi-C and derivative methods are the mainstream tools to study genome-wide chromatin interactions. These methods often fragment the genome using enzymes functionally irrelevant to the interactions per se, restraining the efficiency in identifying structural features and the underlying regulatory elements. Here we present Footprint-C, which yields high-resolution chromatin contact maps built upon intact and genuine footprints protected by transcription factor (TF) binding. When analyzed at one-dimensional level, the billions of chromatin contacts from Footprint-C enable genome-wide analysis at single footprint resolution, and reveal preferential modes of local TF co-occupancy. At pairwise contact level, Footprint-C exhibits higher efficiency in identifying chromatin structural features when compared with other Hi-C methods, segregates chromatin interactions emanating from adjacent TF footprints, and uncovers multiway interactions involving different TFs. Altogether, Footprint-C results suggest that rich regulatory modes of TF may underlie both local residence and distal chromatin interactions, in terms of TF identity, valency, and conformational configuration.
Introduction
Chromatin interactions between discrete regulatory elements, mediated by architectural type or tissue-specific transcription factors (TFs), constitute the three-dimensional genome organization, and underlie many important nuclear processes, such as transcription, DNA replication, and DNA damage repair1. Proximity ligation-based Hi-C and its derivative methods have been widely used to map chromatin interactions2,3,4,5,6. These methods often involve sequence-specific restriction enzymes to digest the genome before ligating two digested genomic fragments2,3. In such cases, the choice of enzyme, and the frequency and genomic distribution of the cutting sites, affect the efficiency of digestion and ligation. A chromatin interaction is potentially represented in a Hi-C contact map only when the cutting sequences are in close proximity to the two interacting cis elements. Other derivative methods often employ Tn5 transposase to achieve preferential coverage of the accessible portion of the genome5,6,7. These methods all require a further fragmentation step, either using sonication or other restriction enzymes, to break down the ligated chromatin to facilitate library preparation and sequencing. Therefore, many captured and sequenced DNA ligation products do not contain the binding sites of TFs mediating interactions (Supplementary Fig. 1a). The usage of such enzymes functionally irrelevant to the nature of interactions restrains the efficiency of discovering significant structural features from a chromatin contact map, and obscures the identity of underlying cis regulatory elements and trans TFs. The development of Micro-C partially circumvents this problem by employing Micrococcal Nuclease (MNase) to digest the genome, and preferentially captures ligations, and thus interactions between pairs of nucleosomal DNA4,8,9. 
In this study, we utilize Deoxyribonuclease I (DNase I), a nuclease previously used for mapping TF footprints10, and develop Footprint-C, which specifically captures chromatin interactions between TF footprints, and achieves higher efficiency in identifying chromatin structural features, such as chromatin loops and stripes, compared with other Hi-C methods. The analysis in one-dimensional and pairwise contact level of the Footprint-C datasets suggests extensive TF regulatory modes in both local residence and long-range chromatin interactions.
Results
Footprint-C captures chromatin interactions between TF footprints
A few studies have used DNase to fragment the genome in Hi-C experiments11,12,13. These methods include no deliberate steps to preserve TF footprint sequences. They either captured ligation products between long DNA fragments (Supplementary Fig. 1b), possibly due to insufficient DNase digestion, or performed DNase digestion on purified DNA molecules in the absence of TFs13. Furthermore, they all used sonication to fragment the genome after proximity ligation, possibly wiping out the footprint sequences from the Hi-C library. Indeed, when inspected with genome-wide coverage of one-dimensional (1D) fragments, these datasets show no enrichment at motifs of CTCF (CCCTC-binding factor), the master regulator of genome folding14, or at distal DNase Hypersensitive Sites (dDHS) (Supplementary Fig. 1c). In our strategy, we performed extensive DNase digestion to approximate the ideal length range of TF footprints, which further enabled us to use gel excision to strictly select those ligation products directly between two footprints. This size selection step also helps avoid the sonication step often used in other Hi-C methods, thereby keeping the integrity of footprints (Supplementary Fig. 1a).
We applied Footprint-C in human chronic myelogenous leukemia cell line K562, and human embryonic kidney cell line HEK293T, and obtained 1.93 and 0.93 billion contacts, respectively. About 86.7% of K562 and 84.5% of HEK293T contacts are cis (Supplementary Fig. 1d, e). The vast majority of Footprint-C contacts are between short fragments with a mean length of 50 base pairs (bp) (Fig. 1a and Supplementary Fig. 1f), consistent with the length of digested fragments in DNase-seq (Supplementary Fig. 1g). Next, we benchmarked K562 Footprint-C dataset with datasets acquired through different Hi-C and derivative methods in K562 cells by previous studies. When compared with in situ Hi-C3, BL-Hi-C15, and Micro-C16 datasets, Footprint-C shows an outstanding 1D enrichment at CTCF motifs and dDHS (Fig. 1b). Notably, HiCAR5 and Hi-TrAC6 are the two methods recently developed for mapping open chromatin interactions by employing Tn5 transposase. Neither exhibits the same degree of enrichment at CTCF motifs or dDHS as Footprint-C (Fig. 1b). V-plot is a form of visual representation of footprints protected by TF binding under DNase digestion, evidenced by the V-shaped enrichment of footprint-bearing fragments17. The canonical “V” patterns of Footprint-C fragments at both CTCF motifs and dDHS strongly indicate that a significant portion of the interacting short fragments ligated in Footprint-C were protected by TF binding under DNase digestion (Fig. 1c). The genome-wide 1D coverage of Footprint-C fragments is highly correlated with the coverage of DNase-seq fragments (Supplementary Fig. 1h). The set of peaks acquired from Footprint-C 1D coverage also largely overlap with the peaks from DNase-seq (Fig. 1d, e). We analyzed the common and specific peaks in K562 cells. 
The Footprint-C specific peaks exhibit higher enrichment of Footprint-C 1D coverage than DNase-seq, but relatively weak interaction frequency, possibly due to their overall low occupancy level compared with the common peaks (Supplementary Fig. 1i, j). The common peaks show the highest 1D enrichment and interaction frequency among all three sets of peaks (Supplementary Fig. 1i, j). Therefore, while the TF occupancy level seems to be independent of its interaction frequency, the interaction frequency possibly builds on top of the TF occupancy level.
a Length distribution of Footprint-C fragments in K562. b Pile-up of fragments from different datasets at occupied CTCF motifs or dDHS. c V-plot of Footprint-C fragments at CTCF motifs or dDHS. d Venn diagram showing overlap of MACS2 peaks between Footprint-C 1D and DNase-seq in K562 or HEK293T. e A screenshot showing Footprint-C 1D and DNase-seq signals in K562 or HEK293T. The red arrow marks the indexed CTCF motif. f Frequency of unique motif annotation of fragments from different Hi-C datasets. g Frequency of CTCF motif annotation of fragments (blue) or contacts (orange) from different Hi-C datasets. h Frequency of motif orientation of CTCF-CTCF contacts from different Hi-C datasets. i Frequency of motif orientation of CTCF-CTCF contacts from Footprint-C datasets in different cell lines. Source data are provided as a Source Data file.
We went on to annotate the Footprint-C fragments with a whole collection of consensus TF motif sequences18. Across different Hi-C methods, Footprint-C fragments show the highest rate of unique TF motif annotation (83.0%) (Fig. 1f), largely due to the short fragment length and the preservation of intact footprint sequences. About 1.25% of Footprint-C fragments contain a single CTCF motif, and 0.33‰ of Footprint-C contacts are between pairs of CTCF motifs (Fig. 1g). To demonstrate the high accuracy of motif annotation, we calculated the frequency of pairs of convergent CTCF motifs, a hallmark of CTCF-mediated chromatin loops in mammals19. Surprisingly, Footprint-C is the only Hi-C method showing a predominant (60.2% vs 25% expected) frequency of convergent CTCF motif pairs at single contact level (Fig. 1h), which becomes apparent at aggregated chromatin loop level in all other Hi-C datasets (Supplementary Fig. 1k). In contrast, Footprint-C data in fly cell line Kc167 does not show a predominant convergent pattern between interacting CTCF motifs (Fig. 1i), consistent with the findings that CTCF proteins work differently in organizing Drosophila genome than in human genome20. The above results showcase the power of Footprint-C to efficiently capture chromatin contacts between genuine TF footprints functionally relevant to the interactions.
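The orientation frequencies discussed above can be illustrated with a minimal Python sketch. It assumes that each contact has already been annotated with the strand of the motif at its upstream (left) and downstream (right) fragment; the function names and the strand convention (a `+` motif upstream facing a `-` motif downstream is called convergent) are illustrative assumptions, not the pipeline used in this study. With four equally likely strand combinations, each class is expected at 25% by chance.

```python
from collections import Counter

# The four possible orientation classes for an ordered motif pair.
ORIENTATIONS = ("convergent", "divergent", "tandem_forward", "tandem_reverse")


def classify_orientation(strand_left: str, strand_right: str) -> str:
    """Classify a motif pair ordered by genomic coordinate (left = upstream).

    Assumed convention: '+' upstream and '-' downstream point toward each other.
    """
    table = {
        ("+", "-"): "convergent",
        ("-", "+"): "divergent",
        ("+", "+"): "tandem_forward",
        ("-", "-"): "tandem_reverse",
    }
    try:
        return table[(strand_left, strand_right)]
    except KeyError:
        raise ValueError(f"unexpected strands: {strand_left!r}, {strand_right!r}")


def orientation_frequencies(pairs):
    """Return the fraction of pairs in each orientation class."""
    counts = Counter(classify_orientation(a, b) for a, b in pairs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pairs supplied")
    return {name: counts[name] / total for name in ORIENTATIONS}


if __name__ == "__main__":
    # Toy input: three convergent pairs and one divergent pair.
    example = [("+", "-")] * 3 + [("-", "+")]
    print(orientation_frequencies(example))
```

Comparing the returned fractions with the 25% chance expectation gives the kind of per-contact enrichment reported in Fig. 1h.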
Preferential modes of local TF clustering
The billions of chromatin contacts from Footprint-C, when viewed at 1D level, present a rich collection of TF footprints and an opportunity to analyze TF local residence with unprecedented resolution. We first obtained 1D peaks from the Footprint-C datasets, and classified them into single-, double-, or triple-footprint (1 V, 2 V, 3 V, respectively) sites based on the proximity of two adjacent peak summits (Fig. 2a and see Methods). V-plots at single sites and sampled collections of sites demonstrate the existence of footprints across the whole 1D peak set (Supplementary Figs. 2 and 3a, b). The locations of the footprints within these sites well correspond to the peaks from DNase-seq and TF ChIP-seq datasets (Fig. 2b, c). About 70% of the fragments from these sites are located within the theoretical inside-V area (Supplementary Fig. 3c), showing that footprint-bearing fragments account for a significant portion of the dataset. The consensus motif sequences of TFs involved in genome organization, like CTCF19,21 and MAZ22,23 (MYC-associated Zinc Finger Protein), are within the most represented motifs at these sites (Supplementary Fig. 3d). Interestingly, the density of TF footprints at promoter regions appears to correlate with gene expression level (Supplementary Fig. 3e), suggesting a quantitative effect of TF co-occupancy level on transcriptional output24,25.
a Frequency of Footprint-C 1D peak sites with 1, 2, 3, or 4+ footprints in K562 or HEK293T. b V-plots are shown for two 1 V sites with tracks from Footprint-C fragment ends, 1D signal, DNase-seq, CTCF and MAZ ChIP-seq. c V-plot is shown for one 3 V site with tracks from Footprint-C fragment ends, 1D signal, DNase-seq, TBP, CTCF and ZNF384 ChIP-seq. d Frequency of each TF with footprints in 1 V or 2 V+ sites in K562 or HEK293T. e Frequency of top 40 TF-TF pairs, from all 2 V sites in K562 or HEK293T. f, h Frequency of motif orientation of all CTCF:CTCF (f) or CTCF:FOX (h) pairs from 2 V sites in K562 or HEK293T. g, i Screenshots showing tracks from Footprint-C 1D, CTCF, RAD21, FOXK2 ChIP-seq and MNase-seq in K562. j V-plots of three types of CTCF 1 V sites are shown with CTCF and other TF ChIP-seq, and Footprint-C 1D average signal in K562. A diagram is shown at the bottom. The red arrows in b, c, g, i mark the indexed CTCF or FOX motifs.
The definition of 1 V, 2 V, and 3 V sites prompted us to investigate modes of TF local co-occupancy. TFs appear to possess preferences for binding alone or with other TFs in close proximity, and such preferences seem to be largely conserved between K562 and HEK293T cells (Fig. 2d). The members from the RFX (regulatory factor X) and CUX (cut like homeobox) families have more footprints at 1 V sites than the rest of the sites, while NRF1 and SP family members prefer to co-localize with other TFs26 (Fig. 2d). In contrast, the frequent duos of TFs at the 2 V and 3 V sites are highly cell type-specific. KLF (Krüppel-like factor) and HOX (homeobox) family members are the TFs that most frequently co-localize with an array of other TFs in K562 and HEK293T cells, respectively (Fig. 2e and Supplementary Fig. 3f). The conformational configuration of two interacting TFs, and thus the relative orientation of two adjacent motif sequences, may dictate a functional link between the two TFs27. We next investigated the four possible orientations of two adjacent motif sequences at all 2 V sites. The tandem orientation of two adjacent CTCF motifs is highly frequent in both cell lines28 (Fig. 2f, g). In contrast, the two adjacent motifs of CTCF and FOX (forkhead box) family TFs exhibit a preferential divergent orientation (Fig. 2h, i and Supplementary Fig. 3g, h). The preferred orientations between two adjacent motifs suggest preferred dimerization modes between the two TFs, which in turn may suggest functional cooperative activities between the two TFs29,30. Interestingly, when analyzing 1 V sites annotated with only CTCF motifs, we noticed some left- or right-skewed “V” shapes on the V-plot (Fig. 2j). To examine whether these skewed CTCF “V” shapes are due to other TFs interacting with CTCF but not with DNA, we analyzed the various TF ChIP-seq datasets from ENCODE31, and identified concordantly skewed ChIP-seq signals from TFs like MAX32, BHLHE4033, ELF134, and IRF2 (Fig. 2j).
These results indicate a wide-spread cooperativity in TF local residence, with preferences for both interaction partner choice and interaction conformation.
Footprint-C efficiently identifies chromatin structural features
We next investigated the performance of the Footprint-C dataset at pairwise contact level. The forward-reverse (FR) orientation of two fragments within ligation products is biased among most Hi-C datasets, ranging from 38–90% frequency (vs 25% expected) within short-distance contact pairs, possibly due to spurious capture of dangling-end, re-ligated or undigested fragments35,36. Importantly, the Footprint-C dataset exhibits the most balanced frequency of fragment orientation across all datasets from different Hi-C methods (Fig. 3a and Supplementary Fig. 4a), suggesting the exclusion of dangling-end products in Footprint-C procedures, and showing the validity of short-distance interactions in Footprint-C results. The Footprint-C contact matrix and heatmap exhibit canonical chromatin structural features like contact domains, chromatin loops and stripes (Fig. 3b). The global correlation coefficient between Footprint-C and in situ Hi-C or Micro-C is slightly lower than the coefficient between in situ Hi-C and Micro-C (Supplementary Fig. 4b), possibly due to the TF-centric fragment locations in the Footprint-C method (Supplementary Fig. 1h), which are lacking in the other two methods. Other than this, Footprint-C produces chromatin contact maps consistent with in situ Hi-C or Micro-C contact maps at levels of global interaction distance, compartment segregation, and contact domain insulation (Supplementary Fig. 4c–h).
a Frequency of read mate orientation of different Hi-C datasets when distance <1 kb. b A screenshot showing Footprint-C contact heatmap with tracks from H3K27ac, ZNF143, KLF16, CTCF, RAD21 ChIP-seq, total RNA-seq, DNase-seq and Footprint-C 1D in K562. Gray shades mark the positions of non-CTCF loop anchors. c Number of loops (>20 kb) or stripes detected in Footprint-C, Micro-C, or in situ Hi-C datasets at different sampling size or by different methods. d Overlap of Mustache loops between Footprint-C & Micro-C datasets in K562. e Aggregate plots for common or Footprint-C specific loops. f Fraction of different loop types from common or Footprint-C specific loops. Enh, enhancer. Pro, promoter. g Size of common or Footprint-C specific structural A-A or B-B loops. Min: lower end of violin. Q1: lower bound of box. Q2: line in box. Q3: higher bound of box. Max: higher end of violin. P was calculated by two-sided Wilcoxon–Mann–Whitney test. Effect sizes are shown. h Footprint-C (right up) and Micro-C (left down) contact heatmaps at the same sampling size of 1800 M contacts showing one specific A-A (left) or B-B (right) structural loop, with tracks from genome annotation, PC1 value, CTCF, RAD21 or H3K9me3 ChIP-seq. Black squares mark the two loops. i Aggregate plots of loops in DMSO or dTAG-7 treated HEK293T. The loops were from Footprint-C in WT HEK293T, and were aggregated at the center of a ±200 kb window at 5-kb resolution. The average enrichment score for loops is shown in the top left corner. Source data are provided as a Source Data file.
To benchmark the performance of Footprint-C in identifying chromatin structural features, the numbers of chromatin loops and stripes obtained from Footprint-C, Micro-C, or in situ Hi-C datasets in K562 cells were compared in parallel. Footprint-C identified the most chromatin loops or stripes across these datasets, at all different sizes of samplings or with different methods (Fig. 3c and Supplementary Fig. 4i). Under the same sequencing depth, Footprint-C identified about 50% more chromatin loops than Micro-C (Fig. 3c–e). The Footprint-C specific loops show a larger structural fraction than the common loops with Micro-C (Fig. 3f), and are bigger in loop size (Fig. 3g), prompting us to investigate the nature of these specific structural loops. More of the common structural loops come from A compartments (A-A) than from B compartments (B-B) (49.3% vs 28.9%), while the opposite holds for the specific structural loops (19.7% vs 59.4%) (Supplementary Fig. 4j). The B-B structural loops are decorated by repressive H3K9me3 marks throughout but, like the A-A structural loops, are demarcated by CTCF and Cohesin (Fig. 3h). We further investigated the chromatin stripes identified by Footprint-C by categorizing them into three groups: chromatin loop domains with left-, right-, or both-sided stripes (Supplementary Fig. 4k). The appearance and direction of the stripes correlate with the density of TF footprints at the respective loop anchors (Supplementary Fig. 4l), suggesting a functional link between loop extrusion initiation and TF residence at loop anchors37, and that a broad spectrum of TFs, including CTCF and others, may contribute to the structural framework of high-order genome organization.
To rule out the possibility that the chromatin high-order structural features identified by Footprint-C may contain artifacts from its 1D enrichment at accessible sites, we established a RAD21 (a Cohesin subunit) degron cell line in HEK293T, and performed Footprint-C. Under acute Cohesin depletion, the chromatin loops and their associated stripes were severely impaired (Fig. 3i). Because Cohesin loss does not affect chromatin accessibility in general38, this result demonstrates the validity of the chromatin structural features identified by Footprint-C, and that Footprint-C’s superior performance in finding these structures is not simply due to its preferential coverage of the accessible portion of the genome.
High-resolution chromatin contact maps built upon TF footprints
The V-plot analysis demonstrates that most Footprint-C fragments exhibit protection by the binding of a variety of TFs other than CTCF (Fig. 1g and Supplementary Fig. 5). The genuine nature of footprints resolved by Footprint-C enabled us to unambiguously categorize the interactions mediated by a variety of TF-TF pairs. CTCF-CTCF pairs show predominant long-range interactions, whereas pairs between CTCF and the other TFs show a biphasic interaction range (Fig. 4a). About half of the TF pairs exclusive of CTCF interact under 10 kilobase (kb) range (Supplementary Fig. 6a). Based on this interaction range distinction, we categorized all TF-TF pairs into short-range (0.3-10 kb) and long-range (>10 kb) groups. In both K562 and HEK293T cells, the contacts mediated by two CTCF motifs are the biggest category only in the long-range group, and MAZ-MAZ contacts are the biggest category in the short-range group (Fig. 4b and Supplementary Fig. 6b). Regarding interaction partner choice, CTCF mostly interacts with another CTCF in the long-range group, while all other TFs appear to have a balanced choice in forming homo- or hetero-pairs (Fig. 4c and Supplementary Fig. 6c). The convergent orientation of two interacting CTCF motifs is a hallmark of CTCF-mediated chromatin loops in mammals (Fig. 1h and Supplementary Fig. 1k). We next examined if such orientation preferences exist in other interacting TF-TF pairs. A mild preference (31–34% vs 25% expected) for the convergent orientation was observed between CTCF and other TFs, like MAZ, EGR139 (Early Growth Response 1), or ZNF14340,41,42, only in the long-range group (Fig. 4d). No such preferences were observed between TF-TF pairs exclusive of CTCF (Supplementary Fig. 6d). The diversity of short- and long-range contacts suggests that a variety of TF-TF interactions may orchestrate the local and global chromatin organization through division of labor6.
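The distance-based grouping described above can be sketched as follows. This is a minimal illustration assuming cis contacts whose two fragment positions are given in base pairs on the same chromosome; the function name, the parameter names, and the choice to treat both interval endpoints as inclusive are assumptions for illustration only.

```python
from typing import Optional


def classify_range(pos1: int, pos2: int,
                   short_min: int = 300,
                   boundary: int = 10_000) -> Optional[str]:
    """Assign a cis contact to the short-range or long-range group.

    short-range: 0.3-10 kb (endpoints assumed inclusive)
    long-range:  >10 kb
    Contacts closer than `short_min` are not assigned to either group.
    """
    distance = abs(pos2 - pos1)
    if distance < short_min:
        return None
    if distance <= boundary:
        return "short"
    return "long"


if __name__ == "__main__":
    for a, b in [(1_000, 1_200), (1_000, 6_000), (1_000, 250_000)]:
        print(a, b, classify_range(a, b))
```

Contacts returning `None` fall below the 0.3 kb lower bound and would be left out of both groups in this sketch.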
a Interaction distance of different TF-TF pairs. b TF-TF pairs ranked by counts of homogeneous pairs when the two mates are 0.3-10 kb (up) or >10 kb (down) apart in K562. Only “inside-V” footprint-bearing fragments shown in Supplementary Fig. 5 are counted. c Correlation of homogeneous and mean heterogeneous pair counts of TFs shown in b. d Frequency of motif orientation of heterogenous pairs between CTCF and any of EGR1, MAZ, or ZNF143 when the two mates are 0.3–10 kb (up) or >10 kb (down) apart in K562. e Screenshots of CTCF motif-based (left) and regular (right) Footprint-C contact maps with CTCF ChIP and Footprint-C fragment density shown. The two CTCF motifs in red are 112 bp apart. f Screenshots of CTCF motif-based and regular Footprint-C contact maps of a three-way interaction. g Frequency of motif orientation of three-way CTCF contacts. h Screenshots of CTCF & MAZ motif-based and regular Footprint-C contact maps of a three-way interaction involving CTCF and MAZ. i Frequency of motif orientation of three-way contacts involving CTCF and one of EGR1, KLF, or MAZ. The red arrows in e, f, h mark the indexed CTCF or MAZ motifs. Source data are provided as a Source Data file.
The resolution of TF identities in most Footprint-C contacts further prompted us to build a chromatin contact map solely upon TF motifs. Due to its predominant role in genome organization, we first built a contact map upon 112,878 CTCF motifs compiled from all human CTCF ChIP-seq datasets from ENCODE43. This motif-based contact map not only largely reproduces chromatin structures in a regular Hi-C contact map, but also surprisingly separates interactions emanating from two adjacent CTCF motifs only ~100 bp apart (Fig. 4e), or resolves interactions from a particular CTCF motif when another inert motif is only ~100 bp away (Supplementary Fig. 6e), both of which are indiscernible in contact maps from other Hi-C methods. The motif-based contact map also helps resolve CTCF-mediated multiway interactions. In total, we identified 6,726 three-way contacts involving three CTCF motifs, including one interacting over the promoter region of the RNA methyltransferase gene Nsun4 (Fig. 4f), which was verified by data from a recent multiway study in K562 cells44. The combination of convergent and tandem motif orientation appears to be the predominant pattern (66% vs 25% expected) among such CTCF three-way interactions (Fig. 4g). Finally, we accommodated both homogeneous and heterogeneous TF-TF interactions to build a contact map upon motifs of both CTCF and MAZ, another architectural TF involved in genome organization22,23. CTCF also forms multiway interactions with MAZ (Fig. 4h), and potentially with other TFs, mainly through convergent motif orientation (Fig. 4i), suggesting that a rich regulatory lexicon of TF-TF interactions involving TF identity, valency, and orientation, underlies complex mammalian genome folding.
Discussion
Since the introduction of the Hi-C method2, a handful of derivative methods have been developed for mapping genome organization, based on the principle of proximity ligation. The two major directions of technological development are resolution enhancement, exemplified by Micro-C with nucleosomal DNA units resolved, and the preferential coverage of the accessible portion of the genome, represented by HiCAR and Hi-TrAC using Tn5 transposase. These methods outperform the 1st generation Hi-C and the 2nd generation in situ Hi-C methods resolution-wise, but still have limitations. Most ligation products they capture are either between ~150 bp nucleosomal DNA fragments, which often exclude TF binding45,46 (Fig. 1b), or between long fragments that either do not include the TF binding sites or leave ambiguities in identifying the responsible cis elements (Fig. 1f). Furthermore, in the cases of HiCAR and Hi-TrAC, restriction enzyme digestion or sonication was used after Tn5 transposition to further fragment DNA, potentially wiping out the TF footprint sequences from the DNA fragment, and therefore losing the identity information of cis elements and trans TFs responsible for the interactions. In Footprint-C procedures, we fine-tuned DNase digestion to approximate the ideal length range of TF footprints, which further enabled us to use gel excision to strictly select those ligation products directly between two footprints (Supplementary Fig. 1a). This size selection step also helps avoid the sonication or the restriction enzyme digestion step, therefore keeping the integrity of footprints. These optimizations enable the unambiguous identification of cis regulatory elements and trans TFs, and thus the TF-TF interactions underlying genome-wide chromatin interactions in Footprint-C data.
In the loop extrusion model explaining chromatin loop formation in mammals47, a pair of CTCF proteins binding to two convergently oriented motifs are required to stop the extrusion of Cohesin complex, thereby demarcating the chromatin loops along with Cohesin. Intriguingly, Drosophila CTCF proteins, which share a highly conserved DNA-binding domain and motif sequence with mammalian CTCF proteins, do not exhibit such preferences for convergently oriented motifs when forming interactions20, suggesting that mammalian CTCF proteins may have acquired the function to stop Cohesin extrusion during evolution, to fulfill the requirements of mammalian genome organization. A recent study identified a handful of TFs other than CTCF that serve as barriers in stopping Cohesin extrusion in mammalian cells48. Indeed, in Footprint-C data, we identified a rich collection of long-range TF-TF interactions between CTCF and other TFs, and such interactions do exhibit mild preferences for the convergent orientation of CTCF and the other motifs (Fig. 4d). These results may point to the emergence of new types of extrusion barrier TFs over the course of evolution, to suit the complex needs of gene expression regulation in mammals. Functional studies using, for example, inducible degron technologies49 will help resolve the roles these candidate TFs play in mammalian genome organization.
It is noteworthy that two recent preprints reported acute depletion of ZNF143 and found no significant changes to genome organization, suggesting that its role in genome organization is indirect50,51. They went on to propose a model of collective actions by different TFs (other than CTCF) on genome organization. This model is compatible with our findings by Footprint-C that a collection of different TFs interact with each other in complex but preferential modes, and thus may act on genome organization in an additive manner.
In summary, Footprint-C presents high-resolution chromatin contact maps built upon intact and genuine TF footprints, reveals a rich regulatory lexicon of TF-TF interactions underlying chromatin high-order structures, and therefore holds great promise for studying genome organization in biological processes with vigorous gene expression changes driven by TF regulatory cascades, such as cellular differentiation and cell fate reprogramming52.
Methods
Cell culture
K562 (ATCC: CCL-243) cells were cultured at 37 °C with 5% CO2 in RPMI-1640 (Vivacell, C3010-0500) supplemented with 10% FBS (Nobimpex, B118-500). HEK293T (ATCC: CRL-3216) cells were cultured at 37 °C with 5% CO2 in DMEM (Gibco, C11995500BT) supplemented with 10% FBS. Kc167 cells were cultured at 25 °C in SIM SF Expression Medium (Sino Biologica, MSF1) supplemented with 1% penicillin-streptomycin (Gibco, 15140122).
Generation of RAD21 degron cell line
The RAD21 degron cell line (HEK293T: RAD21-linker-FKBPF36V-3xHA-P2A-BSD) was generated by CRISPR/Cas9-mediated genome editing as previously described53 with the following modifications. The homology-directed repair template was constructed as follows: (1) A 500 bp left homology arm before the stop codon of the RAD21 gene was PCR amplified from genomic DNA; (2) A 500 bp fragment including and after the stop codon of the RAD21 gene was amplified as the right homology arm; (3) A linker-FKBPF36V-3xHA-P2A-BSD sequence was PCR amplified; (4) The pCRIS-PITChv2-dTAG-BSD (BRD4) (AddGene, 91795) plasmid was digested with the restriction enzymes EcoRV-HF (NEB, R3195S) and XbaI (NEB, R0145V), and the large fragment was purified with agarose gel electrophoresis. The four fragments were assembled through Seamless Cloning (Beyotime, D7010S) to serve as the repair template vector. A single-guide RNA targeting the 3′ end of the RAD21 gene was designed with the CRISPOR tool54. Sense and antisense oligos were annealed together (sgDNA) and cloned into the pX458 plasmid. HEK293T cells were seeded in 6-well plates and transfected with VigoFect (Vigorous Biotechnology, T001), 1 μg of sgDNA plasmid, and 2 μg of repair template fragment, which was PCR amplified from the repair template vector. Transfected cells were cultured for 96 h, and then selected with 10 μg/mL blasticidin. Single colonies were isolated into 96-well format and further cultured for 14 days. Successfully edited homozygous clones were confirmed by genotyping and Western blot. The RAD21 degron cells were treated with 1 μM dTAG-7 (MCE, HY-123941) or DMSO as control for 4 h.
Footprint-C experimental procedures
Briefly, 5 million cells were fixed with freshly made 1% formaldehyde at room temperature for 10 min. Cells were lysed and nuclei were extracted. Nuclei were digested with 16 U of DNase I (NEB, M0303S) for 30 min. The digested DNA was blunt-ended with End Repair Mix (NEB, E6050L) and A overhangs were added by Taq DNA Polymerase (Thermo, EP0406). The A-tailed DNA was ligated with a bridge linker (Forward: /5Phos/GCCCGG/iBiodT/NNACGCCCGT, Reverse: /5Phos/CGGGCGTNNACCGGGCT) by T4 DNA ligase (Thermo, EL0012) at 16 °C for 4 h. The excess bridge linkers were removed by Lambda Exonuclease (NEB, M0262L) and Exonuclease I (NEB, M0293L). Proteinase K was added and the sample was incubated at 65 °C overnight. DNA was purified by phenol:chloroform:isoamyl alcohol extraction and ethanol precipitation. The DNA was run on a 1.5% agarose gel, a gel slice from 80 to 200 bp was excised, and the DNA was purified. Linker-ligated DNA was purified with C1 streptavidin beads (Thermo, 65002). Purified DNA was processed on streptavidin beads with end repair, A-tailing, and ligation to Illumina adapters. The DNA was then subjected to ~7–10 cycles of PCR using Illumina paired-end primers and KAPA Polymerase (Roche, KK2502). The amplified library was purified and subjected to Illumina NovaSeq 6000 paired-end sequencing.
Footprint-C Data preprocessing
Read pairs were first trimmed of the Illumina adapter and bridge linker sequence (AGCCCGGTNNACGCCCGT, both forward and reverse complementary) from both ends by Trim Galore (https://github.com/FelixKrueger/TrimGalore) and Cutadapt (https://github.com/marcelm/cutadapt/). Only read pairs with bridge linker sequence detected and with both mates ≥10 bp after trimming were kept. Valid Footprint-C fragment contact pairs were obtained from the HiC-Pro55 analysis pipeline. The detailed description and code can be found at https://github.com/nservant/HiC-Pro. In brief, a pair of trimmed fastq files were mapped to the human (hg38) or Drosophila (dm6) genome separately by Bowtie256 with the options ‘--very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder’. Aligned reads were paired by read name. Multi-hit, low-MAPQ (<=10), singleton, dangling-end, and self-circle pairs were removed. The fragment contact pairs were obtained from the output BAM files containing paired aligned reads by removing PCR duplicates and were used in downstream analyses. All 5′ valid pairs were extracted from the fragment contact pairs and were converted to contact matrices in COOL file format using the Cooler57 package. The Footprint-C, Micro-C, in situ Hi-C, and Hi-TrAC raw contact matrices were generated and visualized in heatmaps by cooltools (https://github.com/open2c/cooltools). The iterative correction (IC)-normalized contact matrices were used for all other bioinformatic analyses58.
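The read-pair retention rule stated above (bridge linker detected, both mates ≥10 bp after trimming) can be illustrated with a simplified Python sketch. It is not a substitute for Trim Galore or Cutadapt: it matches the linker exactly, with no mismatches allowed, cuts each read at the first linker occurrence, and assumes that detection in at least one mate is sufficient. The function names are hypothetical.

```python
import re

# Bridge linker as given in the text; N denotes a random base.
LINKER = "AGCCCGGTNNACGCCCGT"
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def _to_regex(seq: str) -> str:
    """Turn each N into a character class that matches any base call."""
    return seq.replace("N", "[ACGTN]")


# Match the linker in either orientation (exact match only; an assumption).
LINKER_RE = re.compile(
    f"{_to_regex(LINKER)}|{_to_regex(reverse_complement(LINKER))}"
)


def trim_at_linker(read: str):
    """Cut the read at the first linker match; return (trimmed, found)."""
    match = LINKER_RE.search(read)
    if match is None:
        return read, False
    return read[:match.start()], True


def keep_pair(read1: str, read2: str, min_len: int = 10) -> bool:
    """Keep a pair if a linker was seen and both mates are >= min_len bp."""
    trimmed1, found1 = trim_at_linker(read1)
    trimmed2, found2 = trim_at_linker(read2)
    linker_seen = found1 or found2  # assumption: one mate is enough
    return linker_seen and len(trimmed1) >= min_len and len(trimmed2) >= min_len


if __name__ == "__main__":
    mate1 = "T" * 12 + "AGCCCGGTACACGCCCGT" + "GGG"
    mate2 = "C" * 15
    print(trim_at_linker(mate1), keep_pair(mate1, mate2))
```

In practice, tolerance for sequencing errors and partial linker matches at read ends matters, which is one reason a dedicated trimming tool is used in the actual workflow.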
Public datasets
Public data used in this study, including in situ Hi-C, Micro-C, BL-Hi-C, DNase Hi-C, CAP-C, Hi-TrAC, HiCAR, HiPore-C, DNase-seq, RNA-seq, and ChIP-seq, are summarized in Supplementary Data 1. The in situ Hi-C and Micro-C datasets were preprocessed in the same way as Footprint-C, except that the linker trimming step was omitted. The BL-Hi-C, DNase Hi-C and CAP-C datasets were preprocessed in the same way as Footprint-C. HiCAR datasets were processed as described in ref. 59. Briefly, reads were aligned to the hg38 reference genome using bwa mem with the flags -SP. Alignments were parsed, and the alignment contact pairs were generated using pairtools (https://github.com/mirnylab/pairtools). Alignment contact pairs with low mapping quality (MAPQ < 10) were filtered out. Alignment contact pairs with the same coordinates on the genome or mapped to the same digestion fragment were removed. The alignment contact pairs of Hi-TrAC and multiway interactions of HiPore-C were directly downloaded from Gene Expression Omnibus (GEO).
Enrichment analysis
The fragment contact pairs of Footprint-C or other Hi-C datasets were first converted to single alignment BED files. Alignments of other Hi-C datasets were scaled to 60 bp, the median fragment length of Footprint-C. genomeCoverageBed in bedtools60 was used to compute the normalized signal (reads per million for hg38). The average signal distribution relative to the occupied CTCF motifs and distal DNase I hypersensitive sites (dDHS) was computed across ±500 bp by computeMatrix in deepTools61. The average signals of Read 1 and Read 2 alignments in the HiCAR dataset around the occupied CTCF motifs and dDHS were calculated separately. The list of functional CTCF motifs was generated as described previously43. Briefly, the sets of occupied CTCF motifs in K562 and GM12878 were screened by the respective CTCF ChIP-seq datasets, and only those with an average RPM > 1 within a 30 bp window surrounding the motif center base were kept. The set of DHS in K562 or GM12878 cells was called from the respective DNase-seq dataset using MACS2. DHS with q values > 20 and distance from TSS > 1 kb were defined as dDHS.
Footprint-C 1D peak processing
The two fragments of each Footprint-C pair were extracted and processed to be compatible with MACS2 (ref. 62) input BED files. MACS2 was used to identify peaks with the following parameters: --nomodel --extsize 30 -q 0.05 --call-summits --keep-dup all. Peaks reproducible across multiple biological replicates were kept using ChIP-R63.
Footprint annotation and classification
A list of expressed TFs was selected from genes with the top 10,000 RPM values from RNA-seq data in K562 or HEK293T cells. Only motifs of expressed TFs from HOMER’s motif collection18 were kept for analysis. The motif closest to each Footprint-C 1D peak summit was obtained by bedtools closestBed with the parameter -t first. Different TFs from the same family were referred to by the family name (e.g. KLF). Footprint-annotated peaks were merged and grouped into 3 types of sites (1 V, 2 V, and 3 V) based on the distance between adjacent footprints. 1 V sites were defined as single footprints with distances >200 bp from their nearest footprints. 2 V sites were defined as two footprints with a distance <200 bp between them, but >200 bp from the nearest 3rd footprint. 3 V sites were defined as three footprints with distances <200 bp between any two, but >200 bp from the nearest 4th footprint.
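The grouping of footprints into 1 V, 2 V, and 3 V sites can be sketched in Python as follows. This is an illustrative reading of the definitions above, not the authors' script: summits are assumed to lie on a single chromosome, adjacent summits closer than 200 bp are chained into one site, a gap of exactly 200 bp (which the definitions leave open) is treated as a separation, and clusters that fit none of the three definitions are labeled "other". Function names are hypothetical.

```python
from typing import List


def group_footprints(summits: List[int], max_gap: int = 200) -> List[List[int]]:
    """Chain sorted summit positions; adjacent summits < max_gap apart share a site."""
    groups: List[List[int]] = []
    for pos in sorted(summits):
        if groups and pos - groups[-1][-1] < max_gap:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    return groups


def classify_site(group: List[int], max_gap: int = 200) -> str:
    """Label a group of summits as 1V, 2V, 3V, or other."""
    n = len(group)
    span = group[-1] - group[0]
    if n == 1:
        return "1V"
    if n == 2:
        return "2V"
    # "<200 bp between any two" is read here as: the whole site spans < 200 bp.
    if n == 3 and span < max_gap:
        return "3V"
    return "other"


summits = [1000, 5000, 5100, 9000, 9050, 9120, 20000, 20150, 20300]
labels = [classify_site(g) for g in group_footprints(summits)]
```

In this toy input, the last cluster contains three footprints chained by gaps of 150 bp but spanning 300 bp, so it does not satisfy the pairwise criterion and is labeled "other".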
Classification of CTCF 1 V sites
In Fig. 2j, CTCF motifs within 1 V sites were extended by ±20 bp from the motif center base. Footprint-C fragment ends within the upstream or downstream 20 bp regions were counted. Skewed CTCF 1 V sites were classified into two categories based on the ratio between two regions (ratio ≥4 or ≤0.25). The third category was randomly picked from all CTCF 1 V sites.
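As a rough illustration of the skew classification, the Python sketch below counts fragment ends in the 20 bp flanks of a motif center and applies the ratio thresholds. It assumes a motif on the plus strand (so that "upstream" means lower coordinates), computes the ratio as upstream over downstream, and handles empty flanks in an ad hoc way; none of these choices is specified in the text, and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def count_ends(ends: Iterable[int], center: int, flank: int = 20) -> Tuple[int, int]:
    """Count fragment ends in the upstream and downstream flanks of a motif center."""
    ends = list(ends)
    upstream = sum(1 for e in ends if center - flank <= e < center)
    downstream = sum(1 for e in ends if center < e <= center + flank)
    return upstream, downstream


def classify_skew(upstream: int, downstream: int,
                  high: float = 4.0, low: float = 0.25) -> str:
    """Classify a site by the upstream/downstream ratio of fragment ends."""
    if upstream == 0 and downstream == 0:
        return "undetermined"          # assumption: no ends, no call
    if downstream == 0:
        return "upstream-skewed"       # assumption: treat as ratio >= high
    ratio = upstream / downstream
    if ratio >= high:
        return "upstream-skewed"
    if ratio <= low:
        return "downstream-skewed"
    return "not skewed"
```

For a motif centered at position 100 with fragment ends at 85, 90, 95, 99, and 105, the counts are 4 upstream and 1 downstream, giving a ratio of 4 and an upstream-skewed call.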
Annotation of promoters with footprints
In Supplementary Fig. 3e, the promoters were defined as 500 bp regions upstream of the Transcription Start Site (TSS). A promoter was considered a “1 V” promoter only if the summit of a 1 V site fell within this promoter region. Similarly, a promoter was considered a “2 V” or “3 V” promoter only if all summits of the 2 V or 3 V site fell within this promoter region.
Calculation of inside-V fragment ratio
The inside-V fragment ratio was calculated as a/(a + b1 + b2) as shown in Supplementary Fig. 3c, where a represents the number of fragments located within the V shape enclosed by two borders extending from the footprint summit with slopes of ±2 (the theoretical slopes of the V-plot), and b1 and b2 represent the numbers of fragments located within ±50 bp of the footprint summit but outside the V region.
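The ratio a/(a + b1 + b2) can be sketched in Python under an explicit reading of the V-plot geometry, in which each fragment is a point (midpoint offset from the summit, fragment length) and lies inside the V when its length is at least twice the absolute offset. That reading, the restriction of all counted fragments to midpoints within ±50 bp, and the pooling of b1 and b2 into one count are assumptions of this sketch; the function name is hypothetical.

```python
from typing import Iterable, Tuple


def inside_v_ratio(fragments: Iterable[Tuple[int, int]], summit: int,
                   window: int = 50, slope: float = 2.0) -> float:
    """Fraction of fragments near a summit that fall inside the V region.

    fragments: (start, end) coordinates on the same chromosome as the summit.
    """
    a = 0  # fragments inside the V
    b = 0  # fragments within the window but outside the V (b1 + b2)
    for start, end in fragments:
        offset = abs((start + end) / 2 - summit)
        if offset > window:
            continue  # outside the +/- window, not counted at all
        if (end - start) >= slope * offset:
            a += 1
        else:
            b += 1
    total = a + b
    return a / total if total else float("nan")
```

With slopes of ±2, a fragment is inside the V exactly when it spans the summit, since a length of at least twice the midpoint offset means the summit lies between the fragment's two ends.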
TAD calling
TADs were called using hicFindTADs64 at 40-kb resolution with settings: --thresholdComparisons 0.05 --delta 0.01 --correctForMultipleTesting fdr.
Loop calling
Loops were computed using Mustache65 or Chromosight66. For Mustache, loops were called at 5-kb resolution using the options -r 5000 --pThreshold 0.1. For Chromosight, loops were called by the Chromosight detect function at 5-kb resolution. Loops smaller than 20 kb were removed. Common and specific loops between Footprint-C and Micro-C were characterized as described previously67. Briefly, loops with both anchors overlapping were called common loops. Loops with one or neither anchor overlapping were called specific loops. Anchors were extended ±20 kb, and pairToPair in bedtools was used to characterize common loops with the parameters -type both -f 0.5. Footprint-C or Micro-C specific loops were characterized with the parameters -type notboth -slop 40000. Pile-up plots of loops were generated using cooltools.
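The distinction between common and specific loops can be illustrated with a small pure-Python sketch. It does not reproduce bedtools pairToPair: the minimum overlap fraction (-f 0.5) is ignored, anchors are assumed to be ordered (left, right) within each loop, and the function names are hypothetical.

```python
from typing import List, Tuple

Anchor = Tuple[str, int, int]          # (chrom, start, end)
Loop = Tuple[Anchor, Anchor]           # (left anchor, right anchor)


def extend(anchor: Anchor, slop: int = 20_000) -> Anchor:
    """Extend an anchor by slop bp on both sides, clipping at 0."""
    chrom, start, end = anchor
    return chrom, max(0, start - slop), end + slop


def overlaps(a: Anchor, b: Anchor) -> bool:
    """True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def is_common(loop: Loop, others: List[Loop], slop: int = 20_000) -> bool:
    """A loop is common if both extended anchors overlap those of another loop."""
    left, right = extend(loop[0], slop), extend(loop[1], slop)
    for other_left, other_right in others:
        if (overlaps(left, extend(other_left, slop))
                and overlaps(right, extend(other_right, slop))):
            return True
    return False


def split_loops(loops: List[Loop], others: List[Loop]):
    """Partition loops into (common, specific) relative to another loop set."""
    common = [lp for lp in loops if is_common(lp, others)]
    specific = [lp for lp in loops if not is_common(lp, others)]
    return common, specific
```

A loop whose left anchor matches another dataset but whose right anchor does not is reported as specific, in line with the "one or neither anchor overlapping" definition.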
Loop classification
Loops were annotated and categorized into promoter-promoter (Pro-Pro), promoter-enhancer (Pro-Enh), enhancer-enhancer (Enh-Enh), structural between A compartments (A-A structural), structural between B compartments (B-B structural), or other loops based on the annotation of active TSS, H3K27ac ChIP-seq peaks68 and PC1 values from Footprint-C data. H3K27ac peaks were called from the K562 H3K27ac ChIP-seq dataset using MACS2 (ref. 62). RNA-seq data were used to annotate active TSS. Firstly, anchors were extended by ±10 kb. Anchors containing active TSS were defined as Pro. Secondly, remaining anchors containing H3K27ac peaks were defined as Enh. Thirdly, remaining anchors were further defined as structural A or B anchors based on the PC1 value of the anchors. Pro-Pro, Pro-Enh, Enh-Enh, A-A structural, and B-B structural loops were defined by the respective annotations of the two anchors.
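The hierarchical anchor annotation lends itself to a short Python sketch. The inputs (whether an extended anchor contains an active TSS or an H3K27ac peak, and its PC1 value) are assumed to have been computed beforehand; treating PC1 > 0 as compartment A and everything else as B is an assumption of this sketch, and the function names are hypothetical.

```python
def annotate_anchor(has_active_tss: bool, has_h3k27ac: bool, pc1: float) -> str:
    """Assign one label per anchor, in priority order Pro > Enh > A/B."""
    if has_active_tss:
        return "Pro"
    if has_h3k27ac:
        return "Enh"
    return "A" if pc1 > 0 else "B"


# Loop classes keyed by the alphabetically sorted pair of anchor labels.
_LOOP_CLASSES = {
    ("Pro", "Pro"): "Pro-Pro",
    ("Enh", "Pro"): "Pro-Enh",
    ("Enh", "Enh"): "Enh-Enh",
    ("A", "A"): "A-A structural",
    ("B", "B"): "B-B structural",
}


def classify_loop(anchor1: str, anchor2: str) -> str:
    """Combine two anchor labels into a loop class; unmatched pairs are 'other'."""
    key = tuple(sorted((anchor1, anchor2)))
    return _LOOP_CLASSES.get(key, "other")
```

Because the priority order is applied per anchor, an anchor that contains both an active TSS and an H3K27ac peak is labeled Pro, and mixed pairs such as Pro with a structural A anchor fall into the "other" class.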
Stripe calling
Stripes were called from contact matrices using Stripenn69 or StripeCaller70. The stripes were called by Stripenn at 5-kb resolution with the following settings: -m 0.95,0.96,0.97,0.98,0.99 -p 0.1. Stripes were called by StripeCaller with the following settings: --local-num 2 --fold-enrichment 1.1 --min-seed-len 6. Pile-up plots of stripes were obtained using coolpup.py71, with the following settings: --local --rescale. The loop domains in Supplementary Fig. 4k were annotated by horizontal and vertical stripes obtained by Stripenn, and were divided into four types: left-sided, right-sided, both-sided, and no-stripe loops. The loop domains were divided into 50 equally spaced intervals. The left and right anchors were extended by 10 intervals upstream and downstream, respectively. The percentage of loops overlapping with 1 V sites in each interval was calculated.
Insulation score analysis
The insulation scores72 were calculated at 40-kb resolution using the cooltools package.
Compartment analysis
The compartments were identified at 100-kb resolution using the cooltools package. The eigenvector of the first principal component represents the compartment profile, with positive and negative values representing A and B compartments, respectively2.
Motif preprocessing
Transcription factor motif coordinates were obtained from the HOMER18 and CIS-BP73 databases. The coordinates from HOMER were used directly. The coordinates from CIS-BP were annotated by FIMO74. The lists of functional CTCF, EGR1, KLF, MAZ and SP motifs were generated as described previously43. Briefly, for K562 cells, the CDC5L, E2F4, TFDP1, VEZF1, and ZNF143 motif coordinates were screened by the respective ChIP-seq data, and only those with an average RPM value above 0.2 within a 30 bp window surrounding the motif center base were kept. MTF2, RBAK, and ZFP69B motif coordinates were screened by DNase-seq data, and only those with an average RPM value above 0.1 within a 30 bp window surrounding the motif center base were kept. For HEK293T cells, the RBAK and ZFP69B motif coordinates were screened by the respective ChIP-seq data, and only those with an average RPM value above 0.1 within a 30 bp window surrounding the motif center base were kept. E2F4, EGR1, ZNF143, VEZF1, CDC5L, TFDP1, and MTF2 motif coordinates were screened by DNase-seq data, and only those with an average RPM value above 0.1 within a 30 bp window surrounding the motif center base were kept.
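The signal-based motif screen can be illustrated with a minimal Python sketch. It assumes a per-base RPM track stored as a plain list for one chromosome and a window of 15 bp on either side of the motif center (one possible reading of "a 30 bp window surrounding the motif center base"); the function names are hypothetical.

```python
from typing import List, Sequence


def window_mean(signal: Sequence[float], center: int, width: int = 30) -> float:
    """Mean of the per-base signal in a window of `width` bp around `center`."""
    half = width // 2
    window = signal[max(0, center - half):center + half]
    return sum(window) / len(window) if window else 0.0


def screen_motifs(centers: List[int], signal: Sequence[float],
                  threshold: float) -> List[int]:
    """Keep motif centers whose windowed mean RPM exceeds the threshold."""
    return [c for c in centers if window_mean(signal, c) > threshold]


# Toy track: 100 bp of zeros with a 20 bp block of signal at positions 40-59.
signal = [0.0] * 100
signal[40:60] = [0.5] * 20
kept = screen_motifs([50, 80], signal, threshold=0.2)
```

In the toy track, the motif centered at position 50 has a windowed mean of about 0.33 RPM and passes a threshold of 0.2, whereas the motif at position 80 lies in a region without signal and is discarded.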
Motif uniqueness analysis
The fragment contact pairs of Footprint-C and other Hi-C datasets were first converted to single fragment BED files. The Footprint-C fragments less than or equal to 60 bp in length were kept for analysis. The fragments of in situ Hi-C were extended to the nearest GATC site (DpnII). The fragments of BL-Hi-C were extended to the nearest GGCC sites (HaeIII). The fragments of Hi-TrAC were extended to the nearest AATT or CATG sites (MluCI or NlaIII). The fragments of HiCAR were extended to the nearest GTAC sites (CviQI). The fragments of Micro-C were extended ±75 bp from the center base. Fragment coordinates were then intersected with all HOMER motif coordinates by intersectBed in bedtools. Finally, the proportions of fragments from Footprint-C, in situ Hi-C, BL-Hi-C, Hi-TrAC, HiCAR, or Micro-C datasets annotated with zero, one or multiple motifs were calculated.
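The two steps of this analysis (extending fragments to the nearest enzyme recognition sites, then tallying how many motifs each fragment covers) can be sketched in Python. The sketch works on a single chromosome with cut sites given as sorted positions; extending each fragment end outward to the nearest site is an assumed reading of "extended to the nearest site", and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open (start, end)


def extend_to_sites(start: int, end: int, sites: List[int]) -> Interval:
    """Extend a fragment outward to the nearest flanking cut sites (sorted list)."""
    i = bisect_right(sites, start)      # number of sites at or before start
    j = bisect_left(sites, end)         # index of first site at or after end
    new_start = sites[i - 1] if i > 0 else start
    new_end = sites[j] if j < len(sites) else end
    return new_start, new_end


def count_motifs(fragment: Interval, motifs: List[Interval]) -> int:
    """Number of motif intervals overlapping the fragment."""
    start, end = fragment
    return sum(1 for m_start, m_end in motifs if m_start < end and start < m_end)


def motif_proportions(fragments: List[Interval],
                      motifs: List[Interval]) -> Dict[str, float]:
    """Proportions of fragments covering zero, one, or multiple motifs."""
    tally = Counter()
    for fragment in fragments:
        n = count_motifs(fragment, motifs)
        tally["zero" if n == 0 else "one" if n == 1 else "multiple"] += 1
    total = len(fragments)
    return {k: (tally[k] / total if total else 0.0)
            for k in ("zero", "one", "multiple")}
```

The example illustrates why extension matters: a fragment at 150-300 that overlaps no motif on its own covers two motifs once it is extended to flanking cut sites at 100 and 400.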
Analysis of motif orientation of CTCF-CTCF contacts
The CTCF motifs closest to the upstream and downstream reads of each contact from Footprint-C or other Hi-C datasets were obtained by bedtools closestBed with the parameter -t first. Only contacts with both alignments overlapping the center base of a CTCF motif were counted. The frequencies of CTCF contacts were calculated according to the orientations of the two motifs (+−: convergent, ++: tandem F, −+: divergent, −−: tandem R).
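The orientation bookkeeping can be expressed in a few lines of Python. The sketch assumes that each contact has already been reduced to the strands of the motifs at its upstream and downstream ends, written with ASCII "+" and "-"; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (upstream motif strand, downstream motif strand) -> orientation class
ORIENTATION = {
    ("+", "-"): "convergent",
    ("+", "+"): "tandem F",
    ("-", "+"): "divergent",
    ("-", "-"): "tandem R",
}


def orientation_frequencies(contacts: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of CTCF-CTCF contacts in each motif orientation class."""
    counts = Counter(ORIENTATION[(up, down)] for up, down in contacts)
    total = sum(counts.values())
    return {name: (counts[name] / total if total else 0.0)
            for name in ORIENTATION.values()}


freqs = orientation_frequencies([("+", "-"), ("+", "-"), ("+", "+"), ("-", "+")])
```

Here two of four contacts are convergent, so `freqs["convergent"]` is 0.5, while the tandem R class, which has no contacts, is reported as 0.0 rather than omitted.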
Construction of TF motif-based contact maps
The lists of functional CTCF or MAZ motifs were generated as described previously43 and also under “Motif preprocessing”. The motifs were sorted by genome coordinates, indexed, and used to construct a genome-wide 2D contact map. The Footprint-C fragments were annotated by the indexed motifs. The contact pairs with motifs annotated on both fragments were extracted and assigned to the respective bins in the 2D contact map. The counts in each bin were shaded in four different colors according to the orientations of the two motifs (+−, ++, −+, −−). The motif-based contact maps were plotted using ggplot2, and were merged using Adobe Illustrator.
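A motif-indexed contact map of this kind can be sketched in Python as a sparse counter keyed by motif indices and orientation. The sketch uses a linear scan to annotate fragments and sorts chromosomes lexicographically, both simplifications for illustration; strands are written with ASCII "+" and "-", and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

Motif = Tuple[str, int, int, str]      # (chrom, start, end, strand)
Fragment = Tuple[str, int, int]        # (chrom, start, end)


def index_motifs(motifs: List[Motif]) -> List[Motif]:
    """Sort motifs by coordinate; the list position serves as the bin index."""
    return sorted(motifs, key=lambda m: (m[0], m[1]))


def find_motif(indexed: List[Motif], fragment: Fragment) -> Optional[int]:
    """Index of the first motif overlapping the fragment, or None."""
    chrom, start, end = fragment
    for i, (m_chrom, m_start, m_end, _) in enumerate(indexed):
        if m_chrom == chrom and m_start < end and start < m_end:
            return i
    return None


def build_contact_map(indexed: List[Motif],
                      pairs: List[Tuple[Fragment, Fragment]]) -> Counter:
    """Count contacts per (motif i, motif j, orientation) with i <= j."""
    cmap: Counter = Counter()
    for frag1, frag2 in pairs:
        i, j = find_motif(indexed, frag1), find_motif(indexed, frag2)
        if i is None or j is None:
            continue                   # keep only pairs annotated on both ends
        if i > j:
            i, j = j, i
        orientation = indexed[i][3] + indexed[j][3]
        cmap[(i, j, orientation)] += 1
    return cmap
```

Keying the counter by orientation as well as by motif indices keeps the four orientation classes separable, which is what the color shading of the bins requires at plotting time.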
Three-way interaction analysis
The three-way interactions were inferred from two-way motif interactions using quick-cliques (https://github.com/darrenstrash/quick-cliques). For three-way interactions between CTCF and MAZ, KLF, or EGR1, the obtained four-way interactions were split into three-way interactions in addition to the directly inferred three-way interactions. The inferred three-way interactions were then intersected with HiPore-C data by intersectBed in bedtools, and only interactions with all three motifs located in three different fragments of a single multiway complex were kept for motif orientation analysis.
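The inference of three-way interactions from pairwise contacts amounts to finding triangles in a graph whose nodes are motifs and whose edges are observed two-way interactions. The Python sketch below enumerates all triangles directly instead of calling quick-cliques; this also returns every triple contained in a four-way clique, but, unlike the procedure described above, it additionally includes triples from any larger cliques. Function names are hypothetical, and node identifiers are assumed to be sortable (e.g. motif indices).

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set, Tuple


def build_adjacency(pairs: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Set]:
    """Undirected adjacency sets from pairwise motif interactions."""
    adj: Dict[Hashable, Set] = {}
    for a, b in pairs:
        if a == b:
            continue                   # ignore self-contacts
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def three_way_interactions(pairs: Iterable[Tuple[Hashable, Hashable]]) -> Set[Tuple]:
    """All motif triples in which every pair has an observed interaction."""
    adj = build_adjacency(pairs)
    triples = set()
    for a, neighbours in adj.items():
        for b, c in combinations(sorted(neighbours), 2):
            if c in adj[b]:
                triples.add(tuple(sorted((a, b, c))))
    return triples
```

For a fully connected set of four motifs, the sketch returns the four triples that would result from splitting the corresponding four-way interaction.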
Statistics and reproducibility
The stratum-adjusted correlation coefficient was calculated with HiCRep75 on 100-kb resolution contact matrices using the parameter settings --h 1 --dBPMax 100000 --binSize 100000. Comparisons were performed between two biological replicates of Footprint-C libraries, or between Footprint-C and Micro-C or in situ Hi-C libraries. A two-sided Wilcoxon rank-sum test was used to analyze differences between groups of data. Statistical details are shown in the figures or legends. No statistical method was used to predetermine sample size. No data were excluded from the analyses. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All raw and processed data can be obtained at Genome Sequence Archive (GSA) under accession number HRA004768. Detailed statistics of Footprint-C datasets are provided in Supplementary Data 2. A list of Footprint-C specific 1D peaks as described in Supplementary Fig. 1i is provided as Supplementary Data 3. Source data are provided with this paper.
Code availability
Scripts used to process the data are available at: https://github.com/xiaokunliu01/Footprint-C and https://doi.org/10.5281/zenodo.14191418.
References
Misteli, T. The Self-Organizing Genome: Principles of Genome Architecture and Function. Cell 183, 28–45 (2020).
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Hsieh, T. H. et al. Mapping Nucleosome Resolution Chromosome Folding in Yeast by Micro-C. Cell 162, 108–119 (2015).
Wei, X. et al. HiCAR is a robust and sensitive method to analyze open-chromatin-associated genome organization. Mol. Cell 82, 1225–1238 e6 (2022).
Liu, S., Cao, Y., Cui, K., Tang, Q. & Zhao, K. Hi-TrAC reveals division of labor of transcription factors in organizing chromatin loops. Nat. Commun. 13, 6679 (2022).
Li, T., Jia, L., Cao, Y., Chen, Q. & Li, C. OCEAN-C: mapping hubs of open chromatin interactions across the genome reveals gene regulatory networks. Genome Biol. 19, 54 (2018).
Hsieh, T. S. et al. Resolving the 3D Landscape of Transcription-Linked Mammalian Chromatin Folding. Mol. Cell 78, 539–553 e8 (2020).
Krietenstein, N. et al. Ultrastructural Details of Mammalian Chromosome Architecture. Mol. Cell 78, 554–565 e7 (2020).
Vierstra, J. et al. Global reference mapping of human transcription factor footprints. Nature 583, 729–736 (2020).
Ma, W. et al. Fine-scale chromatin interaction maps reveal the cis-regulatory landscape of human lincRNA genes. Nat. Methods 12, 71–78 (2015).
Gridina, M. et al. A cookbook for DNase Hi-C. Epigenetics Chromatin 14, 15 (2021).
You, Q. et al. Direct DNA crosslinking with CAP-C uncovers transcription-dependent chromatin organization at high resolution. Nat. Biotechnol. 39, 225–235 (2021).
Ong, C. T. & Corces, V. G. CTCF: an architectural protein bridging genome topology and function. Nat. Rev. Genet 15, 234–246 (2014).
Liang, Z. et al. BL-Hi-C is an efficient and sensitive approach for capturing structural and regulatory chromatin interactions. Nat. Commun. 8, 1622 (2017).
Barshad, G. et al. RNA polymerase II dynamics shape enhancer-promoter interactions. Nat. Genet 55, 1370–1380 (2023).
Henikoff, J. G., Belsky, J. A., Krassovsky, K., MacAlpine, D. M. & Henikoff, S. Epigenome characterization at single base-pair resolution. Proc. Natl Acad. Sci. USA 108, 18318–18323 (2011).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Xu, C. & Corces, V. G. Towards a predictive model of chromatin 3D organization. Semin Cell Dev. Biol. 57, 24–30 (2016).
Rowley, M. J. et al. Evolutionarily Conserved Principles Predict 3D Chromatin Organization. Mol. Cell 67, 837–852 e7 (2017).
Phillips, J. E. & Corces, V. G. CTCF: master weaver of the genome. Cell 137, 1194–1211 (2009).
Xiao, T., Li, X. & Felsenfeld, G. The Myc-associated zinc finger protein (MAZ) works together with CTCF to control cohesin positioning and genome organization. Proc. Natl Acad. Sci. USA 118, e2023127118 (2021).
Ortabozkoyun, H. et al. CRISPR and biochemical screens identify MAZ as a cofactor in CTCF-mediated insulation at Hox clusters. Nat. Genet 54, 202–212 (2022).
Akerberg, B. N. et al. A reference map of murine cardiac transcription factor chromatin occupancy identifies dynamic and conserved enhancers. Nat. Commun. 10, 4907 (2019).
Nie, Y., Shu, C. & Sun, X. Cooperative binding of transcription factors in the human genome. Genomics 112, 3427–3434 (2020).
Zhao, Y. et al. “Stripe” transcription factors provide accessibility to co-binding partners in mammalian genomes. Mol. Cell 82, 3398–3411 e11 (2022).
Georgakopoulos-Soares, I. et al. Transcription factor binding site orientation and order are major drivers of gene regulatory activity. Nat. Commun. 14, 2333 (2023).
Pugacheva, E. M. et al. Comparative analyses of CTCF and BORIS occupancies uncover two distinct classes of CTCF binding genomic regions. Genome Biol. 16, 161 (2015).
Leng, F. et al. The transcription factor FoxP3 can fold into two dimerization states with divergent implications for regulatory T cell function and immune homeostasis. Immunity 55, 1354–1369 e8 (2022).
Choi, Y. et al. FOXL2 and FOXA1 cooperatively assemble on the TP53 promoter in alternative dimer configurations. Nucleic Acids Res. 50, 8929–8946 (2022).
Consortium, E. P. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Zhang, K., Li, N., Ainsworth, R. I. & Wang, W. Systematic identification of protein combinations mediating chromatin looping. Nat. Commun. 7, 12249 (2016).
Hu, G. et al. Systematic screening of CTCF binding partners identifies that BHLHE40 regulates CTCF genome-wide distribution and long-range chromatin interactions. Nucleic Acids Res. 48, 9606–9620 (2020).
Tang, Z. et al. CTCF-Mediated Human 3D Genome Architecture Reveals Chromatin Topology for Transcription. Cell 163, 1611–1627 (2015).
Ohno, M. et al. Sub-nucleosomal Genome Structure Reveals Distinct Nucleosome Folding Motifs. Cell 176, 520–534 e25 (2019).
Kadota, M. et al. Multifaceted Hi-C benchmarking: what makes a difference in chromosome-scale genome scaffolding? Gigascience 9, giz158 (2020).
Vian, L. et al. The Energetics and Physiological Impact of Cohesin Extrusion. Cell 173, 1165–1178 e20 (2018).
Rao, S. S. P. et al. Cohesin Loss Eliminates All Loop Domains. Cell 171, 305–320 e24 (2017).
Wong, K. M., Song, J. & Wong, Y. H. CTCF and EGR1 suppress breast cancer cell migration through transcriptional control of Nm23-H1. Sci. Rep. 11, 491 (2021).
Bailey, S. D. et al. ZNF143 provides sequence specificity to secure chromatin interactions at gene promoters. Nat. Commun. 2, 6186 (2015).
Zhou, Q. et al. ZNF143 mediates CTCF-bound promoter-enhancer loops required for murine hematopoietic stem and progenitor cell function. Nat. Commun. 12, 43 (2021).
Zhang, M., Huang, H., Li, J. & Wu, Q. ZNF143 deletion alters enhancer/promoter looping and CTCF/cohesin geometry. Cell Rep. 43, 113663 (2024).
Xu, C. & Corces, V. G. Nascent DNA methylome mapping reveals inheritance of hemimethylation at CTCF/cohesin sites. Science 359, 1166–1170 (2018).
Zhong, J. Y. et al. High-throughput Pore-C reveals the single-allele topology and cell type-specificity of 3D genome folding. Nat. Commun. 14, 1250 (2023).
Lambert, S. A. et al. The Human Transcription Factors. Cell 172, 650–665 (2018).
Yuan, G. C. et al. Genome-scale identification of nucleosome positions in S. cerevisiae. Science 309, 626–630 (2005).
Davidson, I. F. & Peters, J. M. Genome folding through loop extrusion by SMC complexes. Nat. Rev. Mol. Cell Biol. 22, 445–464 (2021).
Ortabozkoyun, H. et al. Members of an array of zinc-finger proteins specify distinct Hox chromatin boundaries. Mol. Cell 84, 3406–3422 e6 (2024).
de Wit, E. & Nora, E. P. New insights into genome folding by loop extrusion from inducible degron technologies. Nat. Rev. Genet 24, 73–85 (2023).
Magnitov, M. D. et al. ZNF143 is a transcriptional regulator of nuclear-encoded mitochondrial genes that acts independently of looping and CTCF. bioRxiv, https://www.biorxiv.org/content/10.1101/2024.03.08.583864v1 (2024).
Narducci, D. N. & Hansen, A. S. Putative Looping Factor ZNF143/ZFP143 is an Essential Transcriptional Regulator with No Looping Function. bioRxiv, https://www.biorxiv.org/content/10.1101/2024.03.08.583987v1 (2024).
Stadhouders, R., Filion, G. J. & Graf, T. Transcription factors and 3D genome conformation in cell-fate decisions. Nature 569, 345–354 (2019).
Nabet, B. et al. The dTAG system for immediate and target-specific protein degradation. Nat. Chem. Biol. 14, 431–441 (2018).
Concordet, J. P. & Haeussler, M. CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Res 46, W242–W245 (2018).
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Abdennur, N. & Mirny, L. A. Cooler: scalable storage for Hi-C data and other genomically labeled arrays. Bioinformatics 36, 311–316 (2020).
Imakaev, M. et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999–1003 (2012).
Wei, X. L. et al. HiCAR is a robust and sensitive method to analyze open-chromatin-associated genome organization. Mol. Cell 82, 1225–1238 e6 (2022).
Quinlan, A. R. BEDTools: The Swiss-Army Tool for Genome Feature Analysis. Curr. Protoc. Bioinforma. 47, 1–34 (2014).
Ramirez, F., Dundar, F., Diehl, S., Gruning, B. A. & Manke, T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res 42, W187–W191 (2014).
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Newell, R. et al. ChIP-R: Assembling reproducible sets of ChIP-seq and ATAC-seq peaks from multiple replicates. Genomics 113, 1855–1866 (2021).
Ramirez, F. et al. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat. Commun. 9, 189 (2018).
Roayaei Ardakany, A., Gezer, H. T., Lonardi, S. & Ay, F. Mustache: multi-scale detection of chromatin loops from Hi-C and Micro-C maps using scale-space representation. Genome Biol. 21, 256 (2020).
Matthey-Doret, C. et al. Computer vision for pattern detection in chromosome contact maps. Nat. Commun. 11, 5795 (2020).
Yang, H. et al. A map of cis-regulatory elements and 3D genome structures in zebrafish. Nature 588, 337–343 (2020).
Xu, J. et al. Subtype-specific 3D genome alteration in acute myeloid leukaemia. Nature 611, 387–398 (2022).
Yoon, S., Chandra, A. & Vahedi, G. Stripenn detects architectural stripes from chromatin conformation data using computer vision. Nat. Commun. 13, 1602 (2022).
Vian, L. et al. The Energetics and Physiological Impact of Cohesin Extrusion. Cell 175, 292–294 (2018).
Flyamer, I. M., Illingworth, R. S. & Bickmore, W. A. Coolpup.py: versatile pile-up analysis of Hi-C data. Bioinformatics 36, 2980–2985 (2020).
Crane, E. et al. Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature 523, 240–244 (2015).
Weirauch, M. T. et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell 158, 1431–1443 (2014).
Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).
Yang, T. et al. HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Genome Res. 27, 1939–1949 (2017).
Acknowledgements
This work was supported by grants from the Ministry of Science and Technology of China (2022YFC2703303 and 2020YFA0803401 to C.X., and 2022YFC2703300 to Q.W.), and the National Natural Science Foundation of China (32070611 and 32370624 to C.X.).
Author information
Contributions
C.X. conceived and supervised the project. C.X. and Q.W. acquired funding. C.X. and X.L. designed experiments. X.L. performed the experiments. X.L., H.W., and Q.Z. conducted the analyses and visualizations. N.Z. provided resources. C.X. wrote the manuscript with input from all authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Ferhat Ay and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Liu, X., Wei, H., Zhang, Q. et al. Footprint-C reveals transcription factor modes in local clusters and long-range chromatin interactions. Nat Commun 15, 10922 (2024). https://doi.org/10.1038/s41467-024-55403-7
DOI: https://doi.org/10.1038/s41467-024-55403-7