Introduction

Chromatin interactions between discrete regulatory elements, mediated by architectural or tissue-specific transcription factors (TFs), constitute the three-dimensional genome organization and underlie many important nuclear processes, such as transcription, DNA replication, and DNA damage repair1. Proximity ligation-based Hi-C and its derivative methods have been widely used to map chromatin interactions2,3,4,5,6. These methods often use sequence-specific restriction enzymes to digest the genome before ligating pairs of digested genomic fragments2,3. In such cases, the choice of enzyme and the frequency and genomic distribution of its cutting sites affect the efficiency of digestion and ligation: a chromatin interaction can be represented in a Hi-C contact map only when cutting sites lie close to both interacting cis elements. Other derivative methods employ Tn5 transposase to preferentially cover the accessible portion of the genome5,6,7. All of these methods require a further fragmentation step, using either sonication or additional restriction enzymes, to break down the ligated chromatin for library preparation and sequencing. Consequently, many captured and sequenced DNA ligation products do not contain the binding sites of the TFs mediating the interactions (Supplementary Fig. 1a). Relying on enzymes whose activity is unrelated to the nature of the interactions limits the efficiency of discovering significant structural features from a chromatin contact map and obscures the identity of the underlying cis regulatory elements and trans TFs. Micro-C partially circumvents this problem by using Micrococcal Nuclease (MNase) to digest the genome, and thus preferentially captures ligations, and therefore interactions, between pairs of nucleosomal DNA fragments4,8,9.
In this study, we use Deoxyribonuclease I (DNase I), a nuclease previously used to map TF footprints10, to develop Footprint-C, which specifically captures chromatin interactions between TF footprints and identifies chromatin structural features, such as chromatin loops and stripes, more efficiently than other Hi-C methods. Analysis of the Footprint-C datasets at the one-dimensional and pairwise contact levels suggests extensive modes of TF regulation in both local residence and long-range chromatin interactions.

Results

Footprint-C captures chromatin interactions between TF footprints

A few studies have used DNase to fragment the genome in Hi-C experiments11,12,13, but none of these methods includes a step designed to preserve TF footprint sequences. They either captured ligation products between long DNA fragments (Supplementary Fig. 1b), possibly due to insufficient DNase digestion, or performed DNase digestion on purified DNA in the absence of TFs13. Furthermore, they all used sonication to fragment the genome after proximity ligation, which may remove footprint sequences from the Hi-C library. Indeed, when genome-wide coverage of one-dimensional (1D) fragments is inspected, these datasets show no enrichment at motifs of CTCF (CCCTC-binding factor), the master regulator of genome folding14, or at distal DNase hypersensitive sites (dDHS) (Supplementary Fig. 1c). In our strategy, we performed extensive DNase digestion to reduce fragments to approximately the length of TF footprints, which in turn enabled us to use gel excision to strictly select ligation products formed directly between two footprints. This size-selection step also removes the need for the sonication step used in other Hi-C methods, thereby preserving the integrity of footprints (Supplementary Fig. 1a).

We applied Footprint-C to the human chronic myelogenous leukemia cell line K562 and the human embryonic kidney cell line HEK293T, and obtained 1.93 and 0.93 billion contacts, respectively. About 86.7% of K562 and 84.5% of HEK293T contacts are cis (Supplementary Fig. 1d, e). The vast majority of Footprint-C contacts are between short fragments with a mean length of 50 base pairs (bp) (Fig. 1a and Supplementary Fig. 1f), consistent with the length of digested fragments in DNase-seq (Supplementary Fig. 1g). Next, we benchmarked the K562 Footprint-C dataset against datasets generated in K562 cells by previous studies using other Hi-C and derivative methods. Compared with in situ Hi-C3, BL-Hi-C15, and Micro-C16 datasets, Footprint-C shows markedly stronger 1D enrichment at CTCF motifs and dDHS (Fig. 1b). Notably, HiCAR5 and Hi-TrAC6 are two recently developed methods that use Tn5 transposase to map open chromatin interactions, yet neither exhibits the same degree of enrichment at CTCF motifs or dDHS as Footprint-C (Fig. 1b). A V-plot visualizes footprints protected by TF binding under DNase digestion by plotting fragment length against the position of the fragment midpoint relative to a reference site, so that footprint-bearing fragments form a V-shaped enrichment17. The canonical “V” patterns of Footprint-C fragments at both CTCF motifs and dDHS strongly indicate that a significant portion of the interacting short fragments ligated in Footprint-C were protected by TF binding under DNase digestion (Fig. 1c). The genome-wide 1D coverage of Footprint-C fragments is highly correlated with the coverage of DNase-seq fragments (Supplementary Fig. 1h), and the set of peaks called from Footprint-C 1D coverage largely overlaps with the peaks from DNase-seq (Fig. 1d, e). We then analyzed the common and method-specific peaks in K562 cells.
The Footprint-C-specific peaks exhibit higher enrichment in Footprint-C 1D coverage than in DNase-seq coverage, but relatively weak interaction frequency, possibly because of their overall lower occupancy compared with the common peaks (Supplementary Fig. 1i, j). The common peaks show the highest 1D enrichment and interaction frequency among all three sets of peaks (Supplementary Fig. 1i, j). Therefore, although TF occupancy level does not by itself appear to determine interaction frequency, interaction frequency may build on top of TF occupancy.
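To make the V-plot construction concrete, the following Python sketch bins fragments by the offset of their midpoint from a motif center (x axis) and by fragment length (y axis). Fragments with one end anchored at a footprint edge fall along two diagonals, which produces the V. The input layout (0-based, half-open (start, end) tuples on one chromosome), the window sizes, and the function names are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter


def v_plot_counts(fragments, motif_center, motif_strand="+",
                  half_window=200, max_len=250):
    """Count fragments by (midpoint offset from motif center, fragment length).

    fragments: iterable of (start, end), 0-based half-open, one chromosome.
    Offsets are mirrored for minus-strand motifs so all sites share one orientation.
    """
    counts = Counter()
    for start, end in fragments:
        length = end - start
        if length <= 0 or length > max_len:
            continue  # skip empty or overly long fragments
        offset = (start + end) // 2 - motif_center
        if motif_strand == "-":
            offset = -offset
        if abs(offset) <= half_window:
            counts[(offset, length)] += 1
    return counts


def aggregate_v_plot(fragments, motifs, **kwargs):
    """Sum V-plot counts over many (center, strand) motif sites."""
    total = Counter()
    for center, strand in motifs:
        total.update(v_plot_counts(fragments, center, strand, **kwargs))
    return total
```

Drawing the aggregated counts as a heatmap of length versus offset gives the V-plot; in practice the counts would be aggregated over thousands of occupied sites.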

Fig. 1: Footprint-C captures genuine and intact TF footprints.

a Length distribution of Footprint-C fragments in K562. b Pile-up of fragments from different datasets at occupied CTCF motifs or dDHS. c V-plot of Footprint-C fragments at CTCF motifs or dDHS. d Venn diagram showing overlap of MACS2 peaks between Footprint-C 1D and DNase-seq in K562 or HEK293T. e A screenshot showing Footprint-C 1D and DNase-seq signals in K562 or HEK293T. The red arrow marks the indexed CTCF motif. f Frequency of unique motif annotation of fragments from different Hi-C datasets. g Frequency of CTCF motif annotation of fragments (blue) or contacts (orange) from different Hi-C datasets. h Frequency of motif orientation of CTCF-CTCF contacts from different Hi-C datasets. i Frequency of motif orientation of CTCF-CTCF contacts from Footprint-C datasets in different cell lines. Source data are provided as a Source Data file.

We went on to annotate the Footprint-C fragments with a comprehensive collection of consensus TF motif sequences18. Among the different Hi-C methods, Footprint-C fragments show the highest rate of unique TF motif annotation (83.0%) (Fig. 1f), largely owing to the short fragment length and the preservation of intact footprint sequences. About 1.25% of Footprint-C fragments contain a single CTCF motif, and 0.33‰ of Footprint-C contacts are between pairs of CTCF motifs (Fig. 1g). To demonstrate the accuracy of motif annotation, we calculated the frequency of pairs of convergent CTCF motifs, a hallmark of CTCF-mediated chromatin loops in mammals19. Surprisingly, Footprint-C is the only Hi-C method showing a predominant frequency of convergent CTCF motif pairs (60.2% vs 25% expected) at the single-contact level (Fig. 1h); in all other Hi-C datasets, this pattern becomes apparent only at the level of aggregated chromatin loops (Supplementary Fig. 1k). In contrast, Footprint-C data from the fly cell line Kc167 do not show a predominant convergent pattern between interacting CTCF motifs (Fig. 1i), consistent with findings that CTCF proteins organize the Drosophila genome differently than the human genome20. These results showcase the power of Footprint-C to efficiently capture chromatin contacts between genuine TF footprints that are functionally relevant to the interactions.
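The orientation classes used for CTCF motif pairs can be sketched as follows. Each motif is assumed to be given as (position, strand); the left motif is the one with the smaller coordinate, and "convergent" means the left motif is on the + strand and the right motif on the - strand, the usual convention for CTCF loop anchors. Because two strands yield four combinations, random pairing gives 25% per class, the expected value quoted above. Function names are hypothetical.

```python
from collections import Counter


def pair_orientation(motif_a, motif_b):
    """Classify two (position, strand) motifs as convergent, divergent, or tandem."""
    (_, left_strand), (_, right_strand) = sorted([motif_a, motif_b])
    if left_strand == "+" and right_strand == "-":
        return "convergent"  # motifs point toward each other
    if left_strand == "-" and right_strand == "+":
        return "divergent"  # motifs point away from each other
    # Same strand on both sides: split into the two tandem forms.
    return "tandem_forward" if left_strand == "+" else "tandem_reverse"


def orientation_frequencies(pairs):
    """Fraction of motif pairs in each orientation class (each class ~0.25 if random)."""
    counts = Counter(pair_orientation(a, b) for a, b in pairs)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}
```

Keeping the two tandem forms separate makes the four classes directly comparable with the 25% null expectation.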

Preferential modes of local TF clustering

When viewed at the 1D level, the billions of chromatin contacts from Footprint-C present a rich collection of TF footprints and an opportunity to analyze TF local residence at unprecedented resolution. We first called 1D peaks from the Footprint-C datasets and classified them into single-, double-, or triple-footprint (1 V, 2 V, 3 V, respectively) sites based on the distance between adjacent peak summits (Fig. 2a and see Methods). V-plots at single sites and at sampled collections of sites demonstrate the presence of footprints across the whole 1D peak set (Supplementary Figs. 2 and 3a, b). The locations of the footprints within these sites correspond well to the peaks from DNase-seq and TF ChIP-seq datasets (Fig. 2b, c). About 70% of the fragments from these sites lie within the theoretical inside-V area (Supplementary Fig. 3c), showing that footprint-bearing fragments account for a significant portion of the dataset. Consensus motifs of TFs involved in genome organization, such as CTCF19,21 and MAZ22,23 (MYC-associated zinc finger protein), are among the most represented motifs at these sites (Supplementary Fig. 3d). Interestingly, the density of TF footprints at promoter regions appears to correlate with gene expression level (Supplementary Fig. 3e), suggesting a quantitative effect of TF co-occupancy on transcriptional output24,25.
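The grouping of 1D peak summits into 1 V, 2 V, 3 V, and 4+ V sites can be sketched as a single pass over sorted summit positions. The distance cutoff used by the authors is given in their Methods and is not reproduced here; max_gap=50 below is a hypothetical placeholder, and summits are assumed to lie on one chromosome.

```python
def group_footprint_sites(summits, max_gap=50):
    """Group summit positions on one chromosome into footprint sites.

    Adjacent summits no more than max_gap bp apart (hypothetical cutoff)
    are placed in the same site. Returns a list of lists of positions.
    """
    sites = []
    for pos in sorted(summits):
        if sites and pos - sites[-1][-1] <= max_gap:
            sites[-1].append(pos)  # close to previous summit: same site
        else:
            sites.append([pos])  # start a new site
    return sites


def site_label(site):
    """Label a site by its number of footprints, pooling 4 or more as '4+ V'."""
    n = len(site)
    return f"{n} V" if n < 4 else "4+ V"
```

Chaining by the gap to the previous summit (rather than to the first summit of the site) means a run of closely spaced footprints is kept together as one multi-footprint site.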

Fig. 2: TFs exhibit preferential modes of local cooperative binding.

a Frequency of Footprint-C 1D peak sites with 1, 2, 3, or 4+ footprints in K562 or HEK293T. b V-plots are shown for two 1 V sites with tracks from Footprint-C fragment ends, 1D signal, DNase-seq, CTCF and MAZ ChIP-seq. c V-plot is shown for one 3 V site with tracks from Footprint-C fragment ends, 1D signal, DNase-seq, TBP, CTCF and ZNF384 ChIP-seq. d Frequency of each TF with footprints in 1 V or 2 V+ sites in K562 or HEK293T. e Frequency of top 40 TF-TF pairs, from all 2 V sites in K562 or HEK293T. f, h Frequency of motif orientation of all CTCF:CTCF (f) or CTCF:FOX (h) pairs from 2 V sites in K562 or HEK293T. g, i Screenshots showing tracks from Footprint-C 1D, CTCF, RAD21, FOXK2 ChIP-seq and MNase-seq in K562. j V-plots of three types of CTCF 1 V sites are shown with CTCF and other TF ChIP-seq, and Footprint-C 1D average signal in K562. A diagram is shown at the bottom. The red arrows in b, c, g, i mark the indexed CTCF or FOX motifs.

The definition of 1 V, 2 V, and 3 V sites prompted us to investigate modes of TF local co-occupancy. TFs appear to have preferences for binding alone or together with other TFs in close proximity, and these preferences seem largely conserved between K562 and HEK293T cells (Fig. 2d). Members of the RFX (regulatory factor X) and CUX (cut like homeobox) families have more footprints at 1 V sites than at other sites, whereas NRF1 and SP family members prefer to co-localize with other TFs26 (Fig. 2d). In contrast, the frequent TF pairs at 2 V and 3 V sites are highly cell type-specific. KLF (Krüppel-like factor) and HOX (homeobox) family members are the TFs that most frequently co-localize with an array of other TFs in K562 and HEK293T cells, respectively (Fig. 2e and Supplementary Fig. 3f). The conformational configuration of two interacting TFs, and thus the relative orientation of two adjacent motif sequences, may dictate a functional link between the two TFs27. We therefore investigated the four possible orientations of two adjacent motif sequences at all 2 V sites. The tandem orientation of two adjacent CTCF motifs is highly frequent in both cell lines28 (Fig. 2f, g). In contrast, adjacent motifs of CTCF and FOX (forkhead box) family TFs show a preference for the divergent orientation (Fig. 2h, i and Supplementary Fig. 3g, h). Preferred orientations between two adjacent motifs suggest preferred dimerization modes between the two TFs, which in turn may indicate cooperative functional activities between them29,30. Interestingly, when analyzing 1 V sites annotated with only CTCF motifs, we noticed some left- or right-skewed “V” shapes on the V-plot (Fig. 2j). To examine whether these skewed CTCF “V” shapes are due to other TFs interacting with CTCF but not with DNA, we analyzed various TF ChIP-seq datasets from ENCODE31 and identified concordantly skewed ChIP-seq signals from TFs such as MAX32, BHLHE4033, ELF134, and IRF2 (Fig. 2j).
These results indicate a wide-spread cooperativity in TF local residence, with preferences for both interaction partner choice and interaction conformation.

Footprint-C efficiently identifies chromatin structural features

We next investigated the performance of the Footprint-C dataset at the pairwise contact level. Among short-distance contact pairs, most Hi-C datasets are biased toward the forward-reverse (FR) orientation of the two fragments in a ligation product, with frequencies ranging from 38% to 90% (vs 25% expected), possibly owing to spurious capture of dangling-end, re-ligated, or undigested fragments35,36. Importantly, the Footprint-C dataset exhibits the most balanced frequency of fragment orientations among the datasets from different Hi-C methods (Fig. 3a and Supplementary Fig. 4a), suggesting that dangling-end products are excluded by the Footprint-C procedure and supporting the validity of short-distance interactions in Footprint-C results. The Footprint-C contact matrix and heatmap exhibit canonical chromatin structural features such as contact domains, chromatin loops, and stripes (Fig. 3b). The global correlation coefficient between Footprint-C and in situ Hi-C or Micro-C is slightly lower than that between in situ Hi-C and Micro-C (Supplementary Fig. 4b), possibly because of the TF-centric fragment locations of Footprint-C (Supplementary Fig. 1h), which the other two methods lack. Otherwise, Footprint-C produces chromatin contact maps consistent with in situ Hi-C and Micro-C maps in terms of global interaction distance, compartment segregation, and contact domain insulation (Supplementary Fig. 4c–h).
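The short-range orientation check can be sketched as follows. Each pair is assumed to carry, for both mates, a chromosome, a 5′ position, and a strand (similar to the columns of a .pairs file); mates are ordered by coordinate, and only cis pairs closer than 1 kb are counted, following the Fig. 3a description. FR denotes a + strand left mate with a - strand right mate. This is an illustrative sketch, not code from HiC-Pro or pairtools.

```python
from collections import Counter

_ORIENTATION = {("+", "-"): "FR", ("-", "+"): "RF",
                ("+", "+"): "FF", ("-", "-"): "RR"}


def pair_orientation_class(pos1, strand1, pos2, strand2):
    """Orientation of a read pair after ordering the two mates by position."""
    (_, s_left), (_, s_right) = sorted([(pos1, strand1), (pos2, strand2)])
    return _ORIENTATION[(s_left, s_right)]


def short_range_orientation(pairs, max_dist=1000):
    """Fractions of FR/RF/FF/RR among cis pairs closer than max_dist bp.

    pairs: iterable of (chrom1, pos1, strand1, chrom2, pos2, strand2).
    A library free of dangling ends should give roughly 0.25 for each class.
    """
    counts = Counter()
    for c1, p1, s1, c2, p2, s2 in pairs:
        if c1 == c2 and abs(p1 - p2) < max_dist:
            counts[pair_orientation_class(p1, s1, p2, s2)] += 1
    total = sum(counts.values())
    if not total:
        return {}
    return {k: counts[k] / total for k in ("FR", "RF", "FF", "RR")}
```

An excess of FR at short distances is the signature of unligated (dangling-end) molecules, which read inward from both ends of a single fragment.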

Fig. 3: Footprint-C efficiently identifies chromatin structural features.

a Frequency of read mate orientation of different Hi-C datasets when distance <1 kb. b A screenshot showing Footprint-C contact heatmap with tracks from H3K27ac, ZNF143, KLF16, CTCF, RAD21 ChIP-seq, total RNA-seq, DNase-seq and Footprint-C 1D in K562. Gray shades mark the positions of non-CTCF loop anchors. c Number of loops (>20 kb) or stripes detected in Footprint-C, Micro-C, or in situ Hi-C datasets at different sampling sizes or by different methods. d Overlap of Mustache loops between Footprint-C & Micro-C datasets in K562. e Aggregate plots for common or Footprint-C specific loops. f Fraction of different loop types from common or Footprint-C specific loops. Enh, enhancer. Pro, promoter. g Size of common or Footprint-C specific structural A-A or B-B loops. Min: lower end of violin. Q1: lower bound of box. Q2: line in box. Q3: higher bound of box. Max: higher end of violin. P was calculated by two-sided Wilcoxon–Mann–Whitney test. Effect sizes are shown. h Footprint-C (upper right) and Micro-C (lower left) contact heatmaps at the same sampling size of 1800 M contacts showing one specific A-A (left) or B-B (right) structural loop, with tracks from genome annotation, PC1 value, CTCF, RAD21 or H3K9me3 ChIP-seq. Black squares mark the two loops. i Aggregate plots of loops in DMSO- or dTAG-7-treated HEK293T. The loops were from Footprint-C in WT HEK293T, and were aggregated at the center of a ±200 kb window at 5-kb resolution. The average enrichment score for loops is shown in the top left corner. Source data are provided as a Source Data file.

To benchmark the performance of Footprint-C in identifying chromatin structural features, we compared in parallel the numbers of chromatin loops and stripes obtained from Footprint-C, Micro-C, and in situ Hi-C datasets in K562 cells. Footprint-C identified the most chromatin loops and stripes among these datasets at every sampling size and with every detection method tested (Fig. 3c and Supplementary Fig. 4i). At the same sequencing depth, Footprint-C identified about 50% more chromatin loops than Micro-C (Fig. 3c–e). The Footprint-C-specific loops include a larger fraction of structural loops than the loops shared with Micro-C (Fig. 3f) and are larger in size (Fig. 3g), prompting us to investigate the nature of these specific structural loops. More of the common structural loops connect A compartment regions (A-A) than B compartment regions (B-B) (49.3% vs 28.9%), whereas the specific structural loops show the opposite trend (19.7% vs 59.4%) (Supplementary Fig. 4j). The B-B structural loops are decorated by repressive H3K9me3 marks throughout but, like the A-A structural loops, are demarcated by CTCF and Cohesin (Fig. 3h). We further investigated the chromatin stripes identified by Footprint-C by categorizing them into three groups: chromatin loop domains with left-, right-, or both-sided stripes (Supplementary Fig. 4k). The presence and direction of stripes correlate with the density of TF footprints at the respective loop anchors (Supplementary Fig. 4l), suggesting a functional link between loop extrusion initiation and TF residence at loop anchors37, and that a broad spectrum of TFs, including CTCF and others, may contribute to the structural framework of higher-order genome organization.
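The A-A and B-B loop labels can be illustrated by looking up the compartment of each loop anchor. This sketch assumes a PC1 track stored as a dict keyed by (chrom, bin_index), with its sign already oriented so that positive values mark A compartments (for example, by correlating PC1 with gene density or GC content); the resolution and all function names are hypothetical.

```python
from collections import Counter


def compartment_of(pc1_value):
    """Map a PC1 value to 'A' (positive) or 'B'; None if the bin has no value."""
    if pc1_value is None:
        return None
    return "A" if pc1_value > 0 else "B"


def classify_loop(loop, pc1_by_bin, resolution):
    """Return 'A-A', 'A-B', 'B-B', or 'NA' for a (chrom, anchor1, anchor2) loop."""
    chrom, anchor1, anchor2 = loop
    c1 = compartment_of(pc1_by_bin.get((chrom, anchor1 // resolution)))
    c2 = compartment_of(pc1_by_bin.get((chrom, anchor2 // resolution)))
    if c1 is None or c2 is None:
        return "NA"  # anchor falls in a bin without a PC1 value
    return f"{min(c1, c2)}-{max(c1, c2)}"


def summarize_loops(loops, pc1_by_bin, resolution):
    """Fraction of loops in each compartment class."""
    labels = Counter(classify_loop(l, pc1_by_bin, resolution) for l in loops)
    total = sum(labels.values())
    return {k: v / total for k, v in labels.items()} if total else {}
```

Comparing the output of summarize_loops for common versus method-specific loop sets gives the kind of A-A/B-B fractions reported above.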

To rule out the possibility that the higher-order chromatin structural features identified by Footprint-C contain artifacts arising from its 1D enrichment at accessible sites, we established a RAD21 (a Cohesin subunit) degron cell line in HEK293T and performed Footprint-C. Upon acute Cohesin depletion, the chromatin loops and their associated stripes were severely impaired (Fig. 3i). Because Cohesin loss does not generally affect chromatin accessibility38, this result demonstrates the validity of the chromatin structural features identified by Footprint-C and shows that Footprint-C’s superior performance in finding these structures is not simply due to its preferential coverage of the accessible portion of the genome.
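An aggregate loop plot of the kind used for the degron comparison can be sketched by averaging square windows of the contact matrix centered on each loop pixel. With 5-kb bins, a ±200 kb window corresponds to 40 flanking bins. The sparse-dict matrix layout and the enrichment score (center pixel divided by the window mean) are assumptions chosen for simplicity; published aggregate peak analysis tools use other score definitions and additional filtering.

```python
def aggregate_loops(matrix, loops, flank_bins=40):
    """Average (2*flank_bins+1)^2 windows centered on loop pixels.

    matrix: dict mapping (i, j) bin pairs with i <= j to normalized contacts;
            missing entries (including off-chromosome bins) count as 0.
    loops: iterable of (i, j) anchor bin indices.
    Returns (average window as a list of lists, number of loops used).
    """
    size = 2 * flank_bins + 1
    acc = [[0.0] * size for _ in range(size)]
    n = 0
    for i, j in loops:
        n += 1
        for di in range(-flank_bins, flank_bins + 1):
            for dj in range(-flank_bins, flank_bins + 1):
                a, b = i + di, j + dj
                key = (a, b) if a <= b else (b, a)  # symmetric lookup
                acc[di + flank_bins][dj + flank_bins] += matrix.get(key, 0.0)
    if n == 0:
        return acc, 0
    return [[v / n for v in row] for row in acc], n


def enrichment_score(avg):
    """Center pixel divided by the mean of the whole averaged window."""
    size = len(avg)
    center = avg[size // 2][size // 2]
    mean_all = sum(map(sum, avg)) / (size * size)
    return center / mean_all if mean_all else float("nan")
```

Running the same function on DMSO and dTAG-7 matrices with an identical loop list, and comparing the two scores, mirrors the design of the Cohesin depletion experiment.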

High-resolution chromatin contact maps built upon TF footprints

The V-plot analysis demonstrates that most Footprint-C fragments are protected by the binding of a variety of TFs other than CTCF (Fig. 1g and Supplementary Fig. 5). The genuine nature of the footprints resolved by Footprint-C enabled us to unambiguously categorize interactions mediated by a variety of TF-TF pairs. CTCF-CTCF pairs show predominantly long-range interactions, whereas pairs between CTCF and other TFs show a biphasic interaction range (Fig. 4a). About half of the TF pairs not involving CTCF interact within 10 kilobases (kb) (Supplementary Fig. 6a). Based on this distinction in interaction range, we categorized all TF-TF pairs into short-range (0.3–10 kb) and long-range (>10 kb) groups. In both K562 and HEK293T cells, contacts between two CTCF motifs form the largest category only in the long-range group, whereas MAZ-MAZ contacts form the largest category in the short-range group (Fig. 4b and Supplementary Fig. 6b). Regarding interaction partner choice, CTCF mostly interacts with another CTCF in the long-range group, whereas all other TFs appear to form homo- and hetero-pairs in balanced proportions (Fig. 4c and Supplementary Fig. 6c). The convergent orientation of two interacting CTCF motifs is a hallmark of CTCF-mediated chromatin loops in mammals (Fig. 1h and Supplementary Fig. 1k). We next examined whether such orientation preferences exist in other interacting TF-TF pairs. A mild preference (31–34% vs 25% expected) for the convergent orientation was observed between CTCF and other TFs, such as MAZ, EGR139 (Early Growth Response 1), or ZNF14340,41,42, only in the long-range group (Fig. 4d). No such preferences were observed between TF-TF pairs not involving CTCF (Supplementary Fig. 6d). The diversity of short- and long-range contacts suggests that a variety of TF-TF interactions may orchestrate local and global chromatin organization through a division of labor6.
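The split into short-range (0.3–10 kb) and long-range (>10 kb) groups, together with a tally of TF-TF pair types, can be sketched as below. Contacts are assumed to be cis and already annotated with one TF per mate; treating exactly 10 kb as short-range and discarding contacts under 0.3 kb are assumptions about the group boundaries.

```python
from collections import Counter


def range_class(distance):
    """Assign a contact distance (bp) to 'short', 'long', or None (< 0.3 kb)."""
    if distance > 10_000:
        return "long"
    if distance >= 300:
        return "short"
    return None  # closer than 0.3 kb: not assigned to either group


def count_tf_pairs(contacts):
    """Count unordered TF pairs per range group.

    contacts: iterable of (tf_a, pos_a, tf_b, pos_b) for cis contacts.
    Homotypic pairs appear as (X, X); heterotypic pairs as sorted (X, Y).
    """
    groups = {"short": Counter(), "long": Counter()}
    for tf_a, pos_a, tf_b, pos_b in contacts:
        group = range_class(abs(pos_a - pos_b))
        if group is not None:
            groups[group][tuple(sorted((tf_a, tf_b)))] += 1
    return groups
```

Sorting each TF pair makes CTCF-MAZ and MAZ-CTCF contacts fall into the same heterotypic category.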

Fig. 4: Chromatin contact maps built upon TF footprints.

a Interaction distance of different TF-TF pairs. b TF-TF pairs ranked by counts of homogeneous pairs when the two mates are 0.3–10 kb (top) or >10 kb (bottom) apart in K562. Only “inside-V” footprint-bearing fragments shown in Supplementary Fig. 5 are counted. c Correlation of homogeneous and mean heterogeneous pair counts of TFs shown in b. d Frequency of motif orientation of heterogeneous pairs between CTCF and any of EGR1, MAZ, or ZNF143 when the two mates are 0.3–10 kb (top) or >10 kb (bottom) apart in K562. e Screenshots of CTCF motif-based (left) and regular (right) Footprint-C contact maps with CTCF ChIP and Footprint-C fragment density shown. The two CTCF motifs in red are 112 bp apart. f Screenshots of CTCF motif-based and regular Footprint-C contact maps of a three-way interaction. g Frequency of motif orientation of three-way CTCF contacts. h Screenshots of CTCF & MAZ motif-based and regular Footprint-C contact maps of a three-way interaction involving CTCF and MAZ. i Frequency of motif orientation of three-way contacts involving CTCF and one of EGR1, KLF, or MAZ. The red arrows in e, f, h mark the indexed CTCF or MAZ motifs. Source data are provided as a Source Data file.

The resolution of TF identities in most Footprint-C contacts further prompted us to build a chromatin contact map based solely on TF motifs. Given its predominant role in genome organization, we first built a contact map on 112,878 CTCF motifs compiled from all human CTCF ChIP-seq datasets in ENCODE43. This motif-based contact map not only largely reproduces the chromatin structures seen in a regular Hi-C contact map, but also, surprisingly, separates interactions emanating from two adjacent CTCF motifs only ~100 bp apart (Fig. 4e) and resolves interactions from a particular CTCF motif when another, inert motif lies only ~100 bp away (Supplementary Fig. 6e); neither is discernible in contact maps from other Hi-C methods. The motif-based contact map also helps resolve CTCF-mediated multiway interactions. In total, we identified 6,726 three-way contacts involving three CTCF motifs, including one over the promoter region of the RNA methyltransferase gene Nsun4 (Fig. 4f), which was verified by data from a recent multiway study in K562 cells44. A combination of convergent and tandem motif orientations appears to be the predominant pattern (66% vs 25% expected) among such CTCF three-way interactions (Fig. 4g). Finally, we accommodated both homogeneous and heterogeneous TF-TF interactions to build a contact map on the motifs of both CTCF and MAZ, another architectural TF involved in genome organization22,23. CTCF also forms multiway interactions with MAZ (Fig. 4h), and potentially with other TFs, mainly through convergent motif orientation (Fig. 4i), suggesting that a rich regulatory lexicon of TF-TF interactions, involving TF identity, valency, and orientation, underlies complex mammalian genome folding.
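A motif-based contact map can be sketched by assigning each fragment to the motifs it fully contains and counting a contact only when both mates carry exactly one motif. The containment rule, the unique-annotation filter, and the MotifIndex helper are hypothetical choices for illustration; motifs are assumed to be non-empty, 0-based half-open intervals on a single chromosome.

```python
import bisect
from collections import Counter


class MotifIndex:
    """Hypothetical helper: find motifs fully contained in a fragment."""

    def __init__(self, motifs):
        # motifs: list of (start, end, motif_id), 0-based half-open.
        self.motifs = sorted(motifs)
        self.starts = [m[0] for m in self.motifs]

    def contained(self, frag_start, frag_end):
        # Motifs starting before the fragment cannot be fully contained.
        i = bisect.bisect_left(self.starts, frag_start)
        hits = []
        while i < len(self.motifs) and self.starts[i] < frag_end:
            _, end, motif_id = self.motifs[i]
            if end <= frag_end:
                hits.append(motif_id)
            i += 1
        return hits


def motif_contact_map(fragment_pairs, index):
    """Count motif-motif contacts from pairs of (start, end) fragments,
    keeping only pairs in which each fragment contains exactly one motif."""
    counts = Counter()
    for frag_a, frag_b in fragment_pairs:
        hits_a = index.contained(*frag_a)
        hits_b = index.contained(*frag_b)
        if len(hits_a) == 1 and len(hits_b) == 1:
            counts[tuple(sorted((hits_a[0], hits_b[0])))] += 1
    return counts
```

For example, with motifs starting at 100 and 212 (112 bp apart), the fragments (90, 140) and (190, 250) are assigned to different motifs, whereas a fragment spanning both is discarded as ambiguous.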

Discussion

Since the birth of the Hi-C method2, a handful of derivative methods based on the principle of proximity ligation have been developed to map genome organization. The two major directions of technological development are resolution enhancement, exemplified by Micro-C, which resolves nucleosomal DNA units, and preferential coverage of the accessible portion of the genome, represented by HiCAR and Hi-TrAC, which use Tn5 transposase. These methods outperform the first-generation Hi-C and second-generation in situ Hi-C methods in resolution but still have limitations. Most ligation products they capture are either between ~150 bp nucleosomal DNA fragments, which often exclude TF binding sites45,46 (Fig. 1b), or between long fragments that either do not include the TF binding sites or leave ambiguity in identifying the responsible cis elements (Fig. 1f). Furthermore, HiCAR and Hi-TrAC use restriction enzyme digestion or sonication after Tn5 transposition to further fragment DNA, potentially removing TF footprint sequences from the DNA fragments and thereby losing information on the identity of the cis elements and trans TFs responsible for the interactions. In the Footprint-C procedure, we fine-tuned DNase digestion to reduce fragments to approximately the length of TF footprints, which in turn enabled us to use gel excision to strictly select ligation products formed directly between two footprints (Supplementary Fig. 1a). This size-selection step also removes the need for sonication or restriction enzyme digestion, thereby preserving the integrity of footprints. These optimizations enable the unambiguous identification of cis regulatory elements and trans TFs, and thus of the TF-TF interactions underlying genome-wide chromatin interactions in Footprint-C data.

In the loop extrusion model of chromatin loop formation in mammals47, a pair of CTCF proteins bound to two convergently oriented motifs is required to stop extrusion by the Cohesin complex, thereby demarcating chromatin loops together with Cohesin. Intriguingly, Drosophila CTCF proteins, which share a highly conserved DNA-binding domain and motif sequence with mammalian CTCF, do not exhibit such a preference for convergently oriented motifs when forming interactions20, suggesting that mammalian CTCF may have acquired the ability to stop Cohesin extrusion during evolution to meet the requirements of mammalian genome organization. A recent study identified a handful of TFs other than CTCF that serve as barriers to Cohesin extrusion in mammalian cells48. Indeed, in Footprint-C data, we identified a rich collection of long-range TF-TF interactions between CTCF and other TFs, and these interactions show a mild preference for convergent orientation of the CTCF motif and the partner motif (Fig. 4d). It is tempting to speculate that these results reflect the emergence of new types of extrusion-barrier TFs during evolution to suit the complex needs of gene expression regulation in mammals. Functional studies, such as those using inducible degron technologies49, will help resolve the roles these candidate TFs play in mammalian genome organization.

Notably, two recent preprints reported acute depletion of ZNF143 and found no significant changes in genome organization, suggesting that ZNF143 plays an indirect role in genome organization50,51. They went on to propose a model in which different TFs (other than CTCF) act collectively on genome organization. This model is compatible with our Footprint-C findings that a collection of different TFs interact with each other in complex but preferential modes, and thus may act on genome organization in an additive manner.

In summary, Footprint-C presents high-resolution chromatin contact maps built upon intact and genuine TF footprints, reveals a rich regulatory lexicon of TF-TF interactions underlying chromatin high-order structures, and therefore holds great promise for studying genome organization in biological processes with vigorous gene expression changes driven by TF regulatory cascades, such as cellular differentiation and cell fate reprogramming52.

Methods

Cell culture

K562 (ATCC: CCL-243) cells were cultured at 37 °C with 5% CO2 in RPMI-1640 (Vivacell, C3010-0500) supplemented with 10% FBS (Nobimpex, B118-500). HEK293T (ATCC: CRL-3216) cells were cultured at 37 °C with 5% CO2 in DMEM (Gibco, C11995500BT) supplemented with 10% FBS. Kc167 cells were cultured at 25 °C in SIM SF Expression Medium (Sino Biologica, MSF1) supplemented with 1% penicillin-streptomycin (Gibco, 15140122).

Generation of RAD21 degron cell line

The RAD21 degron cell line (HEK293T: RAD21-linker-FKBPF36V-3xHA-P2A-BSD) was generated by CRISPR/Cas9-mediated genome editing as previously described53, with the following modifications. The homology-directed repair template was constructed as follows: (1) a 500 bp left homology arm upstream of the stop codon of the RAD21 gene was PCR-amplified from genomic DNA; (2) a 500 bp fragment including and downstream of the RAD21 stop codon was amplified as the right homology arm; (3) a linker-FKBPF36V-3xHA-P2A-BSD sequence was PCR-amplified; (4) the pCRIS-PITChv2-dTAG-BSD (BRD4) plasmid (Addgene, 91795) was digested with the restriction enzymes EcoRV-HF (NEB, R3195S) and XbaI (NEB, R0145V), and the large fragment was purified by agarose gel electrophoresis. The four fragments were assembled by Seamless Cloning (Beyotime, D7010S) to generate the repair template vector. A single-guide RNA targeting the 3′ end of the RAD21 gene was designed with the CRISPOR tool54. Sense and antisense oligos were annealed (sgDNA) and cloned into the pX458 plasmid. HEK293T cells were seeded into 6-well plates and transfected using VigoFect (Vigorous Biotechnology, T001) with 1 μg of sgDNA plasmid and 2 μg of repair template fragment PCR-amplified from the repair template vector. Transfected cells were cultured for 96 h and then selected with 10 µg/mL blasticidin. Single colonies were isolated into 96-well plates and cultured for a further 14 days. Successfully edited homozygous clones were confirmed by genotyping and Western blot. The RAD21 degron cells were treated with 1 μM dTAG-7 (MCE, HY-123941) or DMSO (control) for 4 h.

Footprint-C experimental procedures

Briefly, 5 million cells were fixed with freshly made 1% formaldehyde at room temperature for 10 min. Cells were lysed and nuclei were extracted. Nuclei were digested with 16 U of DNase I (NEB, M0303S) for 30 min. The digested DNA was blunt-ended with End Repair Mix (NEB, E6050L), and 3′ A overhangs were added with Taq DNA Polymerase (Thermo, EP0406). The A-tailed DNA was ligated to a bridge linker (Forward: /5Phos/GCCCGG/iBiodT/NNACGCCCGT, Reverse: /5Phos/CGGGCGTNNACCGGGCT) with T4 DNA ligase (Thermo, EL0012) at 16 °C for 4 h. Excess bridge linkers were removed with Lambda Exonuclease (NEB, M0262L) and Exonuclease I (NEB, M0293L). Proteinase K was added and the sample was incubated at 65 °C overnight. DNA was purified by phenol:chloroform:isoamyl alcohol extraction and ethanol precipitation. The DNA was run on a 1.5% agarose gel, a gel slice from 80 to 200 bp was excised, and the DNA was purified. Linker-ligated DNA was purified with C1 streptavidin beads (Thermo, 65002). The bead-bound DNA was then subjected to end repair, A-tailing, and ligation to Illumina adapters. The DNA was then amplified by ~7–10 cycles of PCR using Illumina paired-end primers and KAPA Polymerase (Roche, KK2502). The amplified library was purified and subjected to paired-end sequencing on an Illumina NovaSeq 6000.

Footprint-C Data preprocessing

Read pairs were first trimmed of Illumina adapter and bridge linker sequences (AGCCCGGTNNACGCCCGT, in both forward and reverse-complement orientations) from both ends using Trim Galore (https://github.com/FelixKrueger/TrimGalore) and Cutadapt (https://github.com/marcelm/cutadapt/). Only read pairs in which the bridge linker sequence was detected and with both mates ≥10 bp after trimming were kept. Valid Footprint-C fragment contact pairs were obtained with the HiC-Pro55 analysis pipeline; a detailed description and code are available at https://github.com/nservant/HiC-Pro. In brief, each pair of trimmed FASTQ files was mapped separately to the human (hg38) or Drosophila (dm6) genome by Bowtie256 in ‘--very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder’ mode. Aligned reads were paired by read name. Pairs with multiple hits, low MAPQ (≤10), singletons, dangling ends, and self-circles were removed. The fragment contact pairs were obtained from the output BAM files of paired aligned reads by removing PCR duplicates and were used in downstream analyses. All 5′ valid pairs were extracted from the fragment contact pairs and converted to contact matrices in COOL format using the Cooler57 package. The Footprint-C, Micro-C, in situ Hi-C, and Hi-TrAC raw contact matrices were generated and visualized as heatmaps with cooltools (https://github.com/open2c/cooltools). Iterative correction (IC)-normalized contact matrices were used for all other bioinformatic analyses58.
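The bridge-linker filter can be sketched with a regular expression that treats the two N positions as wildcards and searches for the linker in both orientations. This sketch is exact-match only (Cutadapt additionally tolerates sequencing errors), does not handle Illumina adapters, and assumes that a pair is kept when the linker is found in at least one mate; these are assumptions, not the authors' exact settings.

```python
import re

LINKER = "AGCCCGGTNNACGCCCGT"


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def linker_regex(linker):
    """Compile a pattern in which each degenerate N matches any base."""
    return re.compile("".join("[ACGTN]" if b == "N" else b for b in linker))


PATTERNS = [linker_regex(LINKER), linker_regex(reverse_complement(LINKER))]


def trim_at_linker(read):
    """Return (sequence 5' of the first linker hit, whether a linker was found)."""
    positions = []
    for pattern in PATTERNS:
        match = pattern.search(read.upper())
        if match:
            positions.append(match.start())
    if not positions:
        return read, False
    return read[: min(positions)], True


def keep_pair(read1, read2, min_len=10):
    """Keep a pair if a linker was seen in either mate and both trimmed mates
    are at least min_len bp long. Returns (keep, trimmed1, trimmed2)."""
    t1, found1 = trim_at_linker(read1)
    t2, found2 = trim_at_linker(read2)
    keep = (found1 or found2) and len(t1) >= min_len and len(t2) >= min_len
    return keep, t1, t2
```

Searching both orientations matters because the linker can be read from either strand, depending on which genomic fragment a mate starts in.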

Public datasets

Public data used in this study, including in situ Hi-C, Micro-C, BL-Hi-C, DNase Hi-C, CAP-C, Hi-TrAC, HiCAR, HiPore-C, DNase-seq, RNA-seq, and ChIP-seq datasets, are summarized in Supplementary Data 1. The in situ Hi-C and Micro-C datasets were preprocessed in the same way as Footprint-C, except that the linker-trimming step was omitted. The BL-Hi-C, DNase Hi-C, and CAP-C datasets were preprocessed in the same way as Footprint-C. HiCAR datasets were processed as described in ref. 59. Briefly, reads were aligned to the hg38 reference genome using bwa mem with the flags -SP. Alignments were parsed, and alignment contact pairs were generated using pairtools (https://github.com/mirnylab/pairtools). Alignment contact pairs with low mapping quality (MAPQ < 10) were filtered out, as were pairs with identical genomic coordinates and pairs mapped to the same digestion fragment. Alignment contact pairs of Hi-TrAC and multiway interactions of HiPore-C were downloaded directly from the Gene Expression Omnibus (GEO).

Enrichment analysis

Fragment contact pairs of Footprint-C or other Hi-C datasets were first converted to single-alignment BED files. Alignments of other Hi-C datasets were scaled to 60 bp, the median fragment length of Footprint-C. genomeCoverageBed in bedtools60 was used to compute normalized signal (reads per million, RPM, for hg38). The average signal distribution relative to occupied CTCF motifs and distal DNase I hypersensitive sites (dDHSs) was computed across ±500 bp using computeMatrix in deepTools61. For the HiCAR dataset, the average signals of Read 1 and Read 2 alignments around occupied CTCF motifs and dDHSs were calculated separately. The list of functional CTCF motifs was generated as described previously43. Briefly, the sets of occupied CTCF motifs in K562 and GM12878 cells were screened using the respective CTCF ChIP-seq datasets, and only motifs with an average RPM > 1 within a 30 bp window surrounding the motif center base were kept. DHSs in K562 or GM12878 cells were called from the respective DNase-seq datasets using MACS2. DHSs with q values > 20 located more than 1 kb from a transcription start site (TSS) were defined as dDHSs.
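
The coverage and aggregate-profile steps above can be sketched as follows. This is a minimal, single-chromosome, pure-Python stand-in for genomeCoverageBed and deepTools computeMatrix, not their actual implementations. It assumes each alignment is given as a 0-based 5′ position plus strand, resizes it to 60 bp in the read direction, scales per-base counts to reads per million, and averages the signal ±500 bp around site centers, reversing windows of minus-strand sites. The function names `rpm_coverage` and `average_profile` are hypothetical.

```python
def rpm_coverage(five_prime, strands, chrom_len, frag_len=60):
    """Per-base coverage (RPM) from alignments resized to frag_len bp.

    five_prime: 0-based 5' positions; for '-' reads this is the rightmost base.
    """
    cov = [0.0] * chrom_len
    for pos, strand in zip(five_prime, strands):
        # Resize each alignment to frag_len bp, extending in the read direction.
        if strand == "+":
            lo, hi = pos, pos + frag_len
        else:
            lo, hi = pos - frag_len + 1, pos + 1
        for p in range(max(lo, 0), min(hi, chrom_len)):
            cov[p] += 1.0
    # Scale raw counts to reads per million alignments.
    scale = 1e6 / len(five_prime) if five_prime else 0.0
    return [x * scale for x in cov]


def average_profile(cov, centers, strands=None, flank=500):
    """Mean signal in a +/- flank window around each site center."""
    rows = []
    for i, c in enumerate(centers):
        lo, hi = c - flank, c + flank + 1
        if lo < 0 or hi > len(cov):
            continue  # skip sites whose window runs off the chromosome
        row = cov[lo:hi]
        if strands is not None and strands[i] == "-":
            row = row[::-1]  # orient the window relative to the motif strand
        rows.append(row)
    if not rows:
        return [0.0] * (2 * flank + 1)
    return [sum(col) / len(rows) for col in zip(*rows)]
```

A genome-wide version would keep one coverage array per chromosome and compute the RPM scale from the total alignment count across all chromosomes.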

Footprint-C 1D peak processing

Both fragments of each Footprint-C pair were extracted and converted into BED files compatible with MACS262 input. MACS2 was used to identify peaks with the following parameters: --nomodel --extsize 30 -q 0.05 --call-summits --keep-dup all. Peaks reproducible across biological replicates were retained using ChIP-R63.

Footprint annotation and classification

A list of expressed TFs was compiled from the 10,000 genes with the highest RPM values in RNA-seq data from K562 or HEK293T cells. Only motifs of expressed TFs from HOMER’s motif collection18 were kept for analysis. The motif closest to each Footprint-C 1D peak summit was identified using bedtools closestBed with the parameter -t first. Different TFs from the same family were labeled with the family name (e.g., KLF). Footprint-annotated peaks were merged and grouped into three types of sites (1 V, 2 V, and 3 V) based on the distance between adjacent footprints. 1 V sites were defined as single footprints located >200 bp from their nearest footprint. 2 V sites were defined as two footprints <200 bp apart and >200 bp from the nearest third footprint. 3 V sites were defined as three footprints with <200 bp between any two of them and >200 bp from the nearest fourth footprint.
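
The 1 V/2 V/3 V grouping can be sketched as single-linkage clustering of footprint summits on one chromosome, in which consecutive summits closer than 200 bp join the same site. Two choices here are assumptions not stated above: a gap of exactly 200 bp is treated as separating, and clusters of four or more footprints (or three footprints spanning 200 bp or more) are left unclassified. `classify_footprint_sites` is a hypothetical name.

```python
def classify_footprint_sites(summits, max_gap=200):
    """Group footprint summits (one chromosome) into 1V/2V/3V sites."""
    positions = sorted(summits)
    clusters = []
    for p in positions:
        # Join the current cluster if p is within max_gap of the previous summit.
        if clusters and p - clusters[-1][-1] < max_gap:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    sites = []
    for c in clusters:
        # "< max_gap between any two" means the whole cluster span is < max_gap.
        if len(c) <= 3 and c[-1] - c[0] < max_gap:
            sites.append((f"{len(c)}V", c))
        else:
            sites.append(("unclassified", c))
    return sites
```

Because clusters are split wherever a gap reaches 200 bp, every classified site is automatically more than 200 bp (in this sketch, at least 200 bp) from the next footprint outside it.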

Classification of CTCF 1 V sites

In Fig. 2j, CTCF motifs within 1 V sites were extended by ±20 bp from the motif center base, and Footprint-C fragment ends within the upstream and downstream 20 bp regions were counted. Skewed CTCF 1 V sites were classified into two categories based on the ratio of counts between the two regions (ratio ≥4 or ≤0.25). A third category was randomly sampled from all CTCF 1 V sites.
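
A sketch of the skew classification for a single CTCF 1 V site. It assumes that "upstream" is defined relative to the motif strand, that a fragment end exactly at the center base is not counted, and that a site with no downstream ends but at least one upstream end counts as upstream-skewed; none of these choices is specified above. The random third category would be drawn separately (for example with `random.sample`) and is not shown. `classify_ctcf_site` is a hypothetical name.

```python
def classify_ctcf_site(center, strand, fragment_ends, flank=20, fold=4.0):
    """Label a CTCF site by the upstream/downstream ratio of fragment ends."""
    # Count fragment ends in the 20 bp on each side of the motif center base.
    left = sum(1 for e in fragment_ends if center - flank <= e < center)
    right = sum(1 for e in fragment_ends if center < e <= center + flank)
    # Assumed: upstream/downstream follow the motif orientation.
    up, down = (left, right) if strand == "+" else (right, left)
    if down == 0:
        return "upstream-skewed" if up > 0 else "not skewed"
    ratio = up / down
    if ratio >= fold:
        return "upstream-skewed"
    if ratio <= 1 / fold:
        return "downstream-skewed"
    return "not skewed"
```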

Annotation of promoters with footprints

In Supplementary Fig. 3e, promoters were defined as the 500 bp regions upstream of transcription start sites (TSSs). A promoter was considered a “1 V” promoter only if the summit of a 1 V site fell within the promoter region. Similarly, a promoter was considered a “2 V” or “3 V” promoter only if all summits of the 2 V or 3 V site fell within the promoter region.

Calculation of inside-V fragment ratio

The inside-V fragment ratio was calculated as a/(a + b1 + b2), as shown in Supplementary Fig. 3c, where a is the number of fragments located within the V shape enclosed by two borders extending from the footprint summit with slopes of ±2 (the theoretical slopes of a V-plot), and b1 and b2 are the numbers of fragments located within ±50 bp of the footprint summit but outside the V region.
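
One way to compute the ratio, under stated assumptions: the V-plot is taken to have fragment-midpoint offset from the summit on the x axis and fragment length on the y axis, so the two borders are length = ±2 × offset and "inside the V" means length ≥ 2 × |offset| (on or above both borders). Fragments are restricted to midpoints within ±50 bp, and b1 and b2 are taken to be the outside-V fragments on the left and right of the summit, respectively. The axis convention, the side assignment of b1/b2, and the name `inside_v_ratio` are assumptions for illustration.

```python
def inside_v_ratio(fragments, summit, window=50, slope=2.0):
    """a / (a + b1 + b2) for half-open (start, end) fragments around a summit."""
    a = b1 = b2 = 0
    for start, end in fragments:
        length = end - start
        offset = (start + end) / 2 - summit  # midpoint position relative to summit
        if abs(offset) > window:
            continue  # only fragments within +/- window of the summit are counted
        if length >= slope * abs(offset):
            a += 1  # on or above both V borders
        elif offset < 0:
            b1 += 1  # outside the V, left of the summit (assumed b1)
        else:
            b2 += 1  # outside the V, right of the summit (assumed b2)
    total = a + b1 + b2
    return a / total if total else float("nan")
```

With this convention, length ≥ 2 × |offset| is equivalent to the fragment spanning the summit, which gives a quick sanity check on the geometry.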

TAD calling

TADs were called using hicFindTADs64 at 40-kb resolution with settings: --thresholdComparisons 0.05 --delta 0.01 --correctForMultipleTesting fdr.

Loop calling

Loops were called using Mustache65 or Chromosight66. For Mustache, loops were called at 5-kb resolution with the options -r 5000 --pThreshold 0.1. For Chromosight, loops were called with the Chromosight detect function at 5-kb resolution. Loops spanning less than 20 kb were removed. Common and specific loops between Footprint-C and Micro-C were characterized as described previously67. Briefly, loops with both anchors overlapping were called common loops, and loops with one or neither anchor overlapping were called specific loops. Anchors were extended by ±20 kb, and pairToPair in bedtools was used to identify common loops with the parameters -type both -f 0.5. Footprint-C- or Micro-C-specific loops were identified with the parameters -type notboth -slop 40000. Pile-up plots of loops were generated using cooltools.
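
A simplified sketch of the common-loop test: anchors are padded by ±20 kb, and a loop counts as common if both of its padded anchors overlap the corresponding padded anchors of some loop in the other set by at least 50% of the query anchor's length, which is our reading of `-type both -f 0.5`. This approximates bedtools pairToPair rather than reimplementing it: it assumes anchors are ordered (anchor 1 upstream) on the same chromosome and treats every non-common loop as specific, whereas the text uses a separate `-type notboth -slop 40000` query. All function names are hypothetical.

```python
def overlap_fraction(a, b):
    """Fraction of interval a = (start, end) covered by interval b."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    return max(0, overlap) / (a[1] - a[0])


def pad(anchor, flank):
    return (anchor[0] - flank, anchor[1] + flank)


def is_common(loop, other_loops, flank=20_000, min_frac=0.5):
    """loop: (chrom, (start1, end1), (start2, end2)) with anchor 1 upstream."""
    chrom, a1, a2 = loop
    a1, a2 = pad(a1, flank), pad(a2, flank)
    for other_chrom, b1, b2 in other_loops:
        if other_chrom != chrom:
            continue
        b1, b2 = pad(b1, flank), pad(b2, flank)
        # Both anchors must overlap their counterparts (1 with 1, 2 with 2).
        if overlap_fraction(a1, b1) >= min_frac and overlap_fraction(a2, b2) >= min_frac:
            return True
    return False


def split_common_specific(loops, other_loops, **kwargs):
    common = [l for l in loops if is_common(l, other_loops, **kwargs)]
    specific = [l for l in loops if not is_common(l, other_loops, **kwargs)]
    return common, specific
```

The pairwise scan is quadratic; bedtools indexes intervals, which matters for genome-wide loop sets.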

Loop classification

Loops were annotated and categorized as promoter-promoter (Pro-Pro), promoter-enhancer (Pro-Enh), enhancer-enhancer (Enh-Enh), structural between A compartments (A-A structural), structural between B compartments (B-B structural), or other loops, based on active TSSs, H3K27ac ChIP-seq peaks68, and PC1 values from Footprint-C data. H3K27ac peaks were called from the K562 H3K27ac ChIP-seq dataset using MACS262. RNA-seq data were used to annotate active TSSs. First, anchors were extended by ±10 kb, and anchors containing active TSSs were defined as Pro. Second, remaining anchors containing H3K27ac peaks were defined as Enh. Third, the remaining anchors were defined as structural A or B anchors based on their PC1 values. Pro-Pro, Pro-Enh, Enh-Enh, A-A structural, and B-B structural loops were defined by the respective annotations of their two anchors.
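
The hierarchical anchor annotation described above (Pro, then Enh, then structural A/B) can be sketched as below. Inputs are assumed to be active TSSs as (chrom, position), H3K27ac peaks as (chrom, start, end), and a precomputed PC1 value per anchor; a PC1 of exactly 0 is labeled B here, which is an assumption. Any anchor combination not listed (for example Pro with A) falls into "other". Function names are hypothetical.

```python
def annotate_anchor(anchor, active_tss, h3k27ac_peaks, pc1, flank=10_000):
    """Hierarchical anchor label: Pro > Enh > structural A/B (by PC1 sign)."""
    chrom, start, end = anchor
    lo, hi = start - flank, end + flank  # anchors extended by +/- 10 kb
    if any(c == chrom and lo <= pos < hi for c, pos in active_tss):
        return "Pro"
    if any(c == chrom and s < hi and e > lo for c, s, e in h3k27ac_peaks):
        return "Enh"
    return "A" if pc1 > 0 else "B"


# Keys are the two anchor labels in sorted order.
LOOP_CLASSES = {
    ("Pro", "Pro"): "Pro-Pro",
    ("Enh", "Pro"): "Pro-Enh",
    ("Enh", "Enh"): "Enh-Enh",
    ("A", "A"): "A-A structural",
    ("B", "B"): "B-B structural",
}


def classify_loop(label1, label2):
    """Loop class from the two anchor labels; other combinations are 'other'."""
    return LOOP_CLASSES.get(tuple(sorted((label1, label2))), "other")
```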

Stripe calling

Stripes were called from contact matrices using Stripenn69 or StripeCaller70. Stripenn was run at 5-kb resolution with the following settings: -m 0.95,0.96,0.97,0.98,0.99 -p 0.1. StripeCaller was run with the following settings: --local-num 2 --fold-enrichment 1.1 --min-seed-len 6. Pile-up plots of stripes were generated using coolpup.py71 with the settings --local --rescale. The loop domains in Supplementary Fig. 4k were annotated with horizontal and vertical stripes identified by Stripenn and divided into four types: left-sided, right-sided, both-sided, and no-stripe loops. Each loop domain was divided into 50 equal-width intervals, and the left and right anchors were extended by 10 intervals forward or backward, respectively. The percentage of loops overlapping 1 V sites was calculated for each interval.
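
A sketch of the scaled meta-profile, assuming each loop domain runs from the left anchor position to the right anchor position, that the 10 extra intervals extend outward on each side (upstream of the left anchor and downstream of the right anchor), and that 1 V sites are represented by their summit positions. Each loop contributes at most once per interval, so the output is the percentage of loops with at least one 1 V site in that interval. These representations and the name `stripe_site_profile` are assumptions.

```python
import math


def stripe_site_profile(loops, sites, n_bins=50, n_flank=10):
    """Percent of loops with >= 1 site in each interval across the loop domain.

    loops: (chrom, left_anchor, right_anchor) positions; sites: (chrom, summit).
    Returns n_bins + 2 * n_flank percentages, upstream flank first.
    """
    total = n_bins + 2 * n_flank
    hits = [0] * total
    for chrom, left, right in loops:
        width = (right - left) / n_bins  # interval size scales with the domain
        origin = left - n_flank * width  # assumed: flanks extend outward
        marked = set()
        for site_chrom, pos in sites:
            if site_chrom != chrom:
                continue
            idx = math.floor((pos - origin) / width)
            if 0 <= idx < total:
                marked.add(idx)  # count each loop at most once per interval
        for idx in marked:
            hits[idx] += 1
    if not loops:
        return [0.0] * total
    return [100.0 * h / len(loops) for h in hits]
```

Scaling the interval width to each domain lets loops of different sizes be averaged on a common axis, which is the usual reason for this kind of profile.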

Insulation score analysis

The insulation scores72 were calculated at 40-kb resolution using the cooltools package.

Compartment analysis

Compartments were identified at 100-kb resolution using the cooltools package. The first principal component (PC1) eigenvector represents the compartment profile, with positive and negative values representing A and B compartments, respectively2.

Motif preprocessing

Transcription factor motif coordinates were obtained from the HOMER18 and CIS-BP73 databases. HOMER coordinates were used directly, whereas CIS-BP motif coordinates were identified using FIMO74. The lists of functional CTCF, EGR1, KLF, MAZ, and SP motifs were generated as described previously43. Briefly, in K562 cells, CDC5L, E2F4, TFDP1, VEZF1, and ZNF143 motif coordinates were screened using the respective ChIP-seq data, and only those with an average RPM above 0.2 within a 30 bp window surrounding the motif center base were kept; MTF2, RBAK, and ZFP69B motif coordinates were screened using DNase-seq data, and only those with an average RPM above 0.1 within the same 30 bp window were kept. In HEK293T cells, RBAK and ZNF69B motif coordinates were screened using the respective ChIP-seq data, and E2F4, EGR1, ZNF143, VEZF1, CDC5L, TFDP1, and MTF2 motif coordinates were screened using DNase-seq data; in both cases, only motifs with an average RPM above 0.1 within a 30 bp window surrounding the motif center base were kept.
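
The RPM-window screen used above (and for the occupied CTCF motifs under "Enrichment analysis") can be sketched as follows, assuming a per-base RPM track for one chromosome and integer motif center positions. The 30 bp window is placed as center − 15 to center + 14, and motifs whose window runs off the chromosome are dropped; both placements are assumptions. Function names are hypothetical.

```python
def window_mean(signal, center, window=30):
    """Mean per-base signal over a window centred on the motif center base."""
    lo = center - window // 2
    hi = lo + window
    if lo < 0 or hi > len(signal):
        return None  # window runs off the chromosome
    return sum(signal[lo:hi]) / window


def filter_motifs(centers, signal, threshold, window=30):
    """Keep motif centers whose mean RPM in the window exceeds threshold."""
    kept = []
    for c in centers:
        mean = window_mean(signal, c, window)
        if mean is not None and mean > threshold:
            kept.append(c)
    return kept
```

For example, the K562 ChIP-seq screen would use `threshold=0.2` and the DNase-seq screens `threshold=0.1`.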

Motif uniqueness analysis

Fragment contact pairs of Footprint-C and other Hi-C datasets were first converted to single-fragment BED files. Footprint-C fragments ≤60 bp in length were kept for analysis. Fragments of in situ Hi-C were extended to the nearest GATC site (DpnII); fragments of BL-Hi-C to the nearest GGCC site (HaeIII); fragments of Hi-TrAC to the nearest AATT or CATG site (MluCI or NlaIII); and fragments of HiCAR to the nearest GTAC site (CviQI). Fragments of Micro-C were extended ±75 bp from the center base. Fragment coordinates were then intersected with all HOMER motif coordinates using intersectBed in bedtools. Finally, the proportions of fragments from the Footprint-C, in situ Hi-C, BL-Hi-C, Hi-TrAC, HiCAR, or Micro-C datasets annotated with zero, one, or multiple motifs were calculated.
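
A sketch of the two steps above, under stated assumptions: a read is extended from its 0-based 5′ position, in the read direction, to the end of the nearest restriction site (for minus-strand reads, the 5′ position is the rightmost base and the fragment runs leftward), and a fragment is annotated with every motif interval it overlaps by at least 1 bp, mirroring intersectBed's default. The extension direction and the names `extend_to_site` and `motif_count_classes` are assumptions for illustration.

```python
def extend_to_site(seq, pos, strand, sites=("GATC",)):
    """Extend a read from its 5' position to the nearest site in read direction.

    Returns a half-open (start, end) interval, or None if no site is found.
    """
    if strand == "+":
        ends = []
        for site in sites:
            hit = seq.find(site, pos)  # first site starting at or after pos
            if hit != -1:
                ends.append(hit + len(site))
        return (pos, min(ends)) if ends else None
    starts = []
    for site in sites:
        hit = seq.rfind(site, 0, pos + 1)  # last site ending at or before pos
        if hit != -1:
            starts.append(hit)
    return (max(starts), pos + 1) if starts else None


def motif_count_classes(fragments, motifs):
    """Fractions of fragments overlapping zero, one, or multiple motifs."""
    counts = {"zero": 0, "one": 0, "multiple": 0}
    for f_start, f_end in fragments:
        n = sum(1 for m_start, m_end in motifs if m_start < f_end and m_end > f_start)
        key = "zero" if n == 0 else "one" if n == 1 else "multiple"
        counts[key] += 1
    total = len(fragments)
    return {k: v / total for k, v in counts.items()} if total else counts
```

For Hi-TrAC, passing `sites=("AATT", "CATG")` picks whichever of the two sites is closer.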

Analysis of motif orientation of CTCF-CTCF contacts

For each contact in Footprint-C or other Hi-C datasets, the CTCF motif closest to each of the upstream and downstream alignments was identified using bedtools closestBed with the parameter -t first. Only contacts in which both alignments overlapped the center base of a CTCF motif were counted. The frequencies of CTCF contacts were calculated according to the orientations of the two motifs (+−: convergent, ++: tandem F, −+: divergent, −−: tandem R).
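
A sketch of the orientation tally: each alignment is matched to the first motif (in list order, analogous to `-t first`) whose center base it overlaps, contacts lacking a motif on either side are skipped, and the upstream alignment is placed first so the strand pair maps onto the four classes above. Restricting to cis contacts and the names used here are assumptions.

```python
from collections import Counter

ORIENTATION = {("+", "-"): "convergent", ("+", "+"): "tandem F",
               ("-", "+"): "divergent", ("-", "-"): "tandem R"}


def motif_strand_at(chrom, start, end, motifs):
    """Strand of the first motif whose center base lies in [start, end), else None."""
    for m_chrom, center, strand in motifs:
        if m_chrom == chrom and start <= center < end:
            return strand
    return None


def ctcf_orientation_counts(contacts, motifs):
    """contacts: (chrom1, start1, end1, chrom2, start2, end2) alignment pairs."""
    counts = Counter()
    for c1, s1, e1, c2, s2, e2 in contacts:
        if c1 != c2:
            continue  # assumed: orientation classes only for cis contacts
        if s2 < s1:
            s1, e1, s2, e2 = s2, e2, s1, e1  # put the upstream alignment first
        up = motif_strand_at(c1, s1, e1, motifs)
        down = motif_strand_at(c2, s2, e2, motifs)
        if up is not None and down is not None:
            counts[ORIENTATION[(up, down)]] += 1
    return counts
```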

Construction of TF motif-based contact maps

The lists of functional CTCF or MAZ motifs were generated as described previously43 and under “Motif preprocessing”. The motifs were sorted by genomic coordinate, indexed, and used to construct a genome-wide 2D contact map. Footprint-C fragments were annotated with the indexed motifs. Contact pairs with motifs annotated on both fragments were extracted and assigned to the corresponding bins of the 2D contact map. The counts in each bin were shaded in four different colors according to the orientations of the two motifs (+−, ++, −+, −−). The motif-based contact maps were plotted using ggplot2 and merged using Adobe Illustrator.
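
A sketch of the motif-indexed map: motifs are sorted and indexed by coordinate, each fragment is annotated with the first motif whose center base it overlaps, and each doubly annotated pair increments one cell (i ≤ j) of a sparse matrix chosen by the strand pair, with the lower-index (upstream) motif first. The plotting and coloring step is not shown. Chromosome names sort lexicographically here, and `build_motif_contact_map` is a hypothetical name.

```python
from collections import Counter


def build_motif_contact_map(motifs, pairs):
    """motifs: (chrom, center, strand); pairs: (chrom1, s1, e1, chrom2, s2, e2)."""
    ordered = sorted(motifs)  # index motifs by genomic coordinate

    def annotate(chrom, start, end):
        # Linear scan for clarity; large motif sets would call for bisect.
        for idx, (m_chrom, center, strand) in enumerate(ordered):
            if m_chrom == chrom and start <= center < end:
                return idx, strand
        return None

    maps = {orient: Counter() for orient in ("+-", "++", "-+", "--")}
    for c1, s1, e1, c2, s2, e2 in pairs:
        a, b = annotate(c1, s1, e1), annotate(c2, s2, e2)
        if a is None or b is None:
            continue  # keep only pairs with a motif on both fragments
        (i, strand_i), (j, strand_j) = sorted([a, b])  # upper triangle: i <= j
        maps[strand_i + strand_j][(i, j)] += 1
    return ordered, maps
```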

Three-way interaction analysis

Three-way interactions were inferred from two-way motif interactions using quick-cliques (https://github.com/darrenstrash/quick-cliques). For three-way interactions between CTCF and MAZ, KLF, or EGR1, the inferred four-way interactions were also split into three-way interactions and added to the directly inferred three-way interactions. The inferred three-way interactions were then intersected with HiPore-C data using intersectBed in bedtools, and only interactions in which all three motifs were located in three different fragments of a single multiway complex were kept for motif orientation analysis.
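
A pure-Python sketch of the clique step, using Bron–Kerbosch with pivoting as a stand-in for quick-cliques (it enumerates maximal cliques but is not that tool's code). Nodes are assumed to be motif identifiers and edges two-way motif interactions; maximal 3-cliques are kept directly and maximal 4-cliques are split into their four 3-subsets. The HiPore-C check is approximated by requiring the three motif positions to fall in three distinct fragments of one read. All names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; adj maps node -> set of neighbours."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def three_way_interactions(edges):
    """Maximal 3-cliques plus the 3-subsets of maximal 4-cliques."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    triples = set()
    for clique in maximal_cliques(dict(adj)):
        if len(clique) == 3:
            triples.add(tuple(sorted(clique)))
        elif len(clique) == 4:
            triples.update(combinations(sorted(clique), 3))
    return triples


def supported_by_read(motif_positions, read_fragments):
    """True if the three motifs fall in three different fragments of one read."""
    hit_fragments = set()
    for chrom, pos in motif_positions:
        idx = next((k for k, (c, s, e) in enumerate(read_fragments)
                    if c == chrom and s <= pos < e), None)
        if idx is None:
            return False
        hit_fragments.add(idx)
    return len(hit_fragments) == 3
```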

Statistics and reproducibility

The stratum-adjusted correlation coefficient was calculated using HiCRep75 for 100-kb resolution contact matrices with the parameters --h 1 --dBPMax 100000 --binSize 100000. Comparisons were performed between two biological replicates of Footprint-C libraries, or between Footprint-C and Micro-C or in situ Hi-C libraries. A two-sided Wilcoxon rank-sum test was used to assess differences between groups. Statistical details are provided in the figures or figure legends. No statistical method was used to predetermine sample size. No data were excluded from the analyses. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.