Footprint-C reveals transcription factor modes in local clusters and long-range chromatin interactions

Liu, Xiaokun; Wei, Hanhan; Zhang, Qifan; Zhang, Na; Wu, Qingqing; Xu, Chenhuan

doi:10.1038/s41467-024-55403-7

Article
Open access
Published: 30 December 2024

Xiaokun Liu^1,2,3^na1,
Hanhan Wei^1,2,3^na1,
Qifan Zhang^1,2,3,
Na Zhang⁴,
Qingqing Wu⁴ &
…
Chenhuan Xu ORCID: orcid.org/0000-0002-3186-494X^1,2,3

Nature Communications volume 15, Article number: 10922 (2024)

Abstract

The proximity ligation-based Hi-C and derivative methods are the mainstream tools to study genome-wide chromatin interactions. These methods often fragment the genome using enzymes functionally irrelevant to the interactions per se, restraining the efficiency in identifying structural features and the underlying regulatory elements. Here we present Footprint-C, which yields high-resolution chromatin contact maps built upon intact and genuine footprints protected by transcription factor (TF) binding. When analyzed at one-dimensional level, the billions of chromatin contacts from Footprint-C enable genome-wide analysis at single footprint resolution, and reveal preferential modes of local TF co-occupancy. At pairwise contact level, Footprint-C exhibits higher efficiency in identifying chromatin structural features when compared with other Hi-C methods, segregates chromatin interactions emanating from adjacent TF footprints, and uncovers multiway interactions involving different TFs. Altogether, Footprint-C results suggest that rich regulatory modes of TF may underlie both local residence and distal chromatin interactions, in terms of TF identity, valency, and conformational configuration.

Introduction

Chromatin interactions between discrete regulatory elements, mediated by architectural type or tissue-specific transcription factors (TFs), constitute the three-dimensional genome organization, and underlie many important nuclear processes, such as transcription, DNA replication, and DNA damage repair¹. Proximity ligation-based Hi-C and its derivative methods have been widely used to map chromatin interactions^2,3,4,5,6. These methods often involve sequence-specific restriction enzymes to digest the genome before ligating two digested genomic fragments^2,3. In such cases, the choice of enzyme, and the frequency and genomic distribution of the cutting sites, affect the efficiency of digestion and ligation. A chromatin interaction is potentially represented in a Hi-C contact map only when the cutting sequences are in close proximity to the two interacting cis elements. Other derivative methods often employ Tn5 transposase to achieve preferential coverage of the accessible portion of the genome^5,6,7. These methods all require a further fragmentation step, either using sonication or other restriction enzymes, to break down the ligated chromatin to facilitate library preparation and sequencing. Therefore, many captured and sequenced DNA ligation products do not contain the binding sites of TFs mediating interactions (Supplementary Fig. 1a). The usage of such enzymes functionally irrelevant to the nature of interactions restrains the efficiency of discovering significant structural features from a chromatin contact map, and obscures the identity of underlying cis regulatory elements and trans TFs. The development of Micro-C partially circumvents this problem by employing Micrococcal Nuclease (MNase) to digest the genome, and preferentially captures ligations, and thus interactions between pairs of nucleosomal DNA^4,8,9. 
In this study, we utilize Deoxyribonuclease I (DNase I), a nuclease previously used for mapping TF footprints¹⁰, and develop Footprint-C, which specifically captures chromatin interactions between TF footprints, and achieves higher efficiency in identifying chromatin structural features, such as chromatin loops and stripes, compared with other Hi-C methods. The analysis in one-dimensional and pairwise contact level of the Footprint-C datasets suggests extensive TF regulatory modes in both local residence and long-range chromatin interactions.

Results

Footprint-C captures chromatin interactions between TF footprints

A few studies have used DNase to fragment the genome in Hi-C experiments^11,12,13. These methods include no dedicated steps to preserve TF footprint sequences. They either captured ligation products between long DNA fragments (Supplementary Fig. 1b), possibly due to insufficient DNase digestion, or performed DNase digestion on purified DNA molecules in the absence of TFs¹³. Furthermore, they all used sonication to fragment the genome after proximity ligation, possibly wiping out the footprint sequences from the Hi-C library. Indeed, when inspected with genome-wide coverage of one-dimensional (1D) fragments, these datasets show no enrichment at motifs of CTCF (CCCTC-binding factor), the master regulator of genome folding¹⁴, or at distal DNase Hypersensitive Sites (dDHS) (Supplementary Fig. 1c). In our strategy, we performed extensive DNase digestion to approximate the ideal length range of TF footprints, which further enabled us to use gel excision to strictly select those ligation products directly between two footprints. This size selection step also helps avoid the sonication step often used in other Hi-C methods, thereby preserving the integrity of footprints (Supplementary Fig. 1a).

We applied Footprint-C in human chronic myelogenous leukemia cell line K562, and human embryonic kidney cell line HEK293T, and obtained 1.93 and 0.93 billion contacts, respectively. About 86.7% of K562 and 84.5% of HEK293T contacts are cis (Supplementary Fig. 1d, e). The vast majority of Footprint-C contacts are between short fragments with a mean length of 50 base pairs (bp) (Fig. 1a and Supplementary Fig. 1f), consistent with the length of digested fragments in DNase-seq (Supplementary Fig. 1g). Next, we benchmarked K562 Footprint-C dataset with datasets acquired through different Hi-C and derivative methods in K562 cells by previous studies. When compared with in situ Hi-C³, BL-Hi-C¹⁵, and Micro-C¹⁶ datasets, Footprint-C shows an outstanding 1D enrichment at CTCF motifs and dDHS (Fig. 1b). Notably, HiCAR⁵ and Hi-TrAC⁶ are the two methods recently developed for mapping open chromatin interactions by employing Tn5 transposase. Neither exhibits the same degree of enrichment at CTCF motifs or dDHS as Footprint-C (Fig. 1b). V-plot is a form of visual representation of footprints protected by TF binding under DNase digestion, evidenced by the V-shaped enrichment of footprint-bearing fragments¹⁷. The canonical “V” patterns of Footprint-C fragments at both CTCF motifs and dDHS strongly indicate that a significant portion of the interacting short fragments ligated in Footprint-C were protected by TF binding under DNase digestion (Fig. 1c). The genome-wide 1D coverage of Footprint-C fragments is highly correlated with the coverage of DNase-seq fragments (Supplementary Fig. 1h). The set of peaks acquired from Footprint-C 1D coverage also largely overlap with the peaks from DNase-seq (Fig. 1d, e). We analyzed the common and specific peaks in K562 cells. 
The Footprint-C specific peaks exhibit higher enrichment of Footprint-C 1D coverage than DNase-seq, but relatively weak interaction frequency, possibly due to their overall low occupancy level compared with the common peaks (Supplementary Fig. 1i, j). The common peaks show the highest 1D enrichment and interaction frequency among all three sets of peaks (Supplementary Fig. 1i, j). Therefore, while the TF occupancy level seems to be independent of its interaction frequency, the interaction frequency possibly builds on top of the TF occupancy level.

**Fig. 1: Footprint-C captures genuine and intact TF footprints.**

We went on to annotate the Footprint-C fragments with a whole collection of consensus TF motif sequences¹⁸. Across different Hi-C methods, Footprint-C fragments show the highest rate of unique TF motif annotation (83.0%) (Fig. 1f), largely due to the short fragment length and the preservation of intact footprint sequences. About 1.25% of Footprint-C fragments contain a single CTCF motif, and 0.33‰ of Footprint-C contacts are between pairs of CTCF motifs (Fig. 1g). To demonstrate the high accuracy of motif annotation, we calculated the frequency of pairs of convergent CTCF motifs, a hallmark of CTCF-mediated chromatin loops in mammals¹⁹. Surprisingly, Footprint-C is the only Hi-C method showing a predominant (60.2% vs 25% expected) frequency of convergent CTCF motif pairs at the single contact level (Fig. 1h), a preference that becomes apparent only at the aggregated chromatin loop level in all other Hi-C datasets (Supplementary Fig. 1k). In contrast, Footprint-C data in the fly cell line Kc167 do not show a predominant convergent pattern between interacting CTCF motifs (Fig. 1i), consistent with the findings that CTCF proteins work differently in organizing the Drosophila genome than in the human genome²⁰. The above results showcase the power of Footprint-C to efficiently capture chromatin contacts between genuine TF footprints functionally relevant to the interactions.
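The 25% expectation for convergent pairs follows from there being four equally likely strand combinations for two motifs. A minimal sketch of how such orientation frequencies could be tallied is given below; the function names, the input tuple layout, and the labels `tandem_forward` and `tandem_reverse` are illustrative assumptions, not part of the published pipeline.

```python
from collections import Counter


def pair_orientation(strand_upstream, strand_downstream):
    """Classify two motifs by strand; 'upstream' is the smaller coordinate."""
    if strand_upstream not in "+-" or strand_downstream not in "+-":
        raise ValueError("strands must be '+' or '-'")
    if strand_upstream == "+" and strand_downstream == "-":
        return "convergent"        # motifs point toward each other
    if strand_upstream == "-" and strand_downstream == "+":
        return "divergent"         # motifs point away from each other
    if strand_upstream == "+":
        return "tandem_forward"    # both on the + strand
    return "tandem_reverse"        # both on the - strand


def orientation_fractions(pairs):
    """pairs: iterable of (pos1, strand1, pos2, strand2) on one chromosome."""
    counts = Counter()
    for pos1, strand1, pos2, strand2 in pairs:
        if pos1 > pos2:            # order the two motifs by coordinate
            strand1, strand2 = strand2, strand1
        counts[pair_orientation(strand1, strand2)] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

With random pairing each of the four labels would be expected at a fraction of 0.25, which is the baseline the observed convergent frequency is compared against.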

Preferential modes of local TF clustering

The billions of chromatin contacts from Footprint-C, when viewed at 1D level, present a rich collection of TF footprints and an opportunity to analyze TF local residence with unprecedented resolution. We first obtained 1D peaks from the Footprint-C datasets, and classified them into single-, double-, or triple-footprint (1 V, 2 V, 3 V, respectively) sites based on the proximity of two adjacent peak summits (Fig. 2a and see Methods). V-plots at single sites and sampled collections of sites demonstrate the existence of footprints across the whole 1D peak set (Supplementary Figs. 2 and 3a, b). The locations of the footprints within these sites well correspond to the peaks from DNase-seq and TF ChIP-seq datasets (Fig. 2b, c). About 70% of the fragments from these sites are located within the theoretical inside-V area (Supplementary Fig. 3c), showing that footprint-bearing fragments account for a significant portion of the dataset. The consensus motif sequences of TFs involved in genome organization, like CTCF^19,21 and MAZ^22,23 (MYC-associated Zinc Finger Protein), are within the most represented motifs at these sites (Supplementary Fig. 3d). Interestingly, the density of TF footprints at promoter regions appears to correlate with gene expression level (Supplementary Fig. 3e), suggesting a quantitative effect of TF co-occupancy level on transcriptional output^24,25.

**Fig. 2: TFs exhibit preferential modes of local cooperative binding.**

The definition of 1 V, 2 V, and 3 V sites prompted us to investigate modes of TF local co-occupancy. TFs appear to possess preferences for binding alone or with other TFs in close proximity, and such preferences seem to be largely conserved between K562 and HEK293T cells (Fig. 2d). The members from the RFX (regulatory factor X) and CUX (cut like homeobox) families have more footprints at 1 V sites than at the rest of the sites, while NRF1 and SP family members prefer to co-localize with other TFs²⁶ (Fig. 2d). In contrast, the frequent duos of TFs at the 2 V and 3 V sites are highly cell type-specific. KLF (Krüppel-like factors) and HOX (homeobox) family members are the TFs that most frequently co-localize with an array of other TFs in K562 and HEK293T cells, respectively (Fig. 2e and Supplementary Fig. 3f). The conformational configuration of two interacting TFs, and thus the relative orientation of two adjacent motif sequences, may dictate a functional link between the two TFs²⁷. We next investigated the four possible orientations of two adjacent motif sequences at all 2 V sites. The tandem orientation of two adjacent CTCF motifs is highly frequent in both cell lines²⁸ (Fig. 2f, g). In contrast, the two adjacent motifs of CTCF and FOX (forkhead box) family TFs exhibit a preferential divergent orientation (Fig. 2h, i and Supplementary Fig. 3g, h). The preferred orientations between two adjacent motifs suggest preferred dimerization modes between the two TFs, which in turn may suggest functional cooperative activities between the two TFs^29,30. Interestingly, when analyzing 1 V sites annotated with only CTCF motifs, we noticed some left- or right-skewed “V” shapes on the V-plot (Fig. 2j). To examine if these skewed CTCF “V” shapes are due to other TFs interacting with CTCF but not with DNA, we analyzed the various TF ChIP-seq datasets from ENCODE³¹, and identified concordantly skewed ChIP-seq signals from TFs like MAX³², BHLHE40³³, ELF1³⁴, and IRF2 (Fig. 2j). 
These results indicate a wide-spread cooperativity in TF local residence, with preferences for both interaction partner choice and interaction conformation.

Footprint-C efficiently identifies chromatin structural features

We next investigated the performance of the Footprint-C dataset at the pairwise contact level. The forward-reverse (FR) orientation of two fragments within ligation products is biased among most Hi-C datasets, with frequencies ranging from 38% to 90% (vs 25% expected) within short-distance contact pairs, possibly due to spurious capture of dangling-end, re-ligated or undigested fragments^35,36. Importantly, the Footprint-C dataset exhibits the most balanced frequency of fragment orientation across all datasets from different Hi-C methods (Fig. 3a and Supplementary Fig. 4a), suggesting the exclusion of dangling-end products in Footprint-C procedures, and showing the validity of short-distance interactions in Footprint-C results. The Footprint-C contact matrix and heatmap exhibit canonical chromatin structural features like contact domains, chromatin loops and stripes (Fig. 3b). The global correlation coefficient between Footprint-C and in situ Hi-C or Micro-C is slightly lower than the coefficient between in situ Hi-C and Micro-C (Supplementary Fig. 4b), possibly due to the TF-centric fragment locations in the Footprint-C method (Supplementary Fig. 1h), which are lacking in the other two methods. Other than this, Footprint-C produces chromatin contact maps consistent with in situ Hi-C or Micro-C contact maps at the levels of global interaction distance, compartment segregation, and contact domain insulation (Supplementary Fig. 4c–h).

To benchmark the performance of Footprint-C in identifying chromatin structural features, the numbers of chromatin loops and stripes obtained from Footprint-C, Micro-C, or in situ Hi-C datasets in K562 cells were compared in parallel. Footprint-C identified the most chromatin loops or stripes across these datasets, at all different sizes of samplings or with different methods (Fig. 3c and Supplementary Fig. 4i). Under the same sequencing depth, Footprint-C identified about 50% more chromatin loops than Micro-C (Fig. 3c–e). The Footprint-C specific loops show a larger structural fraction than the loops in common with Micro-C (Fig. 3f), and are bigger in loop size (Fig. 3g), prompting us to investigate the nature of these specific structural loops. More of the common structural loops lie within A compartments (A-A) than within B compartments (B-B) (49.3% vs 28.9%), while the opposite holds for the specific structural loops (19.7% vs 59.4%) (Supplementary Fig. 4j). The B-B structural loops are decorated by repressive H3K9me3 marks throughout but, like the A-A structural loops, are demarcated by CTCF and Cohesin (Fig. 3h). We further investigated the chromatin stripes identified by Footprint-C by categorizing them into three groups: chromatin loop domains with left-, right-, or both-sided stripes (Supplementary Fig. 4k). The appearance and direction of the stripes correlate with the density of TF footprints at the respective loop anchors (Supplementary Fig. 4l), suggesting a functional link between loop extrusion initiation and TF residence at loop anchors³⁷, and that a broad spectrum of TFs, including CTCF and others, may contribute to the structural framework of high-order genome organization.

To rule out the possibility that the chromatin high-order structural features identified by Footprint-C may contain artifacts from its 1D enrichment at accessible sites, we established a RAD21 (a Cohesin subunit) degron cell line in HEK293T, and performed Footprint-C. Under acute Cohesin depletion, the chromatin loops and their associated stripes were severely impaired (Fig. 3i). Because Cohesin loss does not affect chromatin accessibility in general³⁸, this result demonstrates the validity of the chromatin structural features identified by Footprint-C, and shows that Footprint-C’s superior performance in finding these structures is not simply due to its preferential coverage of the accessible portion of the genome.

High-resolution chromatin contact maps built upon TF footprints

The V-plot analysis demonstrates that most Footprint-C fragments exhibit protection by the binding of a variety of TFs other than CTCF (Fig. 1g and Supplementary Fig. 5). The genuine nature of footprints resolved by Footprint-C enabled us to unambiguously categorize the interactions mediated by a variety of TF-TF pairs. CTCF-CTCF pairs show predominant long-range interactions, whereas pairs between CTCF and the other TFs show a biphasic interaction range (Fig. 4a). About half of the TF pairs exclusive of CTCF interact under 10 kilobase (kb) range (Supplementary Fig. 6a). Based on this interaction range distinction, we categorized all TF-TF pairs into short-range (0.3-10 kb) and long-range (>10 kb) groups. In both K562 and HEK293T cells, the contacts mediated by two CTCF motifs are the biggest category only in the long-range group, and MAZ-MAZ contacts are the biggest category in the short-range group (Fig. 4b and Supplementary Fig. 6b). Regarding interaction partner choice, CTCF mostly interacts with another CTCF in the long-range group, while all other TFs appear to have a balanced choice in forming homo- or hetero-pairs (Fig. 4c and Supplementary Fig. 6c). The convergent orientation of two interacting CTCF motifs is a hallmark of CTCF-mediated chromatin loops in mammals (Fig. 1h and Supplementary Fig. 1k). We next examined if such orientation preferences exist in other interacting TF-TF pairs. A mild preference (31–34% vs 25% expected) for the convergent orientation was observed between CTCF and other TFs, like MAZ, EGR1³⁹ (Early Growth Response 1), or ZNF143^40,41,42, only in the long-range group (Fig. 4d). No such preferences were observed between TF-TF pairs exclusive of CTCF (Supplementary Fig. 6d). The diversity of short- and long-range contacts suggests that a variety of TF-TF interactions may orchestrate the local and global chromatin organization through division of labor⁶.
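The short-range (0.3–10 kb) versus long-range (>10 kb) grouping described above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the contact tuple layout are hypothetical, both bounds of the short-range window are treated as inclusive, and contacts closer than 0.3 kb are simply left out.

```python
from collections import Counter


def range_group(pos1, pos2, short_min=300, short_max=10_000):
    """Assign a cis contact to 'short' (0.3-10 kb) or 'long' (>10 kb)."""
    distance = abs(pos2 - pos1)
    if distance < short_min:
        return None                # below 0.3 kb: not assigned to either group
    if distance <= short_max:
        return "short"
    return "long"


def tally_tf_pairs(contacts):
    """contacts: iterable of (tf1, pos1, tf2, pos2) for annotated cis contacts."""
    counts = Counter()
    for tf1, pos1, tf2, pos2 in contacts:
        group = range_group(pos1, pos2)
        if group is not None:
            # Sort the two TF names so that A-B and B-A count as one pair type.
            counts[(tuple(sorted((tf1, tf2))), group)] += 1
    return counts
```

Ranking the resulting counts within each group would give the kind of per-category comparison made between CTCF-CTCF and MAZ-MAZ contacts.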

**Fig. 4: Chromatin contact maps built upon TF footprints.**

The resolution of TF identities in most Footprint-C contacts further prompted us to build a chromatin contact map solely upon TF motifs. Due to its predominant role in genome organization, we first built a contact map upon 112,878 CTCF motifs compiled from all human CTCF ChIP-seq datasets from ENCODE⁴³. This motif-based contact map not only largely reproduces chromatin structures in a regular Hi-C contact map, but also surprisingly separates interactions emanating from two adjacent CTCF motifs only ~100 bp apart (Fig. 4e), or resolves interactions from a particular CTCF motif when another inert motif is only ~100 bp away (Supplementary Fig. 6e), both of which are indiscernible in contact maps from other Hi-C methods. The motif-based contact map also helps resolve CTCF-mediated multiway interactions. In total, we identified 6,726 three-way contacts involving three CTCF motifs, including one interacting over the promoter region of the RNA methyltransferase gene Nsun4 (Fig. 4f), which was verified by data from a recent multiway study in K562 cells⁴⁴. The combination of convergent and tandem motif orientation appears to be the predominant pattern (66% vs 25% expected) among such CTCF three-way interactions (Fig. 4g). Finally, we accommodated both homogeneous and heterogeneous TF-TF interactions to build a contact map upon motifs of both CTCF and MAZ, another architectural TF involved in genome organization^22,23. CTCF also forms multiway interactions with MAZ (Fig. 4h), and potentially with other TFs, mainly through convergent motif orientation (Fig. 4i), suggesting that a rich regulatory lexicon of TF-TF interactions involving TF identity, valency, and orientation, underlies complex mammalian genome folding.

Discussion

Since the birth of the Hi-C method², a handful of derivative methods have been developed for mapping genome organization, based on the principle of proximity ligation. The two major directions of technological development are resolution enhancement, exemplified by Micro-C with nucleosomal DNA units resolved, and preferential coverage of the accessible portion of the genome, represented by HiCAR and Hi-TrAC using Tn5 transposase. These methods outperform the first-generation Hi-C and the second-generation in situ Hi-C methods in resolution, but still have limitations. Most ligation products they capture are either between ~150 bp nucleosomal DNA fragments, which often exclude TF binding^45,46 (Fig. 1b), or between long fragments that either do not include the TF binding sites or leave ambiguities in identifying the responsible cis elements (Fig. 1f). Furthermore, in the cases of HiCAR and Hi-TrAC, restriction enzyme digestion or sonication was used after Tn5 transposition to further fragment DNA, potentially wiping out the TF footprint sequences from the DNA fragment, and therefore losing the identity information of cis elements and trans TFs responsible for the interactions. In Footprint-C procedures, we fine-tuned DNase digestion to approximate the ideal length range of TF footprints, which further enabled us to use gel excision to strictly select those ligation products directly between two footprints (Supplementary Fig. 1a). This size selection step also helps avoid the sonication or the restriction enzyme digestion step, therefore preserving the integrity of footprints. These optimizations enable the unambiguous identification of cis regulatory elements and trans TFs, and thus the TF-TF interactions underlying genome-wide chromatin interactions in Footprint-C data.

In the loop extrusion model explaining chromatin loop formation in mammals⁴⁷, a pair of CTCF proteins binding to two convergently oriented motifs is required to stop the extrusion of the Cohesin complex, thereby demarcating the chromatin loops along with Cohesin. Intriguingly, Drosophila CTCF proteins, which share a highly conserved DNA-binding domain and motif sequence with mammalian CTCF proteins, do not exhibit such preferences for convergently oriented motifs when forming interactions²⁰, suggesting that mammalian CTCF proteins may have acquired the function to stop Cohesin extrusion during evolution, to fulfill the requirements of mammalian genome organization. A recent study identified a handful of TFs other than CTCF that serve as barriers in stopping Cohesin extrusion in mammalian cells⁴⁸. Indeed, in Footprint-C data, we identified a rich collection of long-range TF-TF interactions between CTCF and other TFs, and such interactions do exhibit mild preferences for the convergent orientation of CTCF and the other motifs (Fig. 4d). It is tempting to speculate that these results suggest the emergence of new types of extrusion barrier TFs over the course of evolution, to suit the complex needs of gene expression regulation in mammals. Functional studies, for example using inducible degron technologies⁴⁹, will help resolve the roles these candidate TFs play in mammalian genome organization.

It is noteworthy that two recent preprints presented works on acute depletion of ZNF143, and found no significant changes to genome organization, suggesting its indirect role in genome organization^50,51. They went on to propose a model of collective actions by different TFs (other than CTCF) on genome organization. This model is compatible with our findings by Footprint-C that a collection of different TFs interact with each other in complex but preferential modes, thus may act on genome organization in an additive manner.

In summary, Footprint-C presents high-resolution chromatin contact maps built upon intact and genuine TF footprints, reveals a rich regulatory lexicon of TF-TF interactions underlying chromatin high-order structures, and therefore holds great promise for studying genome organization in biological processes with vigorous gene expression changes driven by TF regulatory cascades, such as cellular differentiation and cell fate reprogramming⁵².

Methods

Cell culture

K562 (ATCC: CCL-243) cells were cultured at 37 °C with 5% CO₂ in RPMI-1640 (Vivacell, C3010-0500) supplemented with 10% FBS (Nobimpex, B118-500). HEK293T (ATCC: CRL-3216) cells were cultured at 37 °C with 5% CO₂ in DMEM (Gibco, C11995500BT) supplemented with 10% FBS. Kc167 cells were cultured at 25 °C in SIM SF Expression Medium (Sino Biological, MSF1) supplemented with 1% penicillin-streptomycin (Gibco, 15140122).

Generation of RAD21 degron cell line

The RAD21 degron cell line (HEK293T: RAD21-linker-FKBP^F36V-3xHA-P2A-BSD) was generated by CRISPR/Cas9-mediated genome editing as previously described⁵³ with the following modifications. The homology-directed repair template was constructed as follows: (1) A 500 bp left homology arm before the stop codon of the RAD21 gene was PCR amplified from genomic DNA; (2) A 500 bp fragment including and after the stop codon of the RAD21 gene was amplified as the right homology arm; (3) A linker-FKBP^F36V-3xHA-P2A-BSD sequence was PCR amplified; (4) The pCRIS-PITChv2-dTAG-BSD (BRD4) (Addgene, 91795) plasmid was digested with the restriction enzymes EcoRV-HF (NEB, R3195S) and XbaI (NEB, R0145V), and the large fragment was purified by agarose gel electrophoresis. The four fragments were assembled through Seamless Cloning (Beyotime, D7010S) to serve as the repair template vector. A single-guide RNA targeting the 3′ end of the RAD21 gene was designed with the CRISPOR tool⁵⁴. Sense and antisense oligos were annealed together (sgDNA) and cloned into the pX458 plasmid. HEK293T cells were seeded in 6-well plates and transfected with VigoFect (Vigorous Biotechnology, T001), 1 μg of sgDNA plasmid, and 2 μg of repair template fragment, which was PCR amplified from the repair template vector. Transfected cells were cultured for 96 h, and then selected with 10 µg/mL blasticidin. Single colonies were isolated into 96-well format and further cultured for 14 days. Successfully edited homozygous clones were confirmed by genotyping and Western blot. The RAD21 degron cells were treated with 1 μM dTAG-7 (MCE, HY-123941) or DMSO as control for 4 h.

Footprint-C experimental procedures

Briefly, 5 million cells were fixed with freshly made 1% formaldehyde at room temperature for 10 min. Cells were lysed and nuclei were extracted. Nuclei were digested with 16 U of DNase I (NEB, M0303S) for 30 min. The digested DNA was blunt-ended with End Repair Mix (NEB, E6050L) and A overhangs were added by Taq DNA Polymerase (Thermo, EP0406). The A-tailed DNA was ligated with a bridge linker (Forward: /5Phos/GCCCGG/iBiodT/NNACGCCCGT, Reverse: /5Phos/CGGGCGTNNACCGGGCT) by T4 DNA ligase (Thermo, EL0012) at 16 °C for 4 h. The excess bridge linkers were removed by Lambda Exonuclease (NEB, M0262L) and Exonuclease I (NEB, M0293L). Proteinase K was added and incubated at 65 °C overnight. DNA was purified by phenol:chloroform:isoamyl alcohol extraction and ethanol precipitation. The DNA was run on a 1.5% agarose gel, a gel slice from 80 to 200 bp was excised, and the DNA was purified. Linker-ligated DNA was purified with C1 streptavidin beads (Thermo, 65002). Purified DNA was processed on streptavidin beads with end repair, A-tailing, and ligation to Illumina adapters. The DNA was then subjected to ~7–10 cycles of PCR using Illumina paired-end primers and KAPA Polymerase (Roche, KK2502). The amplified library was purified and subjected to Illumina NovaSeq 6000 paired-end sequencing.

Footprint-C Data preprocessing

Read pairs were first trimmed of the Illumina adapter and bridge linker sequence (AGCCCGGTNNACGCCCGT, both forward and reverse complementary) from both ends by Trim Galore (https://github.com/FelixKrueger/TrimGalore) and Cutadapt (https://github.com/marcelm/cutadapt/). Only read pairs with the bridge linker sequence detected and with both mates ≥10 bp after trimming were kept. Valid Footprint-C fragment contact pairs were obtained from the HiC-Pro⁵⁵ analysis pipeline. The detailed description and code can be found at https://github.com/nservant/HiC-Pro. In brief, a pair of trimmed fastq files was mapped to the human (hg38) or Drosophila (dm6) genome separately by Bowtie2⁵⁶ with the options ‘--very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder’. Aligned reads were paired by read name. Pairs with multiple hits, low MAPQ (≤10), singletons, dangling ends, and self-circles were removed. The fragment contact pairs were obtained from the output BAM files containing paired aligned reads by removing PCR duplicates and were used in downstream analyses. All 5′ valid pairs were extracted from the fragment contact pairs and were converted to contact matrices in COOL format using the Cooler⁵⁷ package. The Footprint-C, Micro-C, in situ Hi-C, and Hi-TrAC raw contact matrices were generated and visualized as heatmaps by cooltools (https://github.com/open2c/cooltools). The iterative correction (IC)-normalized contact matrices were used for all other bioinformatic analyses⁵⁸.
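The linker-based read filter can be illustrated with a small pure-Python sketch. It is not the Trim Galore or Cutadapt implementation: it matches the linker exactly (no sequencing-error tolerance), treats each N as any base, and assumes that detecting the linker in at least one mate is sufficient, which the text does not specify. The function names are hypothetical.

```python
import re

LINKER = "AGCCCGGTNNACGCCCGT"                  # bridge linker, forward strand
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string; N stays N."""
    return seq.translate(_COMPLEMENT)[::-1]


def linker_regex(linker):
    """Compile the linker into a regex where each N matches any base."""
    return re.compile(linker.replace("N", "[ACGT]"))


PATTERNS = [linker_regex(LINKER), linker_regex(reverse_complement(LINKER))]


def trim_mate(read):
    """Cut a read at the first linker occurrence; report if one was found."""
    hits = [m.start() for p in PATTERNS if (m := p.search(read))]
    if not hits:
        return read, False
    return read[:min(hits)], True


def keep_pair(read1, read2, min_len=10):
    """Return trimmed mates if the pair passes the filter, otherwise None."""
    trimmed1, found1 = trim_mate(read1)
    trimmed2, found2 = trim_mate(read2)
    # Assumption: a linker hit in either mate counts as "linker detected".
    if (found1 or found2) and len(trimmed1) >= min_len and len(trimmed2) >= min_len:
        return trimmed1, trimmed2
    return None
```

The reverse complement of the linker computed here, ACGGGCGTNNACCGGGCT, corresponds to the reverse oligo listed in the experimental procedures with one added A from A-tailing.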

Public datasets

Public data used in this study, including in situ Hi-C, Micro-C, BL-Hi-C, DNase Hi-C, CAP-C, Hi-TrAC, HiCAR, HiPore-C, DNase-seq, RNA-seq, and ChIP-seq, are summarized in Supplementary Data 1. The in situ Hi-C and Micro-C datasets were preprocessed in the same way as Footprint-C but without the linker trimming steps. The BL-Hi-C, DNase Hi-C and CAP-C datasets were preprocessed in the same way as Footprint-C. HiCAR datasets were processed as described in ref. ⁵⁹. Briefly, reads were aligned to the hg38 reference genome using bwa mem with the flags -SP. Alignments were parsed, and the alignment contact pairs were generated using pairtools (https://github.com/mirnylab/pairtools). The alignment contact pairs with low mapping quality (MAPQ < 10) were filtered out. The alignment contact pairs with the same coordinate on the genome or mapped to the same digestion fragment were removed. The alignment contact pairs of Hi-TrAC and multiway interactions of HiPore-C were directly downloaded from Gene Expression Omnibus (GEO).

Enrichment analysis

The fragment contact pairs of Footprint-C or other Hi-C datasets were first converted to single alignment BED files. Alignments of other Hi-C datasets were scaled to 60 bp, the median fragment length of Footprint-C. genomeCoverageBed in bedtools⁶⁰ was used to compute the normalized signal (reads per million for hg38). The average signal distribution relative to the occupied CTCF motifs and distal DNase I hypersensitive sites (dDHS) was computed across ±500 bp by computeMatrix in deepTools⁶¹. The average signals of Read 1 and Read 2 alignments in the HiCAR dataset around the occupied CTCF motifs and dDHS were calculated separately. The list of functional CTCF motifs was generated as described previously⁴³. Briefly, the sets of occupied CTCF motifs in K562 and GM12878 were screened by the respective CTCF ChIP-seq datasets, and only those with an average RPM > 1 within a 30 bp window surrounding the motif center base were kept. The set of DHS in K562 or GM12878 cells was called from the respective DNase-seq dataset using MACS2. DHS with q values > 20 and distance from TSS > 1 kb were defined as dDHS.
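The dDHS selection rule (q value > 20 and more than 1 kb from any TSS) can be sketched as below. Several points are assumptions rather than statements from the text: the q value threshold is read as the −log10 q-value score reported by MACS2, the distance is measured from the peak center, and the function names and input layout are hypothetical.

```python
from bisect import bisect_left


def nearest_distance(sorted_positions, pos):
    """Distance from pos to the closest value in a sorted, non-empty list."""
    i = bisect_left(sorted_positions, pos)
    # Only the neighbors on either side of the insertion point can be closest.
    candidates = sorted_positions[max(i - 1, 0): i + 1]
    return min(abs(pos - p) for p in candidates)


def select_ddhs(peaks, tss_by_chrom, min_score=20, min_tss_distance=1000):
    """peaks: iterable of (chrom, center, score); returns the distal subset."""
    kept = []
    for chrom, center, score in peaks:
        tss = tss_by_chrom.get(chrom, [])
        # A chromosome without annotated TSSs is treated as infinitely distal.
        distance = nearest_distance(tss, center) if tss else float("inf")
        if score > min_score and distance > min_tss_distance:
            kept.append((chrom, center, score))
    return kept
```

The TSS lists in `tss_by_chrom` must be sorted in ascending order for the binary search to be valid.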

Footprint-C 1D peak processing

The two fragments of each Footprint-C pair were extracted and converted to BED files compatible with MACS2⁶² input. MACS2 was used to identify peaks with the following parameters: --nomodel --extsize 30 -q 0.05 --call-summits --keep-dup all. The peaks reproducible across multiple biological replicates were kept using ChIP-R⁶³.

Footprint annotation and classification

A list of expressed TFs was selected from the genes with the top 10,000 RPM values in RNA-seq data from K562 or HEK293T cells. Only motifs of expressed TFs from HOMER’s motif collection¹⁸ were kept for analysis. The closest motif to each Footprint-C 1D peak summit was obtained by bedtools closestBed with parameters: -t first. Different TFs from the same family were named by the family name (e.g. KLF). Footprint-annotated peaks were merged and grouped into 3 types of sites (1 V, 2 V, and 3 V) based on the distance between adjacent footprints. 1 V sites were defined as single footprints with distances >200 bp from their nearest footprints. 2 V sites were defined as two footprints with a distance <200 bp between them, but >200 bp from the nearest third footprint. 3 V sites were defined as three footprints with distances <200 bp between any two, but >200 bp from the nearest fourth footprint.
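The 1 V / 2 V / 3 V grouping rule can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than the authors' script: footprints are represented only by their summit coordinates on one chromosome, a distance of exactly 200 bp (not covered by the definitions above) is treated as separated, and clusters that fit none of the three definitions are labelled "other".

```python
def classify_sites(summits, max_gap=200):
    """Group footprint summits on one chromosome and label each group.

    Returns a list of (label, summits) tuples in genomic order.
    """
    groups, current = [], []
    for pos in sorted(summits):
        # Start a new group when the gap to the previous summit is too large.
        if current and pos - current[-1] >= max_gap:
            groups.append(current)
            current = []
        current.append(pos)
    if current:
        groups.append(current)

    labelled = []
    for group in groups:
        # "Any two" footprints closer than max_gap is equivalent to the
        # total span of the sorted group being below max_gap.
        compact = group[-1] - group[0] < max_gap
        if len(group) <= 3 and compact:
            label = f"{len(group)}V"
        else:
            label = "other"  # assumption: larger or looser clusters are set aside
        labelled.append((label, group))
    return labelled
```

Calling `classify_sites([100, 1000, 1100, 3000, 3050, 3120])` yields one 1V, one 2V and one 3V site; three summits spaced 150 bp apart (span 300 bp) fall into "other" because the outer two are not within 200 bp of each other.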

Classification of CTCF 1 V sites

In Fig. 2j, CTCF motifs within 1 V sites were extended by ±20 bp from the motif center base. Footprint-C fragment ends within the upstream or downstream 20 bp regions were counted. Skewed CTCF 1 V sites were classified into two categories based on the ratio of fragment-end counts between the two regions (ratio ≥4 or ≤0.25). The third category was randomly picked from all CTCF 1 V sites.
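A minimal sketch of the skew classification follows. The direction of the ratio (upstream count divided by downstream count), the category names, and the handling of zero counts are assumptions; the text only specifies the thresholds.

```python
def classify_skew(upstream: int, downstream: int, fold: float = 4.0) -> str:
    """Classify a CTCF 1 V site by the ratio of fragment-end counts.

    upstream / downstream are counts of Footprint-C fragment ends in the
    20 bp regions on either side of the motif center.
    """
    if upstream == 0 and downstream == 0:
        return "undefined"  # assumption: sites without any ends are skipped
    if downstream == 0:
        return "upstream-skewed"  # ratio is unbounded, so it exceeds the cutoff
    ratio = upstream / downstream
    if ratio >= fold:
        return "upstream-skewed"
    if ratio <= 1 / fold:
        return "downstream-skewed"
    return "unskewed"
```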

Annotation of promoters with footprints

In Supplementary Fig. 3e, promoters were defined as the 500 bp regions upstream of the transcription start site (TSS). A promoter was considered a “1 V” promoter only if the summit of a 1 V site fell within this promoter region. Similarly, a promoter was considered a “2 V” or “3 V” promoter only if all summits of the 2 V or 3 V site fell within this promoter region.
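The promoter rule can be sketched in Python as below. The 0-based half-open coordinate convention and the strand-aware definition of "upstream" are assumptions made for the illustration, and both function names are hypothetical.

```python
def promoter_interval(tss: int, strand: str, length: int = 500) -> tuple[int, int]:
    """Return the half-open interval covering `length` bp upstream of the TSS."""
    if strand == "+":
        return max(0, tss - length), tss
    # On the minus strand, upstream lies at higher coordinates.
    return tss + 1, tss + 1 + length


def site_in_promoter(summits, promoter) -> bool:
    """True only if the site has summits and all of them fall in the promoter."""
    start, end = promoter
    return bool(summits) and all(start <= s < end for s in summits)
```

A 2 V site with summits at 600 and 700 would count for a plus-strand promoter with TSS at 1000, while a site with summits at 600 and 1100 would not, because one summit lies outside the region.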

Calculation of inside-V fragment ratio

The inside-V fragment ratio was calculated as a/(a + b₁ + b₂), as shown in Supplementary Fig. 3c, where a represents the number of fragments located within the V shape enclosed by two borders extending from the footprint summit with slopes of ±2 (the theoretical slopes of a V-plot). b₁ and b₂ represent the numbers of fragments located within the region ±50 bp from the footprint summit but outside the V region.
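The ratio can be illustrated with a short Python sketch. It assumes that each fragment is described by the offset of its midpoint from the footprint summit and by its length, that the apex of the V sits at the summit with length 0, that a is also restricted to the ±50 bp window, and that b₁ and b₂ are the left and right flanks; none of these details is spelled out in the text. Under these assumptions, a fragment lies inside the V (length ≥ 2·|offset|) when it is long enough to reach the summit from its midpoint.

```python
def inside_v_ratio(fragments, window=50, slope=2.0):
    """Compute a / (a + b1 + b2) for one footprint.

    fragments: iterable of (midpoint_offset_from_summit, fragment_length).
    """
    a = b1 = b2 = 0
    for offset, length in fragments:
        if abs(offset) > window:
            continue  # outside the +/- window around the summit
        if length >= slope * abs(offset):
            a += 1   # inside the V
        elif offset < 0:
            b1 += 1  # outside the V, left of the summit
        else:
            b2 += 1  # outside the V, right of the summit
    total = a + b1 + b2
    return a / total if total else float("nan")
```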

TAD calling

TADs were called using hicFindTADs⁶⁴ at 40-kb resolution with settings: --thresholdComparisons 0.05 --delta 0.01 --correctForMultipleTesting fdr.

Loop calling

Loops were called using Mustache⁶⁵ or Chromosight⁶⁶. For Mustache, loops were called at 5-kb resolution using the options -r 5000 --pThreshold 0.1. For Chromosight, loops were called by the Chromosight detect function at 5-kb resolution. Loops smaller than 20 kb were removed. Common and specific loops between Footprint-C and Micro-C were characterized as described previously⁶⁷. Briefly, loops with both anchors overlapping were called common loops. Loops with one or neither anchor overlapping were called specific loops. Anchors were extended ±20 kb, and pairToPair in bedtools was used to characterize common loops with parameters -type both -f 0.5. Footprint-C or Micro-C specific loops were characterized with parameters -type notboth -slop 40000. Pile-up plots of loops were generated using cooltools.

Loop classification

Loops were annotated and categorized into promoter-promoter (Pro-Pro), promoter-enhancer (Pro-Enh), enhancer-enhancer (Enh-Enh), structural between A compartments (A-A structural), structural between B compartments (B-B structural), or other loops based on the annotation of active TSS, H3K27ac ChIP-seq peaks⁶⁸ and PC1 values from Footprint-C data. H3K27ac peaks were called from the K562 H3K27ac ChIP-seq dataset using MACS2⁶². RNA-seq data were used to annotate active TSS. First, anchors were extended by ±10 kb. Anchors containing an active TSS were defined as Pro. Second, the remaining anchors containing H3K27ac peaks were defined as Enh. Third, the remaining anchors were defined as structural A or B anchors based on their PC1 values. Pro-Pro, Pro-Enh, Enh-Enh, A-A structural, and B-B structural loops were defined by the respective annotations of the two anchors.
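The hierarchical anchor annotation lends itself to a small Python sketch. The function names and the boolean inputs are hypothetical; the sign convention (positive PC1 for A, negative for B) follows the compartment definition given in this Methods section, and the treatment of an anchor with PC1 exactly 0 is an assumption.

```python
def annotate_anchor(has_active_tss: bool, has_h3k27ac: bool, pc1: float) -> str:
    """Annotate one extended loop anchor; earlier rules take precedence."""
    if has_active_tss:
        return "Pro"
    if has_h3k27ac:
        return "Enh"
    if pc1 > 0:
        return "A"
    if pc1 < 0:
        return "B"
    return "NA"  # assumption: PC1 of exactly 0 is left unassigned


LOOP_CLASSES = {
    frozenset(("Pro",)): "Pro-Pro",
    frozenset(("Pro", "Enh")): "Pro-Enh",
    frozenset(("Enh",)): "Enh-Enh",
    frozenset(("A",)): "A-A structural",
    frozenset(("B",)): "B-B structural",
}


def classify_loop(anchor1: str, anchor2: str) -> str:
    """Combine two anchor annotations into a loop class, ignoring anchor order."""
    return LOOP_CLASSES.get(frozenset((anchor1, anchor2)), "other")
```

An anchor that contains both an active TSS and an H3K27ac peak is labelled Pro because the TSS rule is applied first; mixed combinations such as A with B fall into "other".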

Stripe calling

Stripes were called from contact matrices using Stripenn⁶⁹ or StripeCaller⁷⁰. Stripes were called by Stripenn at 5-kb resolution with the following settings: -m 0.95,0.96,0.97,0.98,0.99 -p 0.1. Stripes were called by StripeCaller with the following settings: --local-num 2 --fold-enrichment 1.1 --min-seed-len 6. Pile-up plots of stripes were obtained using coolpup.py⁷¹, with the following settings: --local --rescale. The loop domains in Supplementary Fig. 4k were annotated by the horizontal and vertical stripes obtained by Stripenn, and were divided into four types: left-sided, right-sided, both-sided, and no-stripe loops. The loop domains were divided into 50 equally spaced intervals. The left or right anchors were extended by 10 intervals forward or backward, respectively. The percentage of loops overlapping with 1 V sites in each interval was calculated.

Insulation score analysis

The insulation scores⁷² were calculated at 40-kb resolution using the cooltools package.

Compartment analysis

The compartments were identified at 100-kb resolution using the cooltools package. The eigenvector of the first principal component represents the compartment profile, with positive and negative values representing A and B compartments, respectively².

Motif preprocessing

Transcription factor motif coordinates were obtained from the HOMER¹⁸ and CIS-BP⁷³ databases. The coordinates from HOMER were used directly. The coordinates from CIS-BP were annotated by FIMO⁷⁴. The lists of functional CTCF, EGR1, KLF, MAZ and SP motifs were generated as described previously⁴³. Briefly, for K562 cells, the CDC5L, E2F4, TFDP1, VEZF1, and ZNF143 motif coordinates were screened by the respective ChIP-seq data, and only those with an average RPM value above 0.2 within a 30 bp window surrounding the motif center base were kept. MTF2, RBAK, and ZFP69B motif coordinates were screened by DNase-seq data, and only those with an average RPM value above 0.1 within a 30 bp window surrounding the motif center base were kept. For HEK293T cells, the RBAK and ZFP69B motif coordinates were screened by the respective ChIP-seq data, and only those with an average RPM value above 0.1 within a 30 bp window surrounding the motif center base were kept. E2F4, EGR1, ZNF143, VEZF1, CDC5L, TFDP1, and MTF2 motif coordinates were screened by DNase-seq data, and only those with an average RPM value above 0.1 within a 30 bp window surrounding the motif center base were kept.

Motif uniqueness analysis

The fragment contact pairs of Footprint-C and other Hi-C datasets were first converted to single fragment BED files. The Footprint-C fragments less than or equal to 60 bp in length were kept for analysis. The fragments of in situ Hi-C were extended to the nearest GATC site (DpnII). The fragments of BL-Hi-C were extended to the nearest GGCC sites (HaeIII). The fragments of Hi-TrAC were extended to the nearest AATT or CATG sites (MluCI or NlaIII). The fragments of HiCAR were extended to the nearest GTAC sites (CviQI). The fragments of Micro-C were extended ±75 bp from the center base. Fragment coordinates were then intersected with all HOMER motif coordinates by intersectBed in bedtools. Finally, the proportions of fragments from Footprint-C, in situ Hi-C, BL-Hi-C, Hi-TrAC, HiCAR, or Micro-C datasets annotated with zero, one or multiple motifs were calculated.
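The final tally (fragments with zero, one, or multiple motifs) can be sketched as follows. This is a simplified stand-in for the intersectBed step, assuming half-open intervals and counting any overlap of at least 1 bp; the quadratic scan is only meant for illustration on small inputs.

```python
from collections import Counter


def motif_uniqueness(fragments, motifs):
    """Return the proportions of fragments overlapping 0, 1, or >1 motifs.

    fragments, motifs: lists of (chrom, start, end) half-open intervals.
    """
    if not fragments:
        return {}
    counts = Counter()
    for chrom, fstart, fend in fragments:
        # Count motifs on the same chromosome that overlap the fragment.
        n = sum(
            1
            for mchrom, mstart, mend in motifs
            if mchrom == chrom and mstart < fend and fstart < mend
        )
        key = "zero" if n == 0 else "one" if n == 1 else "multiple"
        counts[key] += 1
    total = len(fragments)
    return {key: counts[key] / total for key in ("zero", "one", "multiple")}
```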

Analysis of motif orientation of CTCF-CTCF contacts

The closest CTCF motif to the upstream and downstream alignments of each contact from Footprint-C or other Hi-C datasets was obtained by bedtools closestBed with parameters: -t first. Only the contacts with both alignments overlapping the center base of a CTCF motif were counted. The frequencies of CTCF contacts were calculated according to the orientations of the two motifs (+−: convergent, ++: tandem F, −+: divergent, −−: tandem R).
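The orientation bookkeeping can be written compactly in Python. The sketch below assumes each retained contact has already been reduced to the strands of its upstream and downstream motifs (ASCII "+" and "-"); the function name is hypothetical.

```python
from collections import Counter

# Orientation classes as defined in the text (upstream strand, downstream strand).
ORIENTATION_CLASS = {
    ("+", "-"): "convergent",
    ("+", "+"): "tandem F",
    ("-", "+"): "divergent",
    ("-", "-"): "tandem R",
}


def orientation_frequencies(contacts):
    """Return the fraction of contacts in each orientation class.

    contacts: iterable of (upstream_strand, downstream_strand) tuples.
    """
    counts = Counter(ORIENTATION_CLASS[c] for c in contacts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: counts[name] / total for name in ORIENTATION_CLASS.values()}
```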

Construction of TF motif-based contact maps

The lists of functional CTCF or MAZ motifs were generated as described previously⁴³ and also under “Motif preprocessing”. The motifs were sorted by genome coordinates, indexed, and used to construct a genome-wide 2D contact map. The Footprint-C fragments were annotated by the indexed motifs. The contact pairs with motifs annotated on both fragments were extracted and assigned to the respective bins in the 2D contact map. The counts in each bin were shaded in one of four colors according to the orientations of the two motifs (+−, ++, −+, −−). The motif-based contact maps were plotted using ggplot2, and were merged using Adobe Illustrator.
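The construction of a motif-indexed contact map can be sketched as below. This is an illustrative simplification, not the authors' implementation: motifs are sorted lexicographically by chromosome name and start, a fragment is assigned to the first overlapping motif, and the linear lookup is only suitable for small inputs.

```python
from collections import Counter


def motif_contact_map(motifs, pairs):
    """Count contacts between indexed motifs.

    motifs: list of (chrom, start, end, strand), half-open intervals.
    pairs:  list of ((chrom, start, end), (chrom, start, end)) fragment pairs.
    Returns a Counter keyed by (i, j, orientation) with i <= j, where i and j
    are motif indices and orientation is e.g. "+-".
    """
    indexed = sorted(motifs)  # assumption: lexicographic chromosome order

    def lookup(fragment):
        chrom, start, end = fragment
        for idx, (mchrom, mstart, mend, _strand) in enumerate(indexed):
            if mchrom == chrom and mstart < end and start < mend:
                return idx
        return None

    bins = Counter()
    for frag1, frag2 in pairs:
        i, j = lookup(frag1), lookup(frag2)
        if i is None or j is None:
            continue  # keep only pairs with a motif on both fragments
        i, j = min(i, j), max(i, j)
        orientation = indexed[i][3] + indexed[j][3]
        bins[(i, j, orientation)] += 1
    return bins
```

The orientation string stored with each bin is what would determine its color in the plotted map.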

Three-way interaction analysis

The three-way interactions were inferred from two-way motif interactions using quick-cliques (https://github.com/darrenstrash/quick-cliques). For three-way interactions between CTCF and MAZ, KLF, or EGR1, the obtained four-way interactions were split into three-way interactions in addition to the directly inferred three-way interactions. The inferred three-way interactions were then intersected with HiPore-C data by intersectBed in bedtools, and only interactions with all three motifs located in three different fragments of a single multiway complex were kept for motif orientation analysis.
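The idea of inferring three-way interactions from pairwise contacts can be illustrated by enumerating triangles in the motif interaction graph. The sketch below is a plain triangle enumeration, not the quick-cliques algorithm used in the study (which lists maximal cliques); it is included only to show the principle, and the function name is hypothetical. Because every triple contained in a larger clique is also a triangle, the output includes the three-way subsets of four-way interactions.

```python
def infer_three_way(pairs):
    """Return all motif triples in which every pair is observed in contact.

    pairs: iterable of (motif_a, motif_b) identifiers (e.g. motif indices).
    """
    pairs = [(a, b) for a, b in pairs if a != b]  # ignore self-contacts
    neighbours = {}
    for a, b in pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    triples = set()
    for a, b in pairs:
        # Any common neighbour of a and b closes a triangle.
        for c in neighbours[a] & neighbours[b]:
            triples.add(tuple(sorted((a, b, c))))
    return sorted(triples)
```

For four mutually interacting motifs 1 to 4, the function returns the four triples (1, 2, 3), (1, 2, 4), (1, 3, 4) and (2, 3, 4); a chain such as 1-2 and 2-3 without a 1-3 contact yields no triple.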

Statistics and reproducibility

The stratum-adjusted correlation coefficient was calculated with HiCRep⁷⁵ for 100-kb resolution contact matrices using the parameter settings --h 1 --dBPMax 100000 --binSize 100000. Comparisons were performed between two biological replicates of Footprint-C libraries, or between Footprint-C and Micro-C or in situ Hi-C libraries. The two-sided Wilcoxon rank-sum test was used to analyze differences between groups of data. Statistical details are shown in the figures or legends. No statistical method was used to predetermine sample size. No data were excluded from the analyses. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All raw and processed data can be obtained at Genome Sequence Archive (GSA) under accession number HRA004768. Detailed statistics of Footprint-C datasets are provided in Supplementary Data 2. A list of Footprint-C specific 1D peaks as described in Supplementary Fig. 1i is provided as Supplementary Data 3. Source data are provided with this paper.

Code availability

Scripts used to process the data are available at: https://github.com/xiaokunliu01/Footprint-C and https://doi.org/10.5281/zenodo.14191418.

References

1. Misteli, T. The Self-Organizing Genome: Principles of Genome Architecture and Function. Cell 183, 28–45 (2020).
2. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
3. Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
4. Hsieh, T. H. et al. Mapping Nucleosome Resolution Chromosome Folding in Yeast by Micro-C. Cell 162, 108–119 (2015).
5. Wei, X. et al. HiCAR is a robust and sensitive method to analyze open-chromatin-associated genome organization. Mol. Cell 82, 1225–1238 e6 (2022).
6. Liu, S., Cao, Y., Cui, K., Tang, Q. & Zhao, K. Hi-TrAC reveals division of labor of transcription factors in organizing chromatin loops. Nat. Commun. 13, 6679 (2022).
7. Li, T., Jia, L., Cao, Y., Chen, Q. & Li, C. OCEAN-C: mapping hubs of open chromatin interactions across the genome reveals gene regulatory networks. Genome Biol. 19, 54 (2018).
8. Hsieh, T. S. et al. Resolving the 3D Landscape of Transcription-Linked Mammalian Chromatin Folding. Mol. Cell 78, 539–553 e8 (2020).
9. Krietenstein, N. et al. Ultrastructural Details of Mammalian Chromosome Architecture. Mol. Cell 78, 554–565 e7 (2020).
10. Vierstra, J. et al. Global reference mapping of human transcription factor footprints. Nature 583, 729–736 (2020).
11. Ma, W. et al. Fine-scale chromatin interaction maps reveal the cis-regulatory landscape of human lincRNA genes. Nat. Methods 12, 71–78 (2015).
12. Gridina, M. et al. A cookbook for DNase Hi-C. Epigenetics Chromatin 14, 15 (2021).
13. You, Q. et al. Direct DNA crosslinking with CAP-C uncovers transcription-dependent chromatin organization at high resolution. Nat. Biotechnol. 39, 225–235 (2021).
14. Ong, C. T. & Corces, V. G. CTCF: an architectural protein bridging genome topology and function. Nat. Rev. Genet 15, 234–246 (2014).
15. Liang, Z. et al. BL-Hi-C is an efficient and sensitive approach for capturing structural and regulatory chromatin interactions. Nat. Commun. 8, 1622 (2017).
16. Barshad, G. et al. RNA polymerase II dynamics shape enhancer-promoter interactions. Nat. Genet 55, 1370–1380 (2023).
17. Henikoff, J. G., Belsky, J. A., Krassovsky, K., MacAlpine, D. M. & Henikoff, S. Epigenome characterization at single base-pair resolution. Proc. Natl Acad. Sci. USA 108, 18318–18323 (2011).
18. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
19. Xu, C. & Corces, V. G. Towards a predictive model of chromatin 3D organization. Semin Cell Dev. Biol. 57, 24–30 (2016).
20. Rowley, M. J. et al. Evolutionarily Conserved Principles Predict 3D Chromatin Organization. Mol. Cell 67, 837–852 e7 (2017).
21. Phillips, J. E. & Corces, V. G. CTCF: master weaver of the genome. Cell 137, 1194–1211 (2009).
22. Xiao, T., Li, X. & Felsenfeld, G. The Myc-associated zinc finger protein (MAZ) works together with CTCF to control cohesin positioning and genome organization. Proc. Natl Acad. Sci. USA 118, e2023127118 (2021).
23. Ortabozkoyun, H. et al. CRISPR and biochemical screens identify MAZ as a cofactor in CTCF-mediated insulation at Hox clusters. Nat. Genet 54, 202–212 (2022).
24. Akerberg, B. N. et al. A reference map of murine cardiac transcription factor chromatin occupancy identifies dynamic and conserved enhancers. Nat. Commun. 10, 4907 (2019).
25. Nie, Y., Shu, C. & Sun, X. Cooperative binding of transcription factors in the human genome. Genomics 112, 3427–3434 (2020).
26. Zhao, Y. et al. “Stripe” transcription factors provide accessibility to co-binding partners in mammalian genomes. Mol. Cell 82, 3398–3411 e11 (2022).
27. Georgakopoulos-Soares, I. et al. Transcription factor binding site orientation and order are major drivers of gene regulatory activity. Nat. Commun. 14, 2333 (2023).
28. Pugacheva, E. M. et al. Comparative analyses of CTCF and BORIS occupancies uncover two distinct classes of CTCF binding genomic regions. Genome Biol. 16, 161 (2015).
29. Leng, F. et al. The transcription factor FoxP3 can fold into two dimerization states with divergent implications for regulatory T cell function and immune homeostasis. Immunity 55, 1354–1369 e8 (2022).
30. Choi, Y. et al. FOXL2 and FOXA1 cooperatively assemble on the TP53 promoter in alternative dimer configurations. Nucleic Acids Res. 50, 8929–8946 (2022).
31. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
32. Zhang, K., Li, N., Ainsworth, R. I. & Wang, W. Systematic identification of protein combinations mediating chromatin looping. Nat. Commun. 7, 12249 (2016).
33. Hu, G. et al. Systematic screening of CTCF binding partners identifies that BHLHE40 regulates CTCF genome-wide distribution and long-range chromatin interactions. Nucleic Acids Res. 48, 9606–9620 (2020).
34. Tang, Z. et al. CTCF-Mediated Human 3D Genome Architecture Reveals Chromatin Topology for Transcription. Cell 163, 1611–1627 (2015).
35. Ohno, M. et al. Sub-nucleosomal Genome Structure Reveals Distinct Nucleosome Folding Motifs. Cell 176, 520–534 e25 (2019).
36. Kadota, M. et al. Multifaceted Hi-C benchmarking: what makes a difference in chromosome-scale genome scaffolding? Gigascience 9, giz158 (2020).
37. Vian, L. et al. The Energetics and Physiological Impact of Cohesin Extrusion. Cell 173, 1165–1178 e20 (2018).
38. Rao, S. S. P. et al. Cohesin Loss Eliminates All Loop Domains. Cell 171, 305–320 e24 (2017).
39. Wong, K. M., Song, J. & Wong, Y. H. CTCF and EGR1 suppress breast cancer cell migration through transcriptional control of Nm23-H1. Sci. Rep. 11, 491 (2021).
40. Bailey, S. D. et al. ZNF143 provides sequence specificity to secure chromatin interactions at gene promoters. Nat. Commun. 2, 6186 (2015).
41. Zhou, Q. et al. ZNF143 mediates CTCF-bound promoter-enhancer loops required for murine hematopoietic stem and progenitor cell function. Nat. Commun. 12, 43 (2021).
42. Zhang, M., Huang, H., Li, J. & Wu, Q. ZNF143 deletion alters enhancer/promoter looping and CTCF/cohesin geometry. Cell Rep. 43, 113663 (2024).
43. Xu, C. & Corces, V. G. Nascent DNA methylome mapping reveals inheritance of hemimethylation at CTCF/cohesin sites. Science 359, 1166–1170 (2018).
44. Zhong, J. Y. et al. High-throughput Pore-C reveals the single-allele topology and cell type-specificity of 3D genome folding. Nat. Commun. 14, 1250 (2023).
45. Lambert, S. A. et al. The Human Transcription Factors. Cell 172, 650–665 (2018).
46. Yuan, G. C. et al. Genome-scale identification of nucleosome positions in S. cerevisiae. Science 309, 626–630 (2005).
47. Davidson, I. F. & Peters, J. M. Genome folding through loop extrusion by SMC complexes. Nat. Rev. Mol. Cell Biol. 22, 445–464 (2021).
48. Ortabozkoyun, H. et al. Members of an array of zinc-finger proteins specify distinct Hox chromatin boundaries. Mol. Cell 84, 3406–3422 e6 (2024).
49. de Wit, E. & Nora, E. P. New insights into genome folding by loop extrusion from inducible degron technologies. Nat. Rev. Genet 24, 73–85 (2023).
50. Magnitov, M. D. et al. ZNF143 is a transcriptional regulator of nuclear-encoded mitochondrial genes that acts independently of looping and CTCF. bioRxiv, https://www.biorxiv.org/content/10.1101/2024.03.08.583864v1 (2024).
51. Narducci, D. N. & Hansen, A. S. Putative Looping Factor ZNF143/ZFP143 is an Essential Transcriptional Regulator with No Looping Function. bioRxiv, https://www.biorxiv.org/content/10.1101/2024.03.08.583987v1 (2024).
52. Stadhouders, R., Filion, G. J. & Graf, T. Transcription factors and 3D genome conformation in cell-fate decisions. Nature 569, 345–354 (2019).
53. Nabet, B. et al. The dTAG system for immediate and target-specific protein degradation. Nat. Chem. Biol. 14, 431–441 (2018).
54. Concordet, J. P. & Haeussler, M. CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Res 46, W242–W245 (2018).
55. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
56. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
57. Abdennur, N. & Mirny, L. A. Cooler: scalable storage for Hi-C data and other genomically labeled arrays. Bioinformatics 36, 311–316 (2020).
58. Imakaev, M. et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999–1003 (2012).
59. Wei, X. L. et al. HiCAR is a robust and sensitive method to analyze open-chromatin-associated genome organization. Mol. Cell 82, 1225–122 (2022).
60. Quinlan, A. R. BEDTools: The Swiss-Army Tool for Genome Feature Analysis. Curr. Protoc. Bioinforma. 47, 1–34 (2014).
61. Ramirez, F., Dundar, F., Diehl, S., Gruning, B. A. & Manke, T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res 42, W187–W191 (2014).
62. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
63. Newell, R. et al. ChIP-R: Assembling reproducible sets of ChIP-seq and ATAC-seq peaks from multiple replicates. Genomics 113, 1855–1866 (2021).
64. Ramirez, F. et al. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat. Commun. 9, 189 (2018).
65. Roayaei Ardakany, A., Gezer, H. T., Lonardi, S. & Ay, F. Mustache: multi-scale detection of chromatin loops from Hi-C and Micro-C maps using scale-space representation. Genome Biol. 21, 256 (2020).
66. Matthey-Doret, C. et al. Computer vision for pattern detection in chromosome contact maps. Nat. Commun. 11, 5795 (2020).
67. Yang, H. et al. A map of cis-regulatory elements and 3D genome structures in zebrafish. Nature 588, 337–343 (2020).
68. Xu, J. et al. Subtype-specific 3D genome alteration in acute myeloid leukaemia. Nature 611, 387–398 (2022).
69. Yoon, S., Chandra, A. & Vahedi, G. Stripenn detects architectural stripes from chromatin conformation data using computer vision. Nat. Commun. 13, 1602 (2022).
70. Vian, L. et al. The Energetics and Physiological Impact of Cohesin Extrusion. Cell 175, 292–294 (2018).
71. Flyamer, I. M., Illingworth, R. S. & Bickmore, W. A. Coolpup.py: versatile pile-up analysis of Hi-C data. Bioinformatics 36, 2980–2985 (2020).
72. Crane, E. et al. Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature 523, 240–244 (2015).
73. Weirauch, M. T. et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell 158, 1431–1443 (2014).
74. Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).
75. Yang, T. et al. HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Genome Res. 27, 1939–1949 (2017).

Acknowledgements

This work was supported by grants from the Ministry of Science and Technology of China (2022YFC2703303 and 2020YFA0803401 to C.X., and 2022YFC2703300 to Q.W.), and the National Natural Science Foundation of China (32070611 and 32370624 to C.X.).

Author information

These authors contributed equally: Xiaokun Liu, Hanhan Wei

Authors and Affiliations

China National Center for Bioinformation, Beijing, China
Xiaokun Liu, Hanhan Wei, Qifan Zhang & Chenhuan Xu
Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
Xiaokun Liu, Hanhan Wei, Qifan Zhang & Chenhuan Xu
University of Chinese Academy of Sciences, Beijing, China
Xiaokun Liu, Hanhan Wei, Qifan Zhang & Chenhuan Xu
Department of Ultrasound, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing Maternal and Child Health Care Hospital, Beijing, China
Na Zhang & Qingqing Wu

Contributions

C.X. conceived and supervised the project. C.X. and Q.W. acquired funding. C.X. and X.L. designed experiments. X.L. performed the experiments. X.L., H.W., and Q.Z. conducted the analyses and visualizations. N.Z. provided resources. C.X. wrote the manuscript with input from all authors.

Corresponding author

Correspondence to Chenhuan Xu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Ferhat Ay and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Liu, X., Wei, H., Zhang, Q. et al. Footprint-C reveals transcription factor modes in local clusters and long-range chromatin interactions. Nat Commun 15, 10922 (2024). https://doi.org/10.1038/s41467-024-55403-7

Received: 19 April 2024
Accepted: 10 December 2024
Published: 30 December 2024
Version of record: 30 December 2024
DOI: https://doi.org/10.1038/s41467-024-55403-7