Boosting the detection of enhancer-promoter loops via normalization methods for chromatin interaction data

Wang, Xiaotao; Shi, Detong; Xue, Feiyang; Liu, Yunxia; Yang, Hongbo; Jiang, Linghan

doi:10.1038/s41467-026-69082-z

Article
Open access
Published: 03 February 2026

Nature Communications volume 17, Article number: 2299 (2026)

Abstract

Accurate detection of enhancer-promoter loops from genome-wide chromatin interaction data is critical for understanding gene regulation. Standard normalization methods, such as matrix balancing approaches, are widely used to correct biases in chromatin contact data prior to chromatin loop detection. However, while these methods preserve structural loop signals, they often attenuate enhancer-promoter interaction signals, making these regulatory loops more difficult to detect. To address this limitation, we develop Raichu, a normalization method for chromatin contact data. Raichu identifies nearly twice as many loops as conventional normalization approaches, recovering almost all previously detected loops while uncovering thousands of additional enhancer-promoter interactions that are otherwise missed. With its improved sensitivity for regulatory loops, Raichu detects more biologically meaningful differential interactions, including those between conditions within the same cell type. Moreover, Raichu performs robustly across a wide range of sequencing depths, resolutions, species, and experimental platforms, making it a versatile tool for revealing insights into three-dimensional genome organization and transcriptional regulation.

Introduction

The spatial organization of chromatin plays a critical role in the precise control of transcriptional programs in mammalian cells^1,2. At kilobase to megabase scales, distant loci on the linear genome can come into close proximity in 3D space, forming structures known as chromatin loops³. These loops are broadly categorized into two types: structural loops, which connect CTCF-bound insulators, and regulatory loops, which link promoters to distal cis-regulatory elements, such as enhancers^4,5. Disruption or rewiring of chromatin loops has been implicated in various developmental disorders and cancers^6,7.

A suite of experimental techniques has been developed to explore 3D genome organization and map chromatin loops⁸. Among these, Hi-C has become one of the most widely used methods due to its ability to capture chromatin contacts between all possible pairs of genomic loci^3,9. However, Hi-C data are subject to substantial technical biases, including those related to sequence mappability, GC content, and restriction fragment length¹⁰. To mitigate these biases, a range of computational normalization methods has been developed, which can be broadly categorized into three groups: explicit methods, such as HiCNorm¹¹ and OneD¹², which model known sources of bias; implicit methods, such as ICE (Iterative Correction and Eigenvector decomposition)¹³ and KR (Knight–Ruiz matrix balancing)³, which adjust the data without explicitly modeling biases; and hybrid methods, which combine features of both approaches¹⁴. Implicit methods like ICE and KR, owing to their simplicity and broad applicability, have become the de facto standards for normalizing Hi-C data and are widely used in downstream analyses, including the identification of chromatin loops.

However, both ICE and KR have notable limitations. In a recent study by the 4D Nucleome Consortium, we observed that although these methods perform well in identifying CTCF-mediated loops, chromatin compartments, and other higher-order structures, they often fail to detect transcription-related loops¹⁵. This limitation stems from a core assumption in both methods: that all genomic loci should have equal visibility in a Hi-C map. In practice, this assumption can lead to over-correction of interaction signals, causing low-frequency contacts—such as enhancer–promoter loops—to be under-detected, despite their central role in gene regulation.
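The equal-visibility assumption can be made concrete with a toy version of iterative correction. The sketch below is not the cooler or juicer implementation; the function name `balance`, the fixed iteration count, and the 4×4 matrix are illustrative assumptions.

```python
def balance(matrix, n_iter=200):
    """Iteratively rescale a symmetric, strictly positive contact matrix
    until every row (locus) carries the same total signal."""
    n = len(matrix)
    balanced = [row[:] for row in matrix]  # work on a copy
    bias = [1.0] * n                       # cumulative per-locus bias
    for _ in range(n_iter):
        row_sums = [sum(row) for row in balanced]
        mean_sum = sum(row_sums) / n
        # Loci with above-average coverage get scale > 1 and are scaled down.
        scale = [s / mean_sum for s in row_sums]
        for i in range(n):
            for j in range(n):
                balanced[i][j] /= scale[i] * scale[j]
        bias = [b * s for b, s in zip(bias, scale)]
    return balanced, bias


# Hypothetical raw contact counts for four loci.
raw = [
    [6.0, 4.0, 2.0, 1.0],
    [4.0, 9.0, 5.0, 2.0],
    [2.0, 5.0, 7.0, 6.0],
    [1.0, 2.0, 6.0, 12.0],
]
balanced, bias = balance(raw)
row_sums = [sum(row) for row in balanced]
print([round(s, 6) for s in row_sums])  # all four entries are equal
```

After balancing, every locus has the same row sum, so loci with low raw coverage are scaled up and highly visible loci are scaled down. This forced equalization is the behavior that the text argues can over-correct weak enhancer–promoter signals.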

To address this long-standing gap in 3D genome analysis, we introduce Raichu, a computational method for normalizing chromatin contact data. Raichu is an implicit method that retains the simplicity of ICE and KR but diverges from their core assumption of uniform visibility across genomic loci. Instead, it employs an optimization-based approach to adjust for variable interaction biases embedded in the raw data, allowing it to preserve signals from biologically important—but often subtle—chromatin interactions. Our results show that Raichu detects nearly twice as many chromatin loops as either ICE or KR, with a notable enrichment for enhancer–promoter loops critical to gene regulation. Importantly, Raichu outperforms existing methods in identifying differential loops between experimental conditions, offering insights into how chromatin architecture regulates transcription across cellular states. Furthermore, Raichu demonstrates robustness across varying sequencing depths and 3D genomic platforms, making it a versatile tool for chromatin interaction analysis.

Results

Limitations of existing Hi-C data normalization methods in detecting transcription-related loops

In this study, we focus on implicit normalization methods for Hi-C data, as explicit and hybrid methods require additional external inputs (e.g., mappability, GC content, restriction fragment length), which limits their applicability to genome assemblies where such data may be unavailable (Supplementary Table 1). Specifically, we compare two widely used matrix balancing implementations: Iterative Correction and Eigenvector decomposition (ICE), as implemented in the cooler package¹⁶, and Knight–Ruiz balancing (KR), available in the juicer toolkit¹⁷. Both cooler and juicer are central to the 3D genomic data analysis ecosystem and serve as default Hi-C processing tools within the 4D Nucleome Consortium pipeline¹⁸. Other software based on implicit methods exists, but it either implements conceptually similar matrix balancing algorithms or is less compatible with standard downstream analysis workflows.

In Supplementary Fig. 1, we present raw Hi-C contact maps alongside ICE- and KR-normalized maps for selected genomic regions in GM12878 cells. As expected, both normalization methods reduce background noise and improve the visibility of topologically associating domains (TADs)^19,20 and chromatin loop structures. However, closer examination reveals that many enhancer-associated loops—marked by H3K27ac peaks—are clearly visible in the raw Hi-C data but become substantially attenuated after ICE or KR normalization, often to the point of being indistinguishable from background noise. This observation highlights a key limitation of existing normalization methods: their tendency to suppress transcription-related interactions, particularly low-frequency enhancer–promoter loops.

Raichu: a computational method for normalizing chromatin contact data

Here, we present Raichu, a computational method for normalizing chromatin contact data (Fig. 1a). Raichu is conceptually grounded in a biophysical framework that models chromatin as a polymer, where the observed interaction frequency between any two loci reflects the combined effects of three components: (1) a distance-dependent decay, (2) locus-specific biases, and (3) a residual component specific to the interaction itself. Among these, the distance-dependent decay is a well-established property of polymer folding and accounts for the global background of interactions, describing the average decrease in contact frequency with increasing genomic distance. This behavior is consistently observed across Hi-C and other 3D genome mapping technologies and can be robustly estimated from the raw contact matrix¹⁵. Locus-specific biases—arising from factors such as GC content, mappability, and restriction fragment density—are not intrinsic to 3D genome organization but introduce systematic distortions into measured contact frequencies.
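One simple way to estimate a distance-dependent decay from a raw matrix is to average the contacts on each diagonal. The sketch below illustrates that idea only. The exact estimator used by Raichu is described in the Methods, so the function name `distance_decay`, the per-diagonal mean, and the toy matrix are assumptions.

```python
def distance_decay(matrix):
    """Mean contact frequency at each genomic separation, measured in bins."""
    n = len(matrix)
    decay = []
    for d in range(n):
        # All bin pairs (i, i + d) lie on the d-th diagonal.
        diagonal = [matrix[i][i + d] for i in range(n - d)]
        decay.append(sum(diagonal) / len(diagonal))
    return decay


# Hypothetical raw contact counts for four consecutive bins.
raw = [
    [50, 20, 8, 2],
    [20, 60, 24, 6],
    [8, 24, 40, 16],
    [2, 6, 16, 70],
]
decay = distance_decay(raw)
print(decay)  # [55.0, 20.0, 7.0, 2.0]
```

The resulting vector falls as separation grows, which is the global background the text attributes to polymer folding.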

**Fig. 1: A computational method for normalizing chromatin contact data.**

To correct for these biases, Raichu employs an efficient optimization algorithm based on dual annealing to estimate a genome-wide bias vector that, together with the distance-dependent decay (which is derived from the raw contact matrix and remains fixed throughout the optimization process), best explains the observed raw contact map (Methods). The estimated bias values are then used to normalize the data in a manner similar to ICE and KR—by dividing the observed contact frequency by the product of the bias values for the two interacting loci. This procedure is designed to remove locus-specific technical biases while preserving the expected distance-decay and retaining a residual signal that reflects genuine interactions between specific loci.
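The three-component model and the final normalization step can be sketched on a toy example. In the code below, the observed count is taken to be the product of the decay at the pair's separation, the two locus biases, and a pair-specific residual. That multiplicative form, the function names, and all numbers are illustrative assumptions. The bias-estimation step itself (dual annealing) is not reproduced here, and the bias vector is simply assumed to be known.

```python
def simulate(decay, bias, residual):
    """Observed contacts under an assumed multiplicative model:
    decay(|i - j|) * bias_i * bias_j * residual_ij."""
    n = len(bias)
    return [
        [decay[abs(i - j)] * bias[i] * bias[j] * residual[i][j] for j in range(n)]
        for i in range(n)
    ]


def normalize(observed, bias):
    """Divide each contact by the product of the two locus biases."""
    n = len(bias)
    return [
        [observed[i][j] / (bias[i] * bias[j]) for j in range(n)]
        for i in range(n)
    ]


decay = [100.0, 40.0, 10.0, 4.0]      # hypothetical distance decay
bias = [1.0, 0.5, 2.0, 1.25]          # hypothetical locus-specific biases
residual = [[1.0] * 4 for _ in range(4)]
residual[0][3] = residual[3][0] = 3.0  # one pair with a genuine extra interaction

observed = simulate(decay, bias, residual)
normalized = normalize(observed, bias)
print(normalized[0][3])  # decay[3] * residual = 4.0 * 3.0
```

With the true biases, normalization leaves the decay multiplied by the residual. The distance dependence is therefore preserved, and the pair with the extra interaction stands out threefold against its expected value.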

Unlike ICE and KR, Raichu-normalized contact maps exhibit substantially greater variability in visibility (Supplementary Fig. 2). Using multiple Hi-C datasets as benchmarks, we found that Raichu preserves the large-scale distance-dependent decay structure (Fig. 1b), and this pattern holds across species (Supplementary Fig. 3) and under various perturbations, including the depletion of cohesin, NIPBL, and WAPL (Supplementary Fig. 4)^3,21,22,23. Importantly, although Raichu uses a single uniform distance-decay function during optimization, it accurately captures region-specific decay behavior across diverse genomic contexts, such as different compartment types³ and chromatin domain states (active, inactive, and repressed)²⁴ (Supplementary Fig. 3).

To directly test whether a uniform distance-decay function is necessary for Raichu to function properly, we implemented an alternative version that applies compartment-specific decay functions. The resulting bias vectors and normalized contact maps were highly similar to those produced using a single genome-wide decay function, demonstrating that Raichu does not rely on the assumption of uniform scaling decay (Supplementary Fig. 5).

Raichu-normalized contact maps are highly reproducible across biological replicates, with HiCRep correlation coefficients²⁵ comparable to ICE and higher than KR. This difference may reflect the distinct treatment of regions with poor mappability or low coverage—such as those near centromeres and telomeres—where both ICE and Raichu exclude these regions from normalization, whereas KR assigns bias values to them (Supplementary Fig. 6). Outside of these problematic regions, the bias vectors and normalized contact maps generated by all three methods were broadly similar (Fig. 1c and Supplementary Fig. 7). Key chromatin architectural features, including compartments (measured by the first principal component, PC1) and TADs (measured by insulation scores)²⁶, also showed strong concordance across methods (Supplementary Figs. 8 and 9).

Upon closer inspection of specific genomic regions, however, we found that while Raichu’s bias vectors followed similar trends to those of ICE and KR, they differed at specific loci, with peak and valley magnitudes generally lower (Fig. 1d). This difference translated into stronger Raichu-normalized signals for interactions that are weaker than canonical loop dots yet clearly visible in raw Hi-C data (highlighted by black circles in the bottom panel of Supplementary Fig. 7b). Given that various transcription factors (TFs) and histone modifications have been associated with chromatin loop formation, we next evaluated the enrichment of normalized contact signals between ChIP-seq peaks for selected TFs and histone modifications (Fig. 1e and Supplementary Fig. 10). Across all evaluated factors, Raichu-normalized signals consistently showed greater enrichment than those from ICE and KR, suggesting that Raichu enhances the detection of chromatin loops and may be better suited for capturing transcription-related interactions.

Raichu identifies thousands of transcription-related loops missed by existing methods

To assess the effectiveness of Raichu in detecting chromatin loops, we applied HiCCUPS³—a widely used loop-calling algorithm—to Hi-C contact maps normalized by either ICE or Raichu at 5-kb and 10-kb resolutions (Methods). We first benchmarked performance using the GM12878 dataset, which is one of the most deeply sequenced Hi-C datasets to date. Strikingly, while ICE detected 15,446 loops, Raichu identified 28,986. Moreover, 90.6% of ICE-detected loops (13,997 out of 15,446) were also recovered by Raichu, whereas 51.7% of Raichu-detected loops (14,989 out of 28,986) were not identified by ICE (Fig. 2a).

**Fig. 2: Raichu identifies thousands of transcription-related loops missed by ICE.**

We grouped the detected loops into three categories: ICE-specific, Raichu-specific, and common loops identified by both methods. While the average genomic distance between loop anchors was similar across categories, Raichu-specific loops displayed a distinct bimodal size distribution (Fig. 2b)—a pattern consistent with the idea that transcription-related loops tend to span shorter distances than structural CTCF-mediated loops^5,15,27,28.

To evaluate the functional relevance of the detected loops, we analyzed the overlap between loop anchors and ChIP-seq peaks for selected TFs and histone modifications. Fig. 2c shows a representative genomic region where ICE identified only two loops, both associated with CTCF and RAD21 peaks. In contrast, Raichu not only recovered these two loops but also uncovered seven additional loops in the same region. Most of these additional loops overlapped with either H3K27ac peaks (a mark of active enhancers and promoters) or H3K4me3 peaks (a promoter mark) at both anchors, suggesting their involvement in transcriptional regulation.

At the genome-wide level, Raichu-specific loops were more frequently classified as enhancer–promoter (E–P), enhancer–enhancer (E–E), and promoter–promoter (P–P) interactions (40.8, 46.3, and 19.7%, respectively) compared to ICE-specific loops (19.3, 25.5, and 7.3%) and common loops (23.2, 28.0, and 8.9%) (Fig. 2d). To systematically investigate regulatory factor associations, we computed the fold enrichment of 132 TFs and 10 histone modifications at loop anchors using ENCODE ChIP-Seq data (Fig. 2e). As expected, common loops showed the strongest enrichment for CTCF and RAD21. While ICE-specific and Raichu-specific loops exhibited comparable enrichment for these structural factors, Raichu-specific loops showed substantially greater enrichment for a broader set of transcription-associated factors, including RNA polymerase II (POLR2A), CREB1, RELB, H3K4me3, and H3K27ac (Fig. 2e, f and Supplementary Fig. 11).

To further explore the nature of loops detected by ICE and Raichu, we validated each set against four orthogonal 3D genomic datasets (Supplementary Data 1 and 5): CTCF ChIA-PET (targeting CTCF-mediated interactions), Pol2 ChIA-PET (targeting RNA polymerase II-mediated interactions), H3K27ac HiChIP (targeting H3K27ac interactions), and H3K4me3 HiChIP (targeting H3K4me3 interactions). As expected, common loops exhibited the highest validation rate across these datasets (72.8%; 10,191 out of 13,997), followed by Raichu-specific loops (52.2%; 7823 out of 14,989) and ICE-specific loops (44.2%; 640 out of 1449) (Fig. 2g). Notably, among the validated Raichu-specific loops, 74.6% (5837 out of 7823) overlapped with transcription-related interactions captured by Pol2 ChIA-PET, H3K27ac HiChIP, or H3K4me3 HiChIP—substantially higher than the corresponding proportions for ICE-specific loops (60.5%) and common loops (57.8%). Even for Raichu-specific loops that did not overlap with any orthogonal datasets (7166 out of 14,989), Aggregate Peak Analysis (APA) using raw contact maps from ChIA-PET and HiChIP revealed clear enrichment signals, whereas ICE-specific loops in the unsupported category (809 out of 1449) exhibited much weaker enrichment (Supplementary Fig. 12a). Furthermore, ChIP-Seq profiles demonstrated that anchors of Raichu-specific loops unique to Hi-C were enriched for CTCF as well as multiple transcription-related factors, whereas ICE-specific loops unique to Hi-C were enriched only for CTCF and RAD21, with even those signals weaker than in Raichu-specific loops (Supplementary Fig. 12b).
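The loop counts and percentages reported for GM12878 can be cross-checked with a few lines of arithmetic. The snippet below uses only numbers stated in the text. The variable names are ours.

```python
ice_total, raichu_total, common = 15446, 28986, 13997

sizes = {
    "common": common,
    "raichu_specific": raichu_total - common,  # 14,989
    "ice_specific": ice_total - common,        # 1,449
}
validated = {"common": 10191, "raichu_specific": 7823, "ice_specific": 640}

# Share of each call set relative to the other method.
ice_recovered_pct = round(100 * common / ice_total, 1)
raichu_only_pct = round(100 * sizes["raichu_specific"] / raichu_total, 1)

# Validation rate and unsupported count per category.
rates = {k: round(100 * validated[k] / sizes[k], 1) for k in sizes}
unsupported = {k: sizes[k] - validated[k] for k in sizes}

# Fraction of validated Raichu-specific loops that are transcription-related.
transcription_pct = round(100 * 5837 / validated["raichu_specific"], 1)

print(ice_recovered_pct, raichu_only_pct)  # 90.6 51.7
print(rates)
print(unsupported["raichu_specific"], unsupported["ice_specific"])  # 7166 809
```

All derived values agree with the figures quoted in the text.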

It is worth noting that the absence of a loop from a call set does not necessarily imply a lack of enrichment; it may simply fall below the statistical threshold. To assess whether ICE-specific and Raichu-specific loops were enriched in each normalization setting, we again performed APA (Fig. 2h). Interestingly, ICE-specific loops showed even greater enrichment in Raichu-normalized maps than in ICE-normalized ones, indicating that these loops are detectable by Raichu but did not rank highly enough to be called as loops. In contrast, Raichu-specific loops displayed much weaker enrichment in ICE-normalized data, suggesting they are genuinely missed by ICE, even when applying relaxed cutoffs.
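As a rough illustration of Aggregate Peak Analysis, the sketch below averages fixed-size windows centered on each loop pixel and reports the ratio of the central pixel to the lower-left corner, a commonly used peak-to-lower-left summary. The window size, corner size, function name, and toy matrix are assumptions for illustration, not the parameters used in this study.

```python
def aggregate_peak_analysis(matrix, loops, flank=2, corner=1):
    """Average (2*flank+1)-wide windows centered on loop pixels.

    Returns the mean window and the ratio of its central pixel to the
    mean of its lower-left corner (the side nearest the diagonal).
    """
    n = len(matrix)
    size = 2 * flank + 1
    total = [[0.0] * size for _ in range(size)]
    used = 0
    for i, j in loops:
        # Skip loops whose window would run off the matrix.
        if i - flank < 0 or j - flank < 0 or i + flank >= n or j + flank >= n:
            continue
        for di in range(size):
            for dj in range(size):
                total[di][dj] += matrix[i - flank + di][j - flank + dj]
        used += 1
    if used == 0:
        raise ValueError("no loop window fits inside the matrix")
    mean = [[value / used for value in row] for row in total]
    corner_values = [
        mean[r][c] for r in range(size - corner, size) for c in range(corner)
    ]
    score = mean[flank][flank] / (sum(corner_values) / len(corner_values))
    return mean, score


# Toy map: flat background of 1.0 with two enriched loop pixels.
loops = [(2, 8), (3, 9)]
matrix = [[1.0] * 12 for _ in range(12)]
for i, j in loops:
    matrix[i][j] = matrix[j][i] = 5.0

mean_window, score = aggregate_peak_analysis(matrix, loops)
print(score)  # 5.0
```

A score well above 1 indicates that the loop positions are enriched relative to their local background, even when individual loops fall below a caller's threshold.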

We also benchmarked Raichu against KR normalization and assessed its performance using loop-calling algorithms other than HiCCUPS²⁹, with consistent results observed across all comparisons (Supplementary Figs. 13 and 14).

Raichu-detected loops are supported by ligation-free and imaging-based methods

We next extended our benchmark analyses to mouse embryonic stem cells (mESCs), a system with both Hi-C³⁰ and multiple orthogonal 3D genomic datasets^31,32,33 available. Consistent with our findings in GM12878 cells, Raichu substantially increased loop detection sensitivity in mESCs, identifying 27,410 loops compared to 11,333 detected by ICE (Fig. 3a). While the majority of ICE-detected loops were also recovered by Raichu, 65.4% of Raichu-detected loops were not detected by ICE. As in GM12878, Raichu-specific loops in mESCs exhibited a distinct bimodal size distribution (Fig. 3b), showed stronger enrichment for active histone modifications and stem cell-specific transcription factors (e.g., POU5F1 and NANOG; Fig. 3c), and were more likely to span E–E, E–P, and P–P interactions (Fig. 3d) than either ICE-specific loops or those shared between both methods. Similar results were also obtained from comparisons between Raichu and KR on the same dataset (Supplementary Fig. 15a).

**Fig. 3: Validation of Raichu-specific loops using DNA SPRITE, GAM, and DNA seqFISH+ in mouse embryonic stem cells (mESCs).**

To determine whether Raichu-specific loops represent bona fide spatial interactions, we performed APA using contact signals from two ligation-free 3D genome mapping methods: GAM³¹ and DNA SPRITE³². We observed strong signal enrichment at Raichu-specific loop positions in both datasets (Fig. 3e), supporting their physical proximity in the nucleus. To directly measure spatial distances between loop anchors, we further analyzed high-resolution DNA seqFISH+ imaging data³³. We examined three categories of Raichu-specific loops: (1) CTCF-mediated loops (with CTCF binding peaks at either anchor), (2) E–P loops, and (3) other loops that are neither CTCF-mediated nor E–P. For each category, we constructed matched control loops by fixing one anchor and placing the other at the same genomic distance but in the opposite direction. Across all three categories, Raichu-specific loops exhibited significantly shorter spatial distances between anchors than their matched controls (Fig. 3f), indicating that these loci are brought into closer proximity in 3D nuclear space.
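The matched-control construction can be sketched as follows. The text does not specify which anchor is held fixed or how chromosome ends are handled, so this version (an assumption) returns both mirrored candidates and drops any that would fall outside the chromosome. The function name and coordinates are hypothetical.

```python
def matched_controls(anchor1, anchor2, chrom_size):
    """Control pairs that keep one loop anchor and place the partner at the
    same genomic distance on the opposite side."""
    left, right = sorted((anchor1, anchor2))
    distance = right - left
    controls = []
    if left - distance >= 0:
        # Keep the left anchor, mirror the right anchor upstream.
        controls.append((left - distance, left))
    if right + distance <= chrom_size:
        # Keep the right anchor, mirror the left anchor downstream.
        controls.append((right, right + distance))
    return controls


print(matched_controls(1_000_000, 1_200_000, chrom_size=5_000_000))
# [(800000, 1000000), (1200000, 1400000)]
print(matched_controls(100_000, 400_000, chrom_size=5_000_000))
# [(400000, 700000)]
```

Because each control spans the same genomic distance as its loop, any difference in measured spatial distance cannot be explained by linear separation alone.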

Together, these orthogonal validations—using both ligation-free contact data and direct imaging—provide independent support for the biological relevance of Raichu-specific loops and demonstrate that Raichu accurately captures higher-order chromatin organization.

Raichu is robust across different sequencing depths

To evaluate the impact of sequencing depth on Raichu’s performance, we computationally downsampled the GM12878 dataset to 23 different levels, ranging from 100 million to 2 billion intra-chromosomal reads (Methods). As expected, the number of detected loops decreased with reduced sequencing depth for both ICE and Raichu. Notably, Raichu consistently identified approximately twice as many loops as ICE across all tested depths (Fig. 4a, b). Consistent with observations from the full-depth dataset, the additional loops detected by Raichu were enriched for transcription-related interactions, as indicated by a higher fraction of anchors overlapping H3K27ac peaks (Fig. 4c).
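Computational downsampling is commonly done by drawing read pairs at random without replacement until a target depth is reached. The sketch below shows that generic approach on a tiny contact table. The procedure actually used here is given in the Methods, so the function name `downsample`, the dictionary layout, and the counts are assumptions. Expanding every read into a list is only practical for toy data.

```python
import random
from collections import Counter


def downsample(contacts, target, seed=0):
    """Randomly keep exactly `target` read pairs.

    contacts maps a (bin_i, bin_j) pair to its read count.
    """
    total = sum(contacts.values())
    if target >= total:
        return dict(contacts)  # nothing to remove
    rng = random.Random(seed)
    # One list entry per read pair, so sampling is uniform over reads.
    reads = [pair for pair, count in contacts.items() for _ in range(count)]
    return dict(Counter(rng.sample(reads, target)))


# Hypothetical intra-chromosomal contact counts (80 read pairs in total).
contacts = {(0, 1): 40, (0, 5): 6, (2, 3): 30, (4, 9): 4}
half = downsample(contacts, target=40)
print(sum(half.values()))  # 40
```

Sparse pixels such as `(4, 9)` are the most likely to drop to zero, which matches the observation that weak, short-range loops are the first to be lost at low depth.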

**Fig. 4: Raichu performs well across various sequencing depths.**

Interestingly, the bimodal size distribution characteristic of Raichu-specific loops (Fig. 2b) became progressively less pronounced as sequencing depth decreased, disappearing entirely below ~500 million intra-chromosomal contacts (Supplementary Fig. 16). This trend may be partially attributed to changes in the regulatory composition of the detected loops: as depth declines, the proportion of anchors overlapping CTCF increases (Fig. 4c), while a subset of short-range, non-CTCF-associated loops—typically weaker in contact enrichment—becomes increasingly difficult to detect without deep coverage.

Raichu’s greater power in detecting chromatin loops enables it to achieve loop counts at lower sequencing depths that are comparable to those of ICE at higher depths (Fig. 4a). For example, while ICE detected 10,589 loops at ~900 million usable reads, Raichu identified a similar number—10,900 loops—with only ~350 million reads. Of these, 59.6% (6307 out of 10,589) of ICE-detected loops overlapped with 57.9% (6307 out of 10,900) of Raichu-detected loops. Importantly, validation rates based on orthogonal ChIA-PET and HiChIP datasets were comparable between ICE-specific and Raichu-specific loops (74.3 vs. 69.6%). However, among the validated loops, 82.2% of Raichu-specific loops overlapped transcription-related interactions identified by Pol2 ChIA-PET, H3K27ac HiChIP, or H3K4me3 HiChIP, compared to only 53.0% for ICE-specific loops (Fig. 4d, e). Similar trends were observed across additional sequencing depths (Supplementary Fig. 17).

Together, these findings demonstrate that Raichu consistently detects thousands of additional transcription-related chromatin loops, even under reduced sequencing coverage.

Raichu detects unique differential loops involved in transcriptional regulation

During cell development or drug treatment, chromatin looping structures can undergo dramatic changes that lead to gene activation or repression^34,35,36,37. Accurately detecting these changes between conditions is critical for understanding the molecular mechanisms of gene regulation underlying specific cell states.

To demonstrate the power of Raichu in detecting such changes, we analyzed a cellular system described in a previous study³⁸. Briefly, through targeted sequencing of 5008 patients, that study identified a germline regulatory variant in the GATA3 enhancer associated with Philadelphia chromosome-like acute lymphoblastic leukemia. To investigate its role in 3D genome organization and gene regulation, GM12878 cells were genetically engineered by replacing the wild-type C/C allele with the risk A/A allele. In wild-type cells, GATA3 is only moderately expressed; in engineered cells, however, enhancer activity is significantly increased, leading to elevated GATA3 expression. This, in turn, enhances GATA3 binding at thousands of enhancers, further driving transcription of downstream genes.

In the original study, ICE-normalized Hi-C maps revealed only a few instances in which differential chromatin loops were associated with gene regulation. Here, we tested whether the improved sensitivity of Raichu in detecting transcription-related loops could enhance the detection of meaningful chromatin looping changes between wild-type (C/C) and engineered (A/A) cells.

Raichu indeed revealed clearer differences at specific loci. At the SUPT16H gene locus, ICE-normalized data failed to show any meaningful differences in chromatin looping between the two conditions (Fig. 5a, left). In contrast, Raichu identified an additional loop unique to the engineered cells, linking SUPT16H—which encodes SPT16, a subunit of the FACT complex involved in chromatin remodeling, transcriptional regulation, and genomic stability—to a downstream enhancer with specific GATA3 binding in the engineered cells (Fig. 5a, right). This example, together with others (Supplementary Fig. 18), highlights the ability of Raichu to uncover previously unrecognized regulatory mechanisms involved in cell development and disease.

**Fig. 5: Raichu detects unique differential loops associated with transcriptional regulation.**

Using Hi-C data normalized by both ICE and Raichu, we identified 4504 loops specific to wild-type (C/C) cells. Of these, 937 were unique to Raichu, while only 245 were unique to ICE. Similarly, we found 964 loops specific to engineered cells, with 249 unique to Raichu and only 65 unique to ICE (Fig. 5b–d). APA plots confirmed the expected pattern: C/C-specific loops—whether identified by Raichu or ICE—were enriched in wild-type cells but depleted in engineered cells, and vice versa for A/A-specific loops (Fig. 5d; Methods). While the size distributions of differential loops were similar across methods, Raichu-specific loops tended to span shorter genomic distances than ICE-specific loops (average: 203 kb vs. 298 kb for C/C-specific loops; 202 kb vs. 306 kb for A/A-specific loops), consistent with Raichu’s improved sensitivity for short-range interactions (Fig. 5c).

To assess the functional relevance of these differential loops, we categorized them into E–P, E–E, and P–P types. Strikingly, Raichu-specific loops contained 5.75 to 16.5 times more transcription-related interactions than ICE-specific loops (Fig. 5e). Moreover, Raichu-specific differential loops were significantly enriched for H3K27ac and GATA3 binding signals at their anchors, a pattern absent from ICE-specific loops (Fig. 5f). Finally, only Raichu-unique A/A-specific loops were associated with significant upregulation of gene expression in engineered cells (Fig. 5g).

Collectively, these results demonstrate that Raichu improves the detection of biologically meaningful differential loops between conditions, revealing interactions closely associated with transcriptional regulation and cell identity that are frequently missed by ICE.

Raichu detects conserved enhancer–promoter loops across species

The unique power of Raichu in detecting transcription-related loops prompted us to investigate whether it can also uncover conserved regulatory interactions across species. To this end, we focused on neural progenitor cells (NPCs), for which high-quality Hi-C data are available in both mouse³⁰ and human³⁹.

In mouse NPCs (mNPCs), Raichu identified 32,639 loops—more than twice as many as ICE (14,671) (Fig. 6a). The vast majority (85.7%) of ICE-detected loops were also recovered by Raichu, while Raichu detected 20,068 additional loops not identified by ICE. As observed in previous cell types, Raichu-specific loops in mNPCs exhibited a distinct bimodal size distribution and were more frequently anchored at regions marked by active histone modifications (Fig. 6b, c). Similar patterns were observed in human NPCs (hNPCs), where Raichu identified 20,930 unique loops, compared to only 2009 uniquely detected by ICE (Fig. 6d–f). Comparable results were obtained when comparing Raichu and KR (Supplementary Fig. 15b), underscoring Raichu’s robustness across species and its sensitivity to regulatory interactions that may be missed by conventional normalization methods.

**Fig. 6: Raichu detects conserved enhancer–promoter loops across species.**

To assess evolutionary conservation, we compared loop calls between mouse and human NPCs (Methods). Raichu recovered 3089 conserved loops across species—more than fivefold the number detected by ICE (540). Notably, many of the conserved loops uniquely identified by Raichu connected distal enhancers to key neurodevelopmental genes, including SETD1A and SOX4 (Fig. 6g, h), underscoring their potential functional relevance.

These findings demonstrate that Raichu not only improves the detection of transcription-related loops within individual species but also enhances the discovery of conserved enhancer–promoter interactions across evolutionary distances.

Raichu improves the detection of transcription-related loops in Micro-C and region capture Micro-C

In addition to Hi-C, a number of alternative methods have been developed to study 3D genome organization. To assess whether Raichu can be applied beyond Hi-C, we evaluated its performance on two related platforms: Micro-C⁴⁰ and region capture Micro-C (RCMC)⁴¹. Micro-C uses micrococcal nuclease digestion to achieve nucleosome-level resolution of chromatin contacts, while RCMC further enriches for specific genomic regions of interest, enabling ultra-high-resolution contact maps in targeted loci.

We first applied Raichu to a Micro-C dataset generated in H1 embryonic stem cells (H1ESCs)⁴⁰. Visual inspection revealed improved loop detection with Raichu in Micro-C. For example, in a representative genomic region near the NANOG gene—an essential regulator of ESC pluripotency—Raichu identified a loop at 2-kb resolution that was missed by ICE, connecting NANOG to an upstream enhancer cluster (Fig. 7a). Genome-wide, Raichu detected 99,256 loops, more than twice as many as ICE (45,030) (Fig. 7b). While 96.5% of ICE-detected loops were also recovered by Raichu, 56.2% of Raichu-detected loops were unique to Raichu. Moreover, Raichu-specific loops in Micro-C tended to span shorter genomic distances than ICE-specific or shared loops (Fig. 7c). Consistent with previous Hi-C analyses, Raichu-specific loops were more enriched for transcription-related interactions, with higher proportions of E–E, E–P, and P–P contacts (Fig. 7d), and exhibited stronger signals for active histone marks at their anchors (Fig. 7e). Comparable results were also observed in comparisons with KR (Supplementary Fig. 15c).

**Fig. 7: Application of Raichu to Micro-C and Region Capture Micro-C datasets.**

To further evaluate the performance of Raichu in ultra-high-resolution contexts, we applied it to an RCMC dataset generated in mouse embryonic stem cells (mESCs)⁴¹. For this dataset, both ICE and Raichu normalization were performed strictly within the targeted regions of interest to avoid potential artifacts. In a representative region encompassing the Sox2 locus, Raichu revealed numerous additional loops at 500-bp resolution—both between distal enhancers and Sox2, and among enhancer elements themselves—that were not detected by ICE (Fig. 7f). These results demonstrate that Raichu remains effective even at sub-kilobase resolution and enhances the detection of regulatory loops in targeted, high-resolution datasets.

Raichu enhances the detection of chromatin loops in single-cell Hi-C data

We next applied Raichu to single-cell Hi-C data, which typically suffer from limited contact coverage, making it challenging to reliably detect chromatin loops. To evaluate whether Raichu improves loop detection in this context, we analyzed a single-cell Hi-C dataset generated in GM12878 cells, comprising 221 single cells with a median of approximately 1.08 million contacts per cell⁴².

To compare the performance of Raichu and ICE, we ranked all 221 cells by sequencing depth and constructed a series of pseudo-bulk datasets by pooling up to 130 cells with the highest contact counts (Supplementary Fig. 19a). For each pooled dataset, loops were identified using HiCCUPS on contact maps normalized with either ICE or Raichu, and loop calls were merged across 5-kb, 10-kb, and 25-kb resolutions (Methods). As shown in Fig. 8a, Raichu consistently detected 1.24 to 1.75 times more loops than ICE. Even with as few as five cells (~10 million contacts), Raichu identified 1697 loops, whereas ICE detected only 970 on the same dataset. Using a reference set of interactions compiled from orthogonal 3D genome mapping datasets (Supplementary Data 1 and 6)—including Hi-C, CTCF ChIA-PET, RNAPII ChIA-PET, CTCF HiChIP, H3K27ac HiChIP, SMC1A HiChIP, H3K4me3 PLAC-Seq, TrAC-Loop, and HiCAR—we evaluated the precision and recall of Raichu and ICE across all pseudo-bulk datasets (Supplementary Fig. 19b, c). Notably, Raichu achieved higher F1 scores than ICE in all tested cases (Fig. 8b).
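Precision, recall, and F1 for a loop call set against a reference set can be computed as in the sketch below. It matches loops by exact bin pair. The matching rule used in this study is given in the Methods, so exact matching, the function name, and the example loops are assumptions.

```python
def precision_recall_f1(called, reference):
    """Score called loops against a reference set using exact matches."""
    called, reference = set(called), set(reference)
    true_positives = len(called & reference)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical loops given as (anchor_bin_1, anchor_bin_2).
called = [(10, 40), (12, 30), (50, 90), (60, 75)]
reference = [(10, 40), (50, 90), (20, 80), (60, 75), (5, 15)]

precision, recall, f1 = precision_recall_f1(called, reference)
print(precision, recall, round(f1, 3))  # 0.75 0.6 0.667
```

Because F1 is the harmonic mean of precision and recall, a method that calls more loops improves it only if the extra calls are supported by the reference often enough.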

**Fig. 8: Raichu enhances the detection of chromatin loops in single-cell Hi-C data.**

Across pooled maps, the number of Raichu-specific loops was 2.45 to 4.59 times greater than that of ICE-specific loops (Fig. 8c). More importantly, loop anchors identified by Raichu showed a higher overlap with CTCF and H3K27ac binding compared to those identified by ICE, indicating that Raichu more effectively detects both CTCF-associated and transcription-related loops from single-cell Hi-C data (Fig. 8d, e). Results were consistent when comparing Raichu and KR (Supplementary Fig. 15d), further supporting Raichu’s robustness in single-cell Hi-C analysis.

Discussion

Recent years have underscored the critical role of 3D genome organization in transcriptional regulation, yet existing normalization methods often fail to preserve transcription-related chromatin loops in Hi-C data. Here, we present Raichu, a computational method for normalizing chromatin contact data. Unlike matrix balancing approaches that assume uniform visibility across all loci, Raichu employs an efficient optimization framework to infer a bias vector that best captures locus-specific variability. This design avoids the over-correction artifacts inherent to matrix balancing and substantially improves the detection of transcription-related loops. We show that Raichu is robust across a wide range of sequencing depths, biological systems, and species, and is compatible with multiple 3D genome mapping platforms, including Hi-C, Micro-C, region capture Micro-C (RCMC), and single-cell Hi-C. Notably, Raichu improves loop detection even at sub-kilobase resolution, enabling the identification of fine-scale regulatory interactions that are missed by other methods.

In this study, we compared Raichu with two widely used implementations of matrix balancing, ICE and KR. While other methods, such as HiCNorm¹¹, OneD¹², and HiCorr¹⁴, normalize Hi-C data by explicitly modeling known genomic biases, we did not include them in our comparison because they require external reference data, which are typically available only for a limited number of genomes (Supplementary Table 1). Built on the cooler Python package¹⁶, Raichu does not require any external input and stores the calculated bias vector in the same format as ICE, ensuring seamless compatibility with downstream analyses of compartments, TADs, loops, and other features. Raichu is also computationally efficient (Supplementary Tables 2 and 3): on a Hi-C dataset with 4 billion usable reads at 5-kb resolution, it completed the calculation in ~4 h using 4 CPU cores and required only 17 GB of memory. Together, these features make Raichu a practical and scalable replacement for ICE and KR.

Recent advances in single-cell Hi-C and related techniques have significantly enhanced our ability to study chromatin organization at the resolution of individual cells. These innovations now enable the simultaneous profiling of chromatin contacts alongside additional layers of epigenomic and transcriptomic information, such as DNA methylation³⁷,⁴³,⁴⁴, RNA expression⁴²,⁴⁵,⁴⁶, and chromatin accessibility (ATAC)⁴⁷. Such integrated approaches allow researchers to identify cell groups or types and investigate cell-type-specific chromatin contacts and their regulatory roles. Typically, the number of cells in each group ranges from a few dozen to a few hundred. Using a recently published single-cell Hi-C dataset in GM12878 cells, we showed that Raichu is effective on pseudo-bulk contact maps derived from as few as five cells, corresponding to ~10 million chromatin contacts. Even under such sparse conditions, Raichu outperformed existing normalization methods in detecting both CTCF-related and transcription-related loops (Fig. 8 and Supplementary Fig. 15d). This result highlights Raichu’s robustness and utility in handling the sparsity typical of single-cell Hi-C data. Future applications of Raichu to pseudo-bulk maps derived from single-cell multi-omics data could provide insights into the relationship between gene regulation and cell-type-specific 3D genome architecture.

The close association between 3D genome organization and transcriptional regulation has long been recognized, with disruptions in chromatin organization implicated in developmental diseases and cancer. However, the dynamic behavior of chromatin loops underlying transcriptional regulation remains incompletely understood, particularly under conditions involving minimal changes in cell state, such as drug treatments or heat shock. In these settings, studies using ICE often report negligible loop changes, reinforcing the notion that most chromatin loops are pre-established⁴⁸,⁴⁹. Our results suggest that this conclusion may be due, in part, to ICE’s tendency to attenuate transcription-related signals. By applying Raichu to an engineered GM12878 cell system, we identified differential loops closely associated with gene transcription that ICE failed to detect. These findings highlight Raichu’s ability to reveal previously unrecognized regulatory changes in chromatin organization.

Looking forward, Raichu may help redefine how 3D genomic changes are studied, providing a more accurate perspective on their roles in transcriptional regulation and cellular function. This capability is particularly promising for elucidating subtle chromatin dynamics during development, disease progression, and therapeutic interventions. By addressing key limitations of existing normalization methods, Raichu opens new avenues for exploring chromatin architecture and its impact on gene expression.

Methods

Workflow of Raichu

The core rationale underlying Raichu is that each interaction in a chromatin contact matrix can be decomposed into three components: (1) a global background term (E), which accounts for the distance-dependent decay of contact frequencies (i.e., two loci that are closer together in the linear genome tend to interact more frequently); (2) locus-specific biases (B), which reflect systematic variations in interaction frequency attributable to individual loci (e.g., due to GC content, mappability, or restriction fragment density); and (3) a residual component specific to the interaction itself. Raichu aims to estimate a genome-wide bias vector that, in conjunction with the distance-decay function, best explains the observed raw contact map. Once this bias vector is obtained, contact counts are normalized in a manner similar to ICE and other implicit normalization methods—by dividing each observed contact by the product of the bias values for the two interacting loci.

Given an intra-chromosomal contact matrix M, the global background is calculated as:

$${E}_{d}=\frac{1}{{n}_{d}}{\sum }_{\left|j-i\right|=d}{M}_{{ij}}$$

(1)

where ${n}_{d}$ represents the number of bin pairs separated by distance $d$, and ${M}_{{ij}}$ represents the observed interaction frequency between bin $i$ and bin $j$.
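As an illustration of Eq. (1), the following minimal sketch computes the distance-decay background from a small dense, symmetric matrix. The function name `distance_decay` and the toy matrix are hypothetical, and whether the actual implementation masks low-coverage bins or excludes zero pixels is not specified here, so this is only the plain per-diagonal mean.

```python
import numpy as np

def distance_decay(M):
    """E[d] = mean of M[i, j] over all bin pairs with |j - i| = d (Eq. 1)."""
    n = M.shape[0]
    # For a symmetric matrix, the d-th upper diagonal holds the n_d = n - d pairs.
    return np.array([np.diagonal(M, offset=d).mean() for d in range(n)])

# Toy 3-bin contact matrix (hypothetical values).
M = np.array([[4.0, 2.0, 1.0],
              [2.0, 6.0, 3.0],
              [1.0, 3.0, 8.0]])
E = distance_decay(M)
# E[0] = (4 + 6 + 8) / 3 = 6.0, E[1] = (2 + 3) / 2 = 2.5, E[2] = 1.0
```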

Assuming locus-specific biases are multiplicative, each contact ${M}_{{ij}}$ is modeled as:

$${M}_{{ij}}\approx {B}_{i}{B}_{j}{E}_{{|\,j}-{i|}}$$

(2)

The objective is to estimate B such that the product ${B}_{i}{B}_{j}{E}_{{|\,j}-{i|}}$ approximates the observed contact count ${M}_{{ij}}$. This is formulated as the following optimization problem:

$${{{\rm{minimize}}}}_{{{\bf{B}}}}{\sum }_{i,j}{({M}_{{ij}}-{B}_{i}{B}_{j}{E}_{{|\,j}-{i|}})}^{2}$$

$${{\rm{subject\; to}}}:l < {{\bf{B}}} < u$$

(3)

After obtaining the bias vector B, the normalized contact frequencies ${M}_{{ij}}^{{\prime} }$ are calculated as ${M}_{{ij}}^{{\prime} }=\frac{{M}_{{ij}}}{{B}_{i}*{B}_{j}}$.

The bias vector B is initialized as the square root of the normalized one-dimensional coverage of the contact matrix:

$${B}_{i}^{(0)}=\sqrt{\frac{{S}_{i}}{\bar{{S}_{i}}}}$$

(4)

where ${S}_{i}={\sum }_{j}{M}_{{ij}}$ represents the coverage of bin $i$, and $\bar{{S}_{i}}$ is the mean coverage across all loci.
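Equations (2) to (4) can be checked on synthetic data where the true bias vector is known. The sketch below is not the Raichu implementation: the function names, the toy decay curve, and the bias values are all hypothetical. It only evaluates the least-squares objective, the coverage-based initialization, and the final normalization step.

```python
import numpy as np

n = 5
idx = np.arange(n)
dist = np.abs(idx[:, None] - idx[None, :])      # |j - i| for every bin pair
E = 100.0 / (1.0 + np.arange(n))                # hypothetical distance-decay curve
B_true = np.array([0.5, 1.0, 2.0, 1.0, 0.8])    # hypothetical locus-specific biases
M = np.outer(B_true, B_true) * E[dist]          # noise-free matrix following Eq. 2

def objective(B, M, E):
    """Sum of squared residuals between M and B_i * B_j * E_|j-i| (Eq. 3)."""
    return float(np.sum((M - np.outer(B, B) * E[dist]) ** 2))

def initial_bias(M):
    """Square root of the coverage divided by the mean coverage (Eq. 4)."""
    S = M.sum(axis=1)
    return np.sqrt(S / S.mean())

def normalize(M, B):
    """Divide each contact by the product of the two bias values."""
    return M / np.outer(B, B)

B0 = initial_bias(M)
print(objective(B0, M, E), objective(B_true, M, E))
```

On this noise-free example the objective is zero at the true bias vector, and normalizing with the true biases returns the pure distance-decay matrix.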

Raichu employs the dual annealing optimization algorithm, with the L-BFGS-B algorithm as the local minimizer. To manage the computational demands of evaluating the objective function across millions or billions of data points, Raichu incorporates two performance-optimization strategies: (1) A sliding window $[{s}_{i},{s}_{i}+w]$ of size $w$ is applied to each chromosome, with a 10% overlap between consecutive windows (${s}_{i+1}={s}_{i}+0.9w$). Bias vectors are computed independently for each window, and overlapping regions are averaged to produce a single bias vector for the chromosome. (2) The Numba library is used to enhance computational efficiency when evaluating the objective function.
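The windowed optimization described above can be sketched with SciPy's `dual_annealing`, using L-BFGS-B as the local minimizer. This is an assumption-laden illustration rather than the Raichu source code: the helper names, the toy matrix, the choice to initialize each window from its own coverage, and the handling of the last window are all hypothetical, and Numba acceleration is omitted.

```python
import numpy as np
from scipy.optimize import dual_annealing

def distance_decay(M):
    n = M.shape[0]
    return np.array([np.diagonal(M, offset=d).mean() for d in range(n)])

def objective(B, M, E_mat):
    return float(np.sum((M - np.outer(B, B) * E_mat) ** 2))

def window_bias(M_win, E, lower=0.001, upper=1000.0, maxiter=100):
    """Estimate the bias vector of one window by bounded dual annealing."""
    n = M_win.shape[0]
    idx = np.arange(n)
    E_mat = E[np.abs(idx[:, None] - idx[None, :])]
    S = M_win.sum(axis=1)
    x0 = np.clip(np.sqrt(S / S.mean()), lower, upper)   # Eq. 4, kept inside bounds
    bounds = [(lower, upper)] * n
    res = dual_annealing(objective, bounds, args=(M_win, E_mat), maxiter=maxiter,
                         minimizer_kwargs={"method": "L-BFGS-B", "bounds": bounds},
                         x0=x0)
    return res.x

def chromosome_bias(M, w=200, overlap=0.1, **kwargs):
    """Slide a window of w bins (10% overlap) and average overlapping estimates."""
    n = M.shape[0]
    E = distance_decay(M)                       # chromosome-wide background
    step = max(1, int(round(w * (1.0 - overlap))))
    total, count = np.zeros(n), np.zeros(n)
    start = 0
    while True:
        end = min(start + w, n)
        total[start:end] += window_bias(M[start:end, start:end], E, **kwargs)
        count[start:end] += 1
        if end == n:
            break
        start += step
    return total / count

# Toy example: 30 bins, dense noise-free matrix (hypothetical values).
rng = np.random.default_rng(0)
n = 30
idx = np.arange(n)
true_B = rng.uniform(0.5, 2.0, n)
M = np.outer(true_B, true_B) * (100.0 / (1.0 + np.abs(idx[:, None] - idx[None, :])))
bias = chromosome_bias(M, w=10, maxiter=10)
```

The small `w` and `maxiter` values are chosen only to keep the toy run fast; the defaults reported in the next paragraph are 200 bins and 100 iterations.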

Raichu provides several tunable parameters to enhance flexibility: (1) sliding window size ($w$), set to 200 bins by default; (2) the maximum number of global search iterations for dual annealing ($m$), set to 100; and (3) bias vector search bounds ($l$, $u$), set to 0.001 and 1000, respectively. These default values were chosen empirically to balance computational efficiency and normalization quality. A systematic evaluation revealed that Raichu is robust to variation in these key parameters, producing normalized contact maps that are highly similar and loop detection results that are highly stable across a broad range of settings (Supplementary Fig. 20).

Hi-C and ChIP-seq data processing

For human datasets, the hg38 reference genome was used. For mouse datasets, the mm10 reference genome was used, except for the RCMC datasets, which used mm39. For Drosophila datasets, the dm3 reference genome was used.

For most datasets—including Hi-C, Micro-C, RCMC, GAM, DNA SPRITE, DNA seqFISH+, ChIP-Seq, and RNA-Seq—the processed data were downloaded and used directly in our analysis (Supplementary Data 1). In the following cases, we processed the data either from raw reads or partially processed formats:

Hi-C (GM12878, wild-type and engineered): Raw Hi-C reads were processed into .mcool format using the runHiC Python package (v0.9.0; https://pypi.org/project/runHiC/).
Drosophila Kc167 Hi-C: Data were downloaded as a .hic file and converted to .mcool format using hic2cool (v0.8.3; https://github.com/4dn-dcic/hic2cool).
H3K27ac and GATA3 ChIP-Seq (GM12878, wild-type and engineered): Genomic coordinates in the original peak files were converted from hg19 to hg38 using HiCLift⁵⁰ (v1.0; conversion script available at https://github.com/XiaoTaoWang/Raichu/tree/main/benchmark-analysis).
ChIP-Seq and RNA-Seq (mESC): BigWig files downloaded from ENCODE (Supplementary Data 1) were mapped to mm10. To visualize them alongside RCMC, coordinates were converted to mm39 using CrossMap (v0.6.5; https://github.com/liguowang/CrossMap).
ChIP-Seq and RNA-Seq (mNPC): Raw sequencing reads were mapped to mm10 and processed using the ENCODE ChIP-Seq pipeline (v2.2.2; https://github.com/ENCODE-DCC/chip-seq-pipeline2) and RNA-Seq pipeline (v1.2.4; https://github.com/ENCODE-DCC/rna-seq-pipeline), respectively.

Reproducibility analysis of contact maps between biological replicates

A Python implementation of the original HiCRep algorithm (v0.1.0; https://github.com/cmdoret/hicreppy) was used to assess the reproducibility of contact maps normalized with different methods. Specifically, for each pair of contact maps from biological replicates on a given chromosome, the stratum-adjusted correlation coefficient (SCC) was calculated using HiCRep, with higher SCC values indicating greater reproducibility. Genome-wide reproducibility was then summarized by computing a weighted average of SCC values across all chromosomes, using chromosome length as the weight. At 10-kb resolution, chromatin contacts up to 1 Mb apart were considered, while at 50-kb resolution, contacts up to 5 Mb apart were included. The smoothing parameter $h$ was set to 3 for both resolutions.
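The genome-wide summary is a length-weighted mean of the per-chromosome SCC values. A minimal sketch follows, with hypothetical SCC values and chromosome lengths; the SCC computation itself is left to HiCRep.

```python
import numpy as np

def genome_wide_scc(scc_by_chrom, chrom_lengths):
    """Average per-chromosome SCC values, weighting by chromosome length."""
    chroms = list(scc_by_chrom)
    scc = np.array([scc_by_chrom[c] for c in chroms], dtype=float)
    weights = np.array([chrom_lengths[c] for c in chroms], dtype=float)
    return float(np.average(scc, weights=weights))

# Hypothetical values for two chromosomes.
scc_by_chrom = {"chrA": 0.9, "chrB": 0.8}
chrom_lengths = {"chrA": 300, "chrB": 100}
summary = genome_wide_scc(scc_by_chrom, chrom_lengths)
# (0.9 * 300 + 0.8 * 100) / 400 = 0.875
```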

Calculation of visibility and distance-decay curves from contact maps

The sum of interactions across genomic loci (i.e., visibility) after ICE, KR, and Raichu normalization was calculated using cooltools (v0.7.1; https://github.com/open2c/cooltools), via the command “cooltools coverage”. The “--clr_weight_name” parameter was used to specify the bias vector from the input .cool file, while all other parameters were set to their default values. Distance-decay curves (i.e., average contact frequencies as a function of genomic distance) were computed for raw, ICE-, KR-, and Raichu-normalized contact maps using the command “cooltools expected-cis --smooth --aggregate-smoothed”. The “--clr_weight_name” parameter was again used to select the appropriate bias vector. To calculate distance-decay curves for specific compartments or domain types, genomic regions were provided in BED format (https://genome.ucsc.edu/FAQ/FAQformat.html) using the “--view” parameter.

Calculation of the insulation score and compartment analysis

Both insulation scores and chromatin compartments were calculated using cooltools. For insulation scores, the “cooltools insulation” command was applied with window sizes of 100 kb, 250 kb, and 500 kb at 10-kb, 25-kb, and 50-kb resolutions, respectively. TAD boundaries were defined as loci marked as “True” in the “is_boundary” column with a prominence score (“boundary_strength” column) greater than 0.2. For compartment analysis, the “cooltools eigs-cis” command was applied, using a BedGraph (https://genome.ucsc.edu/FAQ/FAQformat.html) file containing average H3K27ac ChIP-Seq signal per bin as the phasing track. Eigenvalue decomposition was performed on the contact matrix, and the first eigenvector (PC1) was extracted to capture the plaid pattern of chromatin compartments. The sign of PC1 was oriented using the H3K27ac track, such that positive values correspond to A compartments and negative values correspond to B compartments. Saddle plots were generated using the command “cooltools saddle -t cis --qrange 0.01 0.99 --strength”.

Calculation of average contact signals between ChIP-seq peaks of varying binding strength

For this analysis, the following procedures were followed:

1. TADs were identified using HiTAD⁵¹ (v0.4.5-r1; https://xiaotaowang.github.io/TADLib/hitad.html) on GM12878 Hi-C contact maps at 25 kb resolution. In the HiTAD output, only regions marked with “0” in the last column were classified as TADs, while all other regions were treated as sub-TADs.
2. ChIP-Seq peaks for CTCF, RAD21, POLR2A, and H3K27ac in GM12878 cells were downloaded from ENCODE (see Supplementary Data 1 for accession codes). Peaks for each factor were sorted and classified into eight groups based on binding strength, using the value in the 7th column of the downloaded peak file.
3. Average contact signals were calculated at 5-kb resolution between each pair of peak groups. For bins containing multiple peaks, only the peak with the highest binding strength was considered. Only intra-TAD contacts were included, and contact signals were normalized by dividing the observed signal by the expected signal at the corresponding genomic distance (observed/expected).

Loop detection in Hi-C data using HiCCUPS and Mustache

To benchmark the performance of Raichu in loop detection, we tested two loop-calling tools on the GM12878 Hi-C dataset: (1) A Python implementation of the original HiCCUPS algorithm³ (v0.3.9; https://github.com/XiaoTaoWang/HiCPeaks), and (2) Mustache²⁹ (https://github.com/ay-lab/mustache). Our goal was to demonstrate that Raichu consistently improves chromatin loop detection compared to existing normalization methods, regardless of the loop detection software used.

For HiCCUPS, we applied the following parameters “--pw 1 2 4 --ww 3 5 7 --only-anchors --maxapart 4000000”. For Mustache, the following parameter was used “-pt 0.05”. Both tools were run on contact matrices at 5-kb and 10-kb resolutions separately. To generate a non-redundant list of detected loops for each tool, results from both resolutions were merged using the “combine-resolutions” command from the HiCPeaks Python package (v0.3.9; https://github.com/XiaoTaoWang/HiCPeaks) with the following parameters “-R 5000 10000 -G 10000 -M 100000 --max-res 10000”.

For other datasets, loops were detected using HiCCUPS only.

For Hi-C, the same HiCCUPS parameters as used for GM12878 were applied.

For Micro-C, we applied the following parameters to identify loops at 2-kb, 5-kb, and 10-kb resolutions “--pw 2 3 4 --ww 5 6 7 --maxww 7 --only-anchors --min-local-reads 100”. Loops detected from the three resolutions were merged using “combine-resolutions -R 2000 5000 10000 -G 10000 -M 100000 --max-res 10000”. We restricted the analysis to interactions within: 1 Mb at 2-kb resolution (“--maxapart 1000000”), 2 Mb at 5-kb resolution, and 4 Mb at 10-kb resolution.

For RCMC, we applied the following parameters to identify loops at 250-bp, 500-bp, and 1-kb resolutions “-C 5 8 18 3 6 --pw 2 3 4 --ww 5 6 7 --maxww 7 --only-anchors --min-local-reads 100”. Results were merged using “combine-resolutions -R 250 500 1000 -G 500 -M 5000 --max-res 500”. We restricted the analysis to interactions within: 100 kb at 250-bp resolution, 200 kb at 500-bp resolution, and 400 kb at 1-kb resolution.

Calculation of the overlap between two loop sets

For each loop coordinate ($i$, $j$) in loop set A, if there exists a loop ($i'$, $j'$) in loop set B such that the Euclidean distance between ($i$, $j$) and ($i'$, $j'$) is less than $\min(0.2\times|i-j|,\,50\,\mathrm{kb})$, we define ($i$, $j$) as preserved in B. Using this definition, we evaluated whether loops detected by Hi-C could be supported by orthogonal ChIA-PET/HiChIP interactions (Supplementary Data 1 and 5).
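The overlap rule can be written as a short predicate. In this sketch, loops are (anchor 1, anchor 2) positions in base pairs on the same chromosome; the function name and the example coordinates are hypothetical.

```python
import math

def is_preserved(loop, loops_b, frac=0.2, cap=50_000):
    """True if some loop in loops_b lies within min(frac * |i - j|, cap) of loop."""
    i, j = loop
    threshold = min(frac * abs(i - j), cap)
    return any(math.hypot(i - x, j - y) < threshold for x, y in loops_b)

loops_b = [(1_010_000, 1_205_000), (2_010_000, 2_050_000)]
near = is_preserved((1_000_000, 1_200_000), loops_b)   # distance ~11.2 kb, threshold 40 kb
edge = is_preserved((2_000_000, 2_050_000), loops_b)   # distance 10 kb, threshold 10 kb (not strictly less)
```

Because the threshold scales with loop size, short-range loops must be matched more tightly than long-range ones.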

When generating the Venn diagram of loop sets obtained from different normalization methods, we applied an additional matching criterion: since a loop in one set might match multiple loops in another set based on the distance threshold, we ensured each loop matched only once by searching for the closest loop coordinate in the other set. This approach guaranteed a unique match for each loop meeting the distance criterion. The custom matching script is available at https://github.com/XiaoTaoWang/Raichu/tree/main/benchmark-analysis.
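One way to enforce a one-to-one match is a greedy assignment over candidate pairs sorted by distance. The sketch below is an interpretation of the description above, not the authors' script (which is available at the URL given); the function name and example loops are hypothetical.

```python
import math

def unique_matches(loops_a, loops_b, frac=0.2, cap=50_000):
    """Pair loops from A and B so that each loop is used at most once, closest first."""
    candidates = []
    for a_idx, (i, j) in enumerate(loops_a):
        threshold = min(frac * abs(i - j), cap)
        for b_idx, (x, y) in enumerate(loops_b):
            d = math.hypot(i - x, j - y)
            if d < threshold:
                candidates.append((d, a_idx, b_idx))
    candidates.sort()                      # closest candidate pairs are assigned first
    used_a, used_b, pairs = set(), set(), []
    for d, a_idx, b_idx in candidates:
        if a_idx not in used_a and b_idx not in used_b:
            used_a.add(a_idx)
            used_b.add(b_idx)
            pairs.append((a_idx, b_idx))
    return pairs

# Two nearby loops in A compete for a single loop in B; only the closer one is matched.
loops_a = [(100_000, 300_000), (105_000, 300_000)]
loops_b = [(100_000, 305_000)]
pairs = unique_matches(loops_a, loops_b)
```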

Definition of promoter and enhancer regions

Promoter and enhancer regions were defined based on ChromHMM chromatin state annotations downloaded from the UCSC Genome Browser (https://genome.ucsc.edu/cgi-bin/hgTables).

For GM12878 and H1ESC, the annotation file was based on a 15-state ChromHMM model⁵². Regions annotated as “1_Active_Promoter”, “2_Weak_Promoter”, or “3_Poised_Promoter” were classified as promoters, while regions annotated as “4_Strong_Enhancer”, “5_Strong_Enhancer”, “6_Weak_Enhancer”, or “7_Weak_Enhancer” were classified as enhancers. Since the original annotations were in hg19 coordinates, they were converted to hg38 using HiCLift⁵⁰ (v1.0; conversion script available at https://github.com/XiaoTaoWang/Raichu/tree/main/benchmark-analysis).

For mESC, the annotation file was based on an 18-state ChromHMM model⁵³. Regions annotated as “Tss” were defined as promoters, and regions annotated as “Enh” were defined as enhancers.

Enrichment analysis of TFs and histone modifications

To explore the association of transcription factors (TFs) and histone modifications with different loop categories, we calculated both the fraction of loop anchors overlapping ChIP-Seq peaks and a fold-enrichment score.

Briefly, a non-redundant list of loop anchors was compiled for each loop category. For each TF or histone modification, we iterated through this list and counted the number of anchors that overlapped at least one ChIP-Seq peak. To calculate the fold-enrichment score, we generated 100 random control sets by shuffling the loops and repeated the overlap calculation for each control set. Each random loop set preserved the original genomic distance distribution between loop anchors and the number of loops per chromosome, while ensuring that no loop intervals overlapped known genomic gaps. The fold-enrichment score was computed as the ratio of the observed number of overlaps to the mean number of overlaps across the control sets.
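The two summary statistics can be sketched as follows for a single chromosome. Intervals are half-open (start, end) pairs, the control overlap counts are hypothetical placeholders, and the generation of shuffled control loop sets is not shown.

```python
import numpy as np

def count_overlapping(anchors, peaks):
    """Number of anchors that overlap at least one peak (half-open intervals)."""
    return sum(
        any(p_start < a_end and p_end > a_start for p_start, p_end in peaks)
        for a_start, a_end in anchors
    )

def fold_enrichment(observed, control_counts):
    """Observed overlap count divided by the mean count across control sets."""
    return observed / float(np.mean(control_counts))

anchors = [(0, 5_000), (10_000, 15_000), (20_000, 25_000)]
peaks = [(4_000, 4_500), (14_900, 15_100)]
observed = count_overlapping(anchors, peaks)      # 2 of 3 anchors overlap a peak
fraction = observed / len(anchors)
control_counts = [1, 0, 1, 2]                     # hypothetical counts from shuffled loop sets
score = fold_enrichment(observed, control_counts) # 2 / 1.0 = 2.0
```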

For selected TFs and histone modifications, we also generated ChIP-Seq binding profiles around loop anchors. Specifically, the ±50 kb region surrounding each anchor was divided into 50 bins of 2 kb each. The number of ChIP-Seq peaks falling within each bin was counted, producing a 50-element array for each anchor. Each array was then normalized by dividing all values by its mean. Finally, the average binding profile for each loop category was computed by averaging the normalized values across all loop anchors at each bin position.

When computing the binding profiles for GATA3 and H3K27ac ChIP-Seq peaks around C/C-specific and A/A-specific loops unique to ICE or Raichu (Fig. 5f), we skipped the normalization step (i.e., dividing each anchor’s profile by its mean), and instead calculated the average profile directly from the raw peak overlap counts. This approach facilitated comparisons both between loop categories and between biological conditions (C/C vs. A/A).
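A minimal sketch of the profile calculation, including the option to skip per-anchor normalization, is given below. Two choices are assumptions not stated in the text: each peak is represented by its midpoint, and anchors with no peaks in the window are skipped when normalizing (their mean is zero).

```python
import numpy as np

def anchor_profile(anchor_center, peak_mids, flank=50_000, bin_size=2_000):
    """Count peak midpoints in 50 bins of 2 kb spanning +/-50 kb around an anchor."""
    edges = np.arange(anchor_center - flank, anchor_center + flank + bin_size, bin_size)
    counts, _ = np.histogram(peak_mids, bins=edges)
    return counts.astype(float)

def average_profile(anchor_centers, peak_mids, normalize=True):
    """Average the per-anchor profiles, optionally dividing each by its own mean."""
    profiles = []
    for center in anchor_centers:
        profile = anchor_profile(center, peak_mids)
        if normalize:
            if profile.mean() == 0:
                continue                  # assumption: anchors without peaks are skipped
            profile = profile / profile.mean()
        profiles.append(profile)
    return np.mean(profiles, axis=0)

anchor_centers = [100_000, 300_000]                    # hypothetical anchors
peak_mids = [99_000, 100_500, 301_000, 340_000]        # hypothetical peak midpoints
prof = average_profile(anchor_centers, peak_mids)
raw = average_profile(anchor_centers, peak_mids, normalize=False)
```

With `normalize=False`, the result is the mean raw peak count per bin, which keeps profiles comparable across loop categories and conditions.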

Down-sampling of Hi-C data

To perform down-sampling at a rate of $\alpha$ (0 <$\alpha$ < 1), we used a binomial probability-based approach that does not require re-mapping of the raw reads. Specifically, for each non-zero pixel in the full contact matrix at 5-kb resolution, the contact frequency was assigned as a random integer drawn from a binomial distribution with parameters ${M}_{{ij}}$ and $\alpha$, where ${M}_{{ij}}$ represents the contact count in the full Hi-C matrix between bin $i$ and bin $j$. As both 5-kb and 10-kb contact matrices are required for loop detection (as described above), the “cooler coarsen” command was applied to the downsampled 5-kb contact matrices to generate the corresponding 10-kb contact matrices.
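The binomial thinning step can be sketched with NumPy. Here `counts` stands for the non-zero pixel values of the full matrix; the function name, the seed argument, and the toy counts are hypothetical, and reading or writing cooler files is not shown.

```python
import numpy as np

def downsample_counts(counts, alpha, seed=None):
    """Draw each pixel from Binomial(M_ij, alpha), so every contact is kept with probability alpha."""
    rng = np.random.default_rng(seed)
    return rng.binomial(counts, alpha)

full = np.full(1_000, 1_000)              # toy pixel counts, 1e6 contacts in total
down = downsample_counts(full, 0.25, seed=0)
print(down.sum() / full.sum())            # close to 0.25
```

Pixels that drop to zero after thinning would simply be omitted from a sparse matrix.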

Detection of differential loops

To detect differential loops between wild-type (C/C) and engineered (A/A) GM12878 cells, we applied the following approach to minimize instances where a loop is enriched in a contact map but not detected due to stringent p value or fold-enrichment cutoffs (script available on GitHub at https://github.com/XiaoTaoWang/Raichu/tree/main/benchmark-analysis).

1. For each of the four contact maps (ICE-normalized and Raichu-normalized for wild-type and engineered GM12878 cells), we ran HiCCUPS using two parameter settings. For both settings, loops detected at 5-kb and 10-kb resolutions were merged using the “combine-resolutions” command, as described in the “Loop detection in Hi-C data using HiCCUPS and Mustache” section. The first setting used the default p-value and fold-enrichment cutoffs with parameters “--pw 1 2 4 --ww 3 5 7 --only-anchors --maxapart 4000000”, which resulted in 8803, 5467, 11,114, and 6562 loops for ICE-normalized maps in wild-type cells (D_{ICE, CC}), ICE-normalized maps in engineered cells (D_{ICE, AA}), Raichu-normalized maps in wild-type cells (D_{Raichu, CC}), and Raichu-normalized maps in engineered cells (D_{Raichu, AA}), respectively. The second setting used a more relaxed significance threshold with parameters “--pw 1 2 4 --ww 3 5 7 --only-anchors --maxapart 4000000 --siglevel 0.1 --sumq 0.05 --double-fold 1.5 --single-fold 1.75”, which resulted in 13,433, 8688, 17,481, and 10,697 loops for ICE-normalized maps in wild-type cells (L_{ICE, CC}), ICE-normalized maps in engineered cells (L_{ICE, AA}), Raichu-normalized maps in wild-type cells (L_{Raichu, CC}), and Raichu-normalized maps in engineered cells (L_{Raichu, AA}), respectively.
2. The loop sets D_{ICE, CC}, D_{ICE, AA}, D_{Raichu, CC}, and D_{Raichu, AA} were defined as the final loops detected for the corresponding contact maps, while the loop sets L_{ICE, CC}, L_{ICE, AA}, L_{Raichu, CC}, and L_{Raichu, AA} were used to determine whether a loop from one set was potentially detectable in another contact map. For example, to check whether a loop ($i$, $j$) from D_{ICE, CC} was detectable in the Raichu-normalized contact maps for wild-type GM12878 cells, we searched for a loop ($i'$, $j'$) in L_{Raichu, CC} such that $|i-i'| < \min(0.2\times|i-j|,\,50\,\mathrm{kb})$ and $|j-j'| < \min(0.2\times|i-j|,\,50\,\mathrm{kb})$.
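The detectability check in step 2 applies the threshold to each anchor separately rather than to the Euclidean distance. A minimal sketch follows; the function names and coordinates are hypothetical, and the final filter (keeping loops from a stringent set that are not detectable in the relaxed set of the other condition) is one plausible reading of how the two tiers are combined, not a statement of the authors' script.

```python
def is_detectable(loop, relaxed_loops, frac=0.2, cap=50_000):
    """True if a relaxed-threshold loop lies within the per-anchor tolerance of loop."""
    i, j = loop
    threshold = min(frac * abs(i - j), cap)
    return any(abs(i - x) < threshold and abs(j - y) < threshold
               for x, y in relaxed_loops)

def condition_specific(stringent_loops, relaxed_other):
    """Loops called in one condition with no relaxed counterpart in the other."""
    return [loop for loop in stringent_loops
            if not is_detectable(loop, relaxed_other)]

# Hypothetical loop sets on one chromosome (base-pair coordinates).
D_cc = [(1_000_000, 1_200_000), (3_000_000, 3_100_000)]   # stringent calls, wild-type
L_aa = [(1_010_000, 1_190_000)]                           # relaxed calls, engineered
cc_specific = condition_specific(D_cc, L_aa)
```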

Identification of chromatin loops conserved between mouse and human

For each loop ($i$, $j$) identified from mNPC Hi-C data, we extracted the genomic coordinates of its two anchors and used HiCLift (v1.0) to convert the coordinates from the mouse genome (mm10) to the human genome (hg38). A loop was considered conserved if it met the following criteria (script available on GitHub at https://github.com/XiaoTaoWang/Raichu/tree/main/benchmark-analysis):

1. Both anchors were successfully mapped to the same chromosome in the target genome (hg38).
2. The genomic distance between the two anchors remained relatively stable after conversion. Specifically, if the converted coordinates were ($i'$, $j'$), the following condition had to be satisfied: $0.8\times|i-j| < |i'-j'| < 1.25\times|i-j|$.
3. A matching loop existed in hNPC Hi-C data. That is, there was a loop ($x$, $y$) in hNPC such that the Euclidean distance between ($x$, $y$) and ($i'$, $j'$) was less than $\min(0.2\times|i'-j'|,\,50\,\mathrm{kb})$.
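The three criteria can be combined into a single predicate once the anchors have been lifted over. In this sketch the coordinate conversion itself is not performed: `lifted` is a hypothetical tuple holding the already converted anchors (or `None` if conversion failed), and all names and coordinates are illustrative.

```python
import math

def is_conserved(mouse_loop, lifted, human_loops, frac=0.2, cap=50_000):
    """Apply the three conservation criteria to one mouse loop.

    mouse_loop  : (i, j) anchor positions in mm10
    lifted      : (chrom_i, i2, chrom_j, j2) in hg38, or None if an anchor failed to map
    human_loops : dict mapping chromosome name to a list of (x, y) loops
    """
    if lifted is None:
        return False
    chrom_i, i2, chrom_j, j2 = lifted
    if chrom_i != chrom_j:                                  # criterion 1
        return False
    d_mouse, d_human = abs(mouse_loop[0] - mouse_loop[1]), abs(i2 - j2)
    if not (0.8 * d_mouse < d_human < 1.25 * d_mouse):      # criterion 2
        return False
    threshold = min(frac * d_human, cap)                    # criterion 3
    return any(math.hypot(x - i2, y - j2) < threshold
               for x, y in human_loops.get(chrom_i, []))

human_loops = {"chr2": [(50_005_000, 50_215_000)]}
loop = (10_000_000, 10_200_000)
ok = is_conserved(loop, ("chr2", 50_000_000, "chr2", 50_210_000), human_loops)
split = is_conserved(loop, ("chr2", 50_000_000, "chr3", 50_210_000), human_loops)
stretched = is_conserved(loop, ("chr2", 50_000_000, "chr2", 50_300_000), human_loops)
```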

Single-cell Hi-C data analysis

The GM12878 single-cell Hi-C dataset in .pairs format was downloaded from GEO (accession code: GSE240128). Contact pairs were processed using HiCLift (v1.0; https://github.com/XiaoTaoWang/HiCLift) to generate an .mcool file for each single cell, with the parameters “--output-format cool --in-assembly hg38 --out-assembly hg38”. Pseudo-bulk Hi-C maps were generated using the “cooler merge” command.

Loops were identified using pyHICCUPS (v0.3.9; https://github.com/XiaoTaoWang/HiCPeaks) at 5-kb, 10-kb, and 25-kb resolutions individually, with the parameters “--pw 1 2 4 --ww 3 5 7 --only-anchors --maxapart 4000000”. Detected loops were then combined across resolutions using the “combine-resolutions” command from the HiCPeaks Python package (v0.3.9; https://github.com/XiaoTaoWang/HiCPeaks), with the parameters “-G 10000 -M 100000 --max-res 25000”.

To evaluate the precision of the detected loops, the union set of interactions compiled from orthogonal 3D genomic datasets (including Hi-C, CTCF ChIA-PET, RNAPII ChIA-PET, CTCF HiChIP, H3K27ac HiChIP, SMC1A HiChIP, H3K4me3 PLAC-Seq, TrAC-Loop, and HiCAR) was used as the reference (Supplementary Data 1 and 6). For recall evaluation, a high-confidence interaction set was defined as the reference, where each interaction was detectable in at least three of the nine orthogonal datasets. F1 scores were calculated using the following equation:

$${F}_{1}=\frac{2\cdot ({{\rm{precision}}}\cdot {{\rm{recall}}})}{{{\rm{precision}}}+{{\rm{recall}}}}$$

(5)
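Equation (5) combines precision and recall, which here are measured against two different reference sets. The sketch below uses hypothetical counts; how a loop call is matched to a reference interaction is not shown.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (Eq. 5); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for one pseudo-bulk dataset.
n_calls = 1_000            # loops called
n_calls_supported = 800    # calls supported by the union reference set
n_reference = 2_000        # interactions in the high-confidence reference set
n_recovered = 800          # high-confidence interactions recovered by the calls

precision = n_calls_supported / n_calls      # 0.8
recall = n_recovered / n_reference           # 0.4
f1 = f1_score(precision, recall)             # 0.64 / 1.2
```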

Aggregate peak analysis

To evaluate the overall enrichment of chromatin loop signals in contact maps, we performed Aggregate Peak Analysis (APA). For a given list of chromatin loops or interactions, contact frequencies were extracted from $w\times w$ submatrices centered at the two-dimensional coordinates of each loop. Each submatrix was normalized by dividing all values by its mean. To reduce the influence of outliers, submatrices with average signal intensities above the 99th percentile or below the 1st percentile were excluded. The average signal at each position was then computed across all loops and visualized as a heatmap. In Fig. 2h, APA was performed on contact matrices at 5-kb resolution with a window size ($w$) set to 15. In Fig. 5d, APA was performed on 10-kb resolution contact maps with a window size of 11. In Fig. 3e, APA was applied to GAM matrices at 30-kb resolution and DNA SPRITE matrices at 10-kb resolution, both with a window size of 11.

The APA z-score was calculated by comparing the center pixel value to the average signal in the lower-left $3\times 3$ corner of the plot.
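A minimal APA sketch on a dense toy matrix is shown below. Several details are assumptions: loops are given as bin indices, the percentile filter is applied to the raw (pre-normalization) submatrix means, and the z-score uses the common definition (center minus corner mean, divided by corner standard deviation), since the exact formula is not spelled out above.

```python
import numpy as np

def apa(matrix, loops, w=11):
    """Average mean-normalized w x w submatrices centered on each loop pixel."""
    half, n = w // 2, matrix.shape[0]
    subs = []
    for i, j in loops:
        if i - half < 0 or j - half < 0 or i + half >= n or j + half >= n:
            continue                                   # skip windows that leave the matrix
        subs.append(matrix[i - half:i + half + 1, j - half:j + half + 1].astype(float))
    subs = np.array(subs)
    means = subs.mean(axis=(1, 2))
    lo, hi = np.percentile(means, [1, 99])
    keep = (means >= lo) & (means <= hi) & (means > 0)  # drop outlier and empty windows
    return (subs[keep] / means[keep, None, None]).mean(axis=0)

def apa_zscore(agg):
    """Compare the center pixel with the lower-left 3 x 3 corner (assumed z-score form)."""
    corner = agg[-3:, :3]
    center = agg[agg.shape[0] // 2, agg.shape[1] // 2]
    return (center - corner.mean()) / corner.std()

# Toy map: Poisson background plus enriched pixels at 20 hypothetical loops.
rng = np.random.default_rng(0)
mat = rng.poisson(5, size=(200, 200)).astype(float)
loops = [(i, i + 30) for i in range(10, 150, 7)]
for i, j in loops:
    mat[i, j] += 50
agg = apa(mat, loops, w=11)
z = apa_zscore(agg)
```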

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

This study did not generate new primary datasets. All publicly available datasets analyzed are summarized in Supplementary Data 1. Processed data generated in this study, including chromatin loop coordinates, enhancer and promoter annotations, and bias vectors calculated by different normalization methods, are provided as Supplementary Data 2–10. Source data are provided with this paper.

Code availability

The Raichu source code is publicly available and has been deposited in GitHub at https://github.com/XiaoTaoWang/Raichu/ under the GNU General Public License v3.0. The specific version of the code and analysis scripts associated with this publication are archived on Zenodo and are accessible via https://doi.org/10.5281/zenodo.18082586⁵⁴. No third-party code is reused.

References

Misteli, T. The self-organizing genome: principles of genome architecture and function. Cell 183, 28–45 (2020).
Rowley, M. J. & Corces, V. G. Organizational principles of 3D genome architecture. Nat. Rev. Genet. 19, 789–800 (2018).
Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Chepelev, I., Wei, G., Wangsa, D., Tang, Q. & Zhao, K. Characterization of genome-wide enhancer-promoter interactions reveals co-expression of interacting genes and modes of higher order chromatin organization. Cell Res. 22, 490–503 (2012).
Thiecke, M. J. et al. Cohesin-dependent and -independent mechanisms mediate chromosomal contacts between promoters and enhancers. Cell Rep. 32, 107929 (2020).
Dubois, F., Sidiropoulos, N., Weischenfeldt, J. & Beroukhim, R. Structural variations in cancer and the 3D genome. Nat. Rev. Cancer 22, 533–546 (2022).
Wang, X. & Yue, F. Hijacked enhancer-promoter and silencer-promoter loops in cancer. Curr. Opin. Genet. Dev. 86, 102199 (2024).
Jerkovic, I. & Cavalli, G. Understanding 3D genome organization by multidisciplinary methods. Nat. Rev. Mol. Cell Biol. 22, 511–528 (2021).
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Yaffe, E. & Tanay, A. Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat. Genet. 43, 1059–1065 (2011).
Hu, M. et al. HiCNorm: removing biases in Hi-C data via Poisson regression. Bioinformatics 28, 3131–3133 (2012).
Vidal, E. et al. OneD: increasing reproducibility of Hi-C samples with abnormal karyotypes. Nucleic Acids Res. 46, e49 (2018).
Imakaev, M. et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999–1003 (2012).
Lu, L. et al. Robust Hi-C maps of enhancer-promoter interactions reveal the function of non-coding genome in neural development and diseases. Mol. Cell 79, 521–534e515 (2020).
Dekker, J. et al. An integrated view of the structure and function of the human 4D nucleome. Nature 649, 759–776 (2025).
Abdennur, N. & Mirny, L. A. Cooler: scalable storage for Hi-C data and other genomically labeled arrays. Bioinformatics 36, 311–316 (2020).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Reiff, S. B. et al. The 4D Nucleome Data Portal as a resource for searching and visualizing curated nucleomics data. Nat. Commun. 13, 2365 (2022).
Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
Nora, E. P. et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381–385 (2012).
Eagen, K. P., Aiden, E. L. & Kornberg, R. D. Polycomb-mediated chromatin loops revealed by a subkilobase-resolution chromatin interaction map. Proc. Natl. Acad. Sci. USA 114, 8764–8769 (2017).
Schwarzer, W. et al. Two independent modes of chromatin organization revealed by cohesin removal. Nature 551, 51–56 (2017).
Hsieh, T. S. et al. Enhancer-promoter interactions and transcription are largely maintained upon acute loss of CTCF, cohesin, WAPL or YY1. Nat. Genet. 54, 1919–1932 (2022).
Boettiger, A. N. et al. Super-resolution imaging reveals distinct chromatin folding for different epigenetic states. Nature 529, 418–422 (2016).
Yang, T. et al. HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Genome Res. 27, 1939–1949 (2017).
Open2C et al. Cooltools: enabling high-resolution Hi-C analysis in Python. PLoS Comput. Biol. 20, e1012067 (2024).
Tang, Z. et al. CTCF-mediated human 3D genome architecture reveals chromatin topology for transcription. Cell 163, 1611–1627 (2015).
Uyehara, C. M. & Apostolou, E. 3D enhancer-promoter interactions and multi-connected hubs: organizational principles and functional roles. Cell Rep. 42, 112068 (2023).
Roayaei Ardakany, A., Gezer, H. T., Lonardi, S. & Ay, F. Mustache: multi-scale detection of chromatin loops from Hi-C and Micro-C maps using scale-space representation. Genome Biol. 21, 256 (2020).
Bonev, B. et al. Multiscale 3D genome rewiring during mouse neural development. Cell 171, 557–572e524 (2017).
Beagrie, R. A. et al. Complex multi-enhancer contacts captured by genome architecture mapping. Nature 543, 519–524 (2017).
Quinodoz, S. A. et al. Higher-order inter-chromosomal hubs shape 3D genome organization in the nucleus. Cell 174, 744–757 e724 (2018).
Takei, Y. et al. Integrated spatial genomics reveals global architecture of single nuclei. Nature 590, 344–350 (2021).
Grubert, F. et al. Landscape of cohesin-mediated chromatin loops in the human genome. Nature 583, 737–743 (2020).
Xu, J. et al. Subtype-specific 3D genome alteration in acute myeloid leukaemia. Nature 611, 387–398 (2022).
Wang, R. et al. SARS-CoV-2 restructures host chromatin architecture. Nat. Microbiol. 8, 679–694 (2023).
Heffel, M. G. et al. Temporally distinct 3D multi-omic dynamics in the developing human brain. Nature 635, 481–489 (2024).
Yang, H. et al. Noncoding genetic variation in GATA3 increases acute lymphoblastic leukemia risk through local and global changes in chromatin conformation. Nat. Genet. 54, 170–179 (2022).
Keough, K. C. et al. Three-dimensional genome rewiring in loci with human accelerated regions. Science 380, eabm1696 (2023).
Krietenstein, N. et al. Ultrastructural details of mammalian chromosome architecture. Mol. Cell 78, 554–565e557 (2020).
Goel, V. Y., Huseyin, M. K. & Hansen, A. S. Region capture micro-C reveals coalescence of enhancers and promoters into nested microcompartments. Nat. Genet. 55, 1048–1056 (2023).
Wu, H. et al. Simultaneous single-cell three-dimensional genome and gene expression profiling uncovers dynamic enhancer connectivity underlying olfactory receptor choice. Nat. Methods 21, 974–982 (2024).
Lee, D. S. et al. Simultaneous profiling of 3D genome structure and DNA methylation in single human cells. Nat. Methods 16, 999–1006 (2019).
Li, G. et al. Joint profiling of DNA methylation and chromatin architecture in single cells. Nat. Methods 16, 991–993 (2019).
Liu, Z. et al. Linking genome structures to functions by simultaneous single-cell Hi-C and RNA-seq. Science 380, 1070–1076 (2023).
Zhou, T. et al. GAGE-seq concurrently profiles multiscale 3D genome organization and gene expression in single cells. Nat. Genet. 56, 1701–1711 (2024).
Chai, H. et al. Tri-omic single-cell mapping of the 3D epigenome and transcriptome in whole mouse brains throughout the lifespan. Nat. Methods 22, 994–1007 (2025).
Jin, F. et al. A high-resolution map of the three-dimensional chromatin interactome in human cells. Nature 503, 290–294 (2013).
Ray, J. et al. Chromatin conformation remains stable upon extensive transcriptional changes driven by heat shock. Proc. Natl. Acad. Sci. USA 116, 19431–19439 (2019).
Wang, X. & Yue, F. HiCLift: a fast and efficient tool for converting chromatin interaction data between genome assemblies. Bioinformatics 39, btad389 (2023).
Wang, X. T., Cui, W. & Peng, C. HiTAD: detecting the structural and functional hierarchies of topologically associating domains from chromatin interactions. Nucleic Acids Res. 45, e163 (2017).
Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011).
van der Velde, A. et al. Annotation of chromatin states in 66 complete mouse epigenomes during development. Commun. Biol. 4, 239 (2021).
Wang, X. Raichu: a normalization method for boosting the detection of enhancer–promoter loops from chromatin interaction data. Zenodo https://doi.org/10.5281/zenodo.18082586 (2025).

Acknowledgements

This work was supported by the National Key Research and Development Program of China (No. 2022YFC2703400 to X.W.) and the National Natural Science Foundation of China (No. 32470698 to X.W.). Most analyses in this study were supported by the Computing Center in Xi’an.

Author information

Detong Shi
Present address: Zhongda Hospital, Advanced Institute for Life and Health, School of Public Health, Southeast University, Nanjing, China
These authors contributed equally: Xiaotao Wang, Detong Shi.

Authors and Affiliations

Institute of Reproduction and Development, Shanghai Key Laboratory of Reproduction and Development, Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China
Xiaotao Wang, Detong Shi, Feiyang Xue, Hongbo Yang & Linghan Jiang
Shanghai Key Laboratory of Female Reproductive Endocrine Related Diseases, Shanghai, China
Xiaotao Wang
Shanghai Immune Therapy Institute, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Yunxia Liu

Authors

Xiaotao Wang
Detong Shi
Feiyang Xue
Yunxia Liu
Hongbo Yang
Linghan Jiang

Contributions

X.W. conceived the project, implemented the Raichu framework, performed all data analyses, designed the figures, and wrote the manuscript. D.S. contributed to the study through extensive discussion and analysis support. F.X. contributed to data analysis. Y.L. supported data collection and figure editing. H.Y. contributed through discussion. L.J. assisted with software testing.

Corresponding author

Correspondence to Xiaotao Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information (PDF)

Description of Additional Supplementary Files (PDF)

Supplementary Data 1 (XLSX)

Supplementary Data 2 (XLSX)

Supplementary Data 3 (XLSX)

Supplementary Data 4 (XLSX)

Supplementary Data 5 (XLSX)

Supplementary Data 6 (XLSX)

Supplementary Data 7 (XLSX)

Supplementary Data 8 (XLSX)

Supplementary Data 9 (XLSX)

Supplementary Data 10 (XLSX)

Reporting Summary (PDF)

Transparent Peer Review file (PDF)

Source data

Source data (XLSX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Wang, X., Shi, D., Xue, F. et al. Boosting the detection of enhancer-promoter loops via normalization methods for chromatin interaction data. Nat Commun 17, 2299 (2026). https://doi.org/10.1038/s41467-026-69082-z

Received: 31 December 2024
Accepted: 21 January 2026
Published: 03 February 2026
Version of record: 10 March 2026
DOI: https://doi.org/10.1038/s41467-026-69082-z