Introduction

The spatial organization of chromatin plays a critical role in the precise control of transcriptional programs in mammalian cells1,2. At kilobase to megabase scales, distant loci on the linear genome can come into close proximity in 3D space, forming structures known as chromatin loops3. These loops are broadly categorized into two types: structural loops, which connect CTCF-bound insulators, and regulatory loops, which link promoters to distal cis-regulatory elements, such as enhancers4,5. Disruption or rewiring of chromatin loops has been implicated in various developmental disorders and cancers6,7.

A suite of experimental techniques has been developed to explore 3D genome organization and map chromatin loops8. Among these, Hi-C has become one of the most widely used methods due to its ability to capture chromatin contacts between all possible pairs of genomic loci3,9. However, Hi-C data are subject to substantial technical biases, including those related to sequence mappability, GC content, and restriction fragment length10. To mitigate these biases, a range of computational normalization methods has been developed, which can be broadly categorized into three groups: explicit methods, such as HiCNorm11 and OneD12, which model known sources of bias; implicit methods, such as ICE (Iterative Correction and Eigenvector decomposition)13 and KR (Knight–Ruiz matrix balancing)3, which adjust the data without explicitly modeling biases; and hybrid methods, which combine features of both approaches14. Implicit methods like ICE and KR, owing to their simplicity and broad applicability, have become the de facto standards for normalizing Hi-C data and are widely used in downstream analyses, including the identification of chromatin loops.

However, both ICE and KR have notable limitations. In a recent study by the 4D Nucleome Consortium, we observed that although these methods perform well in identifying CTCF-mediated loops, chromatin compartments, and other higher-order structures, they often fail to detect transcription-related loops15. This limitation stems from a core assumption in both methods: that all genomic loci should have equal visibility in a Hi-C map. In practice, this assumption can lead to over-correction of interaction signals, causing low-frequency contacts—such as enhancer–promoter loops—to be under-detected, despite their central role in gene regulation.
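
To make the equal-visibility assumption concrete, the sketch below implements a simplified ICE-style iterative correction in NumPy. It is an illustration only, not the cooler implementation. It omits cooler's filtering of low-coverage bins, its masking of near-diagonal pixels and its convergence settings, and the function name `ice_balance` is ours. At convergence every row of the balanced matrix sums to the same value, which is precisely the uniform-visibility constraint described above.

```python
import numpy as np


def ice_balance(raw, n_iter=200, tol=1e-8):
    """Simplified ICE-style iterative correction (illustrative, not cooler's code).

    Returns the balanced matrix and a bias vector such that
    balanced[i, j] == raw[i, j] / (bias[i] * bias[j]).
    """
    mat = np.array(raw, dtype=float)
    bias = np.ones(mat.shape[0])
    for _ in range(n_iter):
        coverage = mat.sum(axis=1)
        # Rescale so that the mean coverage over non-empty bins is 1.
        coverage = coverage / coverage[coverage > 0].mean()
        coverage[coverage == 0] = 1.0  # leave empty bins unchanged
        mat /= np.outer(coverage, coverage)
        bias *= coverage
        if np.abs(coverage - 1.0).max() < tol:
            break
    return mat, bias


# Usage: after balancing, every row sums to (nearly) the same value.
rng = np.random.default_rng(0)
a = rng.random((20, 20)) + 0.1
raw = a + a.T
balanced, bias = ice_balance(raw)
row_sums = balanced.sum(axis=1)
```

The equal row sums enforced here are the property that, according to the argument above, can over-correct loci whose genuine interaction profile is sparse.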

To address this long-standing gap in 3D genome analysis, we introduce Raichu, a computational method for normalizing chromatin contact data. Raichu is an implicit method that retains the simplicity of ICE and KR but diverges from their core assumption of uniform visibility across genomic loci. Instead, it employs an optimization-based approach to adjust for variable interaction biases embedded in the raw data, allowing it to preserve signals from biologically important—but often subtle—chromatin interactions. Our results show that Raichu detects nearly twice as many chromatin loops as either ICE or KR, with a notable enrichment for enhancer–promoter loops critical to gene regulation. Importantly, Raichu outperforms existing methods in identifying differential loops between experimental conditions, offering insights into how chromatin architecture regulates transcription across cellular states. Furthermore, Raichu demonstrates robustness across varying sequencing depths and 3D genomic platforms, making it a versatile tool for chromatin interaction analysis.

Results

Limitations of existing Hi-C data normalization methods in detecting transcription-related loops

In this study, we focus on implicit normalization methods for Hi-C data, as explicit and hybrid methods require additional external inputs (e.g., mappability, GC content, restriction fragment length), which limits their applicability to genome assemblies where such data may be unavailable (Supplementary Table 1). Specifically, we compare two widely used matrix balancing implementations: Iterative Correction and Eigenvector decomposition (ICE), as implemented in the cooler package16, and Knight–Ruiz balancing (KR), available in the juicer toolkit17. Both cooler and juicer are central to the 3D genomic data analysis ecosystem and serve as default Hi-C processing tools within the 4D Nucleome Consortium pipeline18. While other software based on implicit methods exists, these tools either implement conceptually similar matrix balancing algorithms or are less compatible with standard downstream analysis workflows.

In Supplementary Fig. 1, we present raw Hi-C contact maps alongside ICE- and KR-normalized maps for selected genomic regions in GM12878 cells. As expected, both normalization methods reduce background noise and improve the visibility of topologically associating domains (TADs)19,20 and chromatin loop structures. However, closer examination reveals that many enhancer-associated loops—marked by H3K27ac peaks—are clearly visible in the raw Hi-C data but become substantially attenuated after ICE or KR normalization, often to the point of being indistinguishable from background noise. This observation highlights a key limitation of existing normalization methods: their tendency to suppress transcription-related interactions, particularly low-frequency enhancer–promoter loops.

Raichu: a computational method for normalizing chromatin contact data

Here, we present Raichu, a computational method for normalizing chromatin contact data (Fig. 1a). Raichu is conceptually grounded in a biophysical framework that models chromatin as a polymer, where the observed interaction frequency between any two loci reflects the combined effects of three components: (1) a distance-dependent decay, (2) locus-specific biases, and (3) a residual component specific to the interaction itself. Among these, the distance-dependent decay is a well-established property of polymer folding and accounts for the global background of interactions, describing the average decrease in contact frequency with increasing genomic distance. This behavior is consistently observed across Hi-C and other 3D genome mapping technologies and can be robustly estimated from the raw contact matrix15. Locus-specific biases—arising from factors such as GC content, mappability, and restriction fragment density—are not intrinsic to 3D genome organization but introduce systematic distortions into measured contact frequencies.
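
In notation introduced here for illustration only, the three-component model reads O_ij ≈ b_i · b_j · P(|i − j|) · R_ij. Here O is the raw contact matrix, b holds the locus-specific biases, P(s) is the distance-dependent decay at separation s, and R is the interaction-specific residual. A common way to estimate P(s) from a raw matrix is to average each diagonal. This is assumed here rather than taken from the Raichu Methods. The log-log slope of that curve can then be compared with the 1/L scaling shown in Fig. 1b. A minimal NumPy sketch:

```python
import numpy as np


def expected_by_distance(mat):
    """Mean contact frequency on each diagonal, indexed by separation s (in bins)."""
    n = mat.shape[0]
    return np.array([np.diagonal(mat, offset=s).mean() for s in range(n)])


def decay_exponent(expected, s_min=1, s_max=None):
    """Slope of log P(s) versus log s for s_min <= s <= s_max."""
    s_max = len(expected) - 1 if s_max is None else s_max
    s = np.arange(s_min, s_max + 1)
    slope, _intercept = np.polyfit(np.log(s), np.log(expected[s]), 1)
    return slope


# Usage: a synthetic map whose contacts decay exactly as 1/s.
n = 50
sep = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
toy = 1.0 / np.maximum(sep, 1)
p_s = expected_by_distance(toy)
slope = decay_exponent(p_s)  # approximately -1, i.e. 1/L scaling
```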

Fig. 1: A computational method for normalizing chromatin contact data.

The GM12878 Hi-C dataset was used for all panels. a Workflow of Raichu. b Average raw and normalized contact signals (ICE, KR, and Raichu) at 5-kb resolution plotted as a function of genomic distance. The black line indicates the theoretical 1/L decay scaling. c Comparison of bias vectors calculated by ICE and Raichu at 100-kb and 5-kb resolutions. d Example regions showing CTCF and H3K27ac ChIP-seq signals alongside bias vectors calculated by ICE, KR, and Raichu. e Comparison of ICE- and Raichu-normalized signals between genomic regions stratified by ChIP-Seq signal intensity for the indicated transcription factors or histone modifications. The X and Y axes represent genomic bins ranked by ChIP-Seq enrichment: bin 0 corresponds to non-peak regions, and bins 1–8 correspond to peak regions with increasing signal intensity. For each bin pair, the average observed/expected contact frequency was calculated. Source data are provided as a Source Data file.

To correct for these biases, Raichu employs an efficient optimization algorithm based on dual annealing to estimate a genome-wide bias vector that, together with the distance-dependent decay (which is derived from the raw contact matrix and remains fixed throughout the optimization process), best explains the observed raw contact map (Methods). The estimated bias values are then used to normalize the data in a manner similar to ICE and KR—by dividing the observed contact frequency by the product of the bias values for the two interacting loci. This procedure is designed to remove locus-specific technical biases while preserving the expected distance-decay and retaining a residual signal that reflects genuine interactions between specific loci.
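
The optimization step can be illustrated with SciPy's `scipy.optimize.dual_annealing`. The objective below is a log-space least-squares fit of O_ij ≈ b_i · b_j · P(|i − j|) over off-diagonal pixels, with P held fixed. It is our assumption for illustration. Raichu's actual objective, constraints and handling of excluded bins are described in its Methods and may differ. The function names are hypothetical, and a dense matrix of a few bins stands in for a genome-wide map.

```python
import numpy as np
from scipy.optimize import dual_annealing


def fit_bias_dual_annealing(raw, expected, maxiter=200):
    """Hypothetical sketch: fit one bias per bin by dual annealing.

    Assumes raw[i, j] ~ b[i] * b[j] * expected[|i - j|] and minimizes squared
    error in log space over off-diagonal pixels with positive counts.
    `expected` (indexed by separation in bins) stays fixed during the fit.
    """
    n = raw.shape[0]
    i, j = np.triu_indices(n, k=1)
    keep = raw[i, j] > 0
    i, j = i[keep], j[keep]
    target = np.log(raw[i, j]) - np.log(expected[j - i])

    def loss(log_b):
        return np.sum((log_b[i] + log_b[j] - target) ** 2)

    bounds = [(-3.0, 3.0)] * n  # search range for each log-bias
    result = dual_annealing(loss, bounds, maxiter=maxiter)
    return np.exp(result.x)


def normalize(raw, bias):
    """Divide each contact by the product of the two loci's bias values."""
    return raw / np.outer(bias, bias)


# Usage: recover known biases from a noiseless toy map with 1/s decay.
rng = np.random.default_rng(1)
n = 6
true_bias = np.exp(rng.uniform(-0.5, 0.5, n))
expected = 1.0 / np.maximum(np.arange(n), 1)
sep = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
raw = np.outer(true_bias, true_bias) * expected[sep]
bias = fit_bias_dual_annealing(raw, expected)
normalized = normalize(raw, bias)
```

This toy objective is convex in the log-biases, so a local optimizer alone would reach the same answer. A global method such as dual annealing matters when the objective is rugged. The final `normalize` step mirrors the division by b_i · b_j described above, the same form used by ICE and KR.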

Unlike ICE- and KR-normalized maps, Raichu-normalized contact maps exhibit substantially greater variability in visibility across loci (Supplementary Fig. 2). Using multiple Hi-C datasets as benchmarks, we found that Raichu preserves the large-scale distance-dependent decay (Fig. 1b). This pattern holds across species (Supplementary Fig. 3) and under various perturbations, including depletion of cohesin, NIPBL and WAPL (Supplementary Fig. 4)3,21,22,23. Importantly, although Raichu uses a single uniform distance-decay function during optimization, it accurately captures region-specific decay behavior across diverse genomic contexts. These include different compartment types3 and chromatin domain states (active, inactive and repressed)24 (Supplementary Fig. 3).

To directly test whether a uniform distance-decay function is necessary for Raichu to function properly, we implemented an alternative version that applies compartment-specific decay functions. The resulting bias vectors and normalized contact maps were highly similar to those produced using a single genome-wide decay function, demonstrating that Raichu does not rely on the assumption of uniform scaling decay (Supplementary Fig. 5).

Raichu-normalized contact maps are highly reproducible across biological replicates, with HiCRep correlation coefficients25 comparable to those of ICE and higher than those of KR. This difference may reflect the distinct treatment of regions with poor mappability or low coverage, such as those near centromeres and telomeres. ICE and Raichu both exclude these regions from normalization, whereas KR assigns bias values to them (Supplementary Fig. 6). Outside of these problematic regions, the bias vectors and normalized contact maps generated by all three methods were broadly similar (Fig. 1c and Supplementary Fig. 7). Key chromatin architectural features, including compartments (measured by the first principal component, PC1) and TADs (measured by insulation scores)26, also showed strong concordance across methods (Supplementary Figs. 8 and 9).
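
One common variant of the insulation score slides a square window along the diagonal. For each position it averages the contacts between the bins just upstream and the bins just downstream, and low values mark candidate TAD boundaries. The sketch below follows that idea with an arbitrary window of a few bins and a log2 transform relative to the mean. It is not necessarily the parameterization used for Supplementary Fig. 9, and the function name is ours.

```python
import numpy as np


def insulation_score(mat, window=3):
    """log2 insulation score per bin (one common variant, illustrative only).

    For bin i, average the contacts between the `window` bins upstream of i and
    the `window` bins downstream of i; bins too close to the edge get NaN.
    """
    n = mat.shape[0]
    raw_scores = np.full(n, np.nan)
    for i in range(window, n - window):
        raw_scores[i] = mat[i - window:i, i + 1:i + window + 1].mean()
    valid = ~np.isnan(raw_scores)
    scores = np.full(n, np.nan)
    scores[valid] = np.log2(raw_scores[valid] / raw_scores[valid].mean())
    return scores


# Usage: two 10-bin domains with weak contacts between them.
toy = np.ones((20, 20))
toy[:10, :10] = 10.0
toy[10:, 10:] = 10.0
scores = insulation_score(toy, window=3)
boundary = int(np.nanargmin(scores))  # 9: the last bin of the first domain
```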

Upon closer inspection of specific genomic regions, however, we found that while Raichu’s bias vectors followed similar trends to those of ICE and KR, they differed at specific loci, with peak and valley magnitudes generally lower (Fig. 1d). This difference translated into stronger Raichu-normalized signals for interactions that are weaker than canonical loop dots yet clearly visible in raw Hi-C data (highlighted by black circles in the bottom panel of Supplementary Fig. 7b). Given that various transcription factors (TFs) and histone modifications have been associated with chromatin loop formation, we next evaluated the enrichment of normalized contact signals between ChIP-seq peaks for selected TFs and histone modifications (Fig. 1e and Supplementary Fig. 10). Across all evaluated factors, Raichu-normalized signals consistently showed greater enrichment than those from ICE and KR, suggesting that Raichu enhances the detection of chromatin loops and may be better suited for capturing transcription-related interactions.
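
The stratified enrichment in Fig. 1e can be approximated in three steps. First, convert a normalized matrix to observed/expected (O/E) by dividing each diagonal by its mean. Next, assign bins to stratum 0 (no peak) or to strata 1–8 by ranked ChIP-seq signal. Finally, average O/E over every pair of strata. Equal-sized quantile strata and the exclusion of the main diagonal are our assumptions; the paper's exact binning and distance range are in its Methods.

```python
import numpy as np


def observed_over_expected(mat):
    """Divide every diagonal by its mean to remove the distance-decay trend."""
    n = mat.shape[0]
    oe = np.full((n, n), np.nan)
    for s in range(n):
        diag = np.diagonal(mat, offset=s)
        mean = diag.mean()
        if mean > 0:
            idx = np.arange(n - s)
            oe[idx, idx + s] = diag / mean
            oe[idx + s, idx] = diag / mean
    return oe


def stratified_enrichment(oe, signal, n_strata=8, min_sep=1):
    """Mean O/E for every pair of ChIP-seq signal strata.

    Stratum 0 holds bins without a peak (signal == 0); strata 1..n_strata hold
    peak bins split into equal-sized groups of increasing signal.
    """
    n = len(signal)
    strata = np.zeros(n, dtype=int)
    peak_bins = np.flatnonzero(signal > 0)
    ranked = peak_bins[np.argsort(signal[peak_bins])]
    for k, group in enumerate(np.array_split(ranked, n_strata), start=1):
        strata[group] = k
    sep = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    result = np.full((n_strata + 1, n_strata + 1), np.nan)
    for a in range(n_strata + 1):
        for b in range(n_strata + 1):
            mask = (strata[:, None] == a) & (strata[None, :] == b) & (sep >= min_sep)
            if mask.any():
                result[a, b] = np.nanmean(oe[mask])
    return result


# Usage: peak-peak pairs get 3x more contacts than other pairs at equal distance.
n = 40
sep = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
signal = np.concatenate([np.zeros(20), np.arange(1.0, 21.0)])
is_peak = signal > 0
toy = np.where(is_peak[:, None] & is_peak[None, :], 3.0, 1.0) / np.maximum(sep, 1)
enrichment = stratified_enrichment(observed_over_expected(toy), signal, n_strata=2)
```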

Raichu identifies thousands of transcription-related loops missed by existing methods

To assess the effectiveness of Raichu in detecting chromatin loops, we applied HiCCUPS3—a widely used loop-calling algorithm—to Hi-C contact maps normalized by either ICE or Raichu at 5-kb and 10-kb resolutions (Methods). We first benchmarked performance using the GM12878 dataset, which is one of the most deeply sequenced Hi-C datasets to date. Strikingly, while ICE detected 15,446 loops, Raichu identified 28,986. Moreover, 90.6% of ICE-detected loops (13,997 out of 15,446) were also recovered by Raichu, whereas 51.7% of Raichu-detected loops (14,989 out of 28,986) were not identified by ICE (Fig. 2a).
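
Overlap counts such as those in Fig. 2a depend on the rule used to match two loop calls. A common convention, assumed here, treats two loops as the same if both anchors lie within a fixed tolerance. The 10 kb value below is an arbitrary choice, and the paper's own matching criterion is described in its Methods. The hypothetical helpers below return Venn-diagram counts for two loop lists given as (chromosome, anchor 1 start, anchor 2 start) tuples.

```python
import numpy as np


def match_loops(loops_a, loops_b, tol):
    """True for each loop in loops_a that has a partner in loops_b whose two
    anchors both lie within `tol` bp of its own anchors (same chromosome)."""
    matched = np.zeros(len(loops_a), dtype=bool)
    by_chrom = {}
    for chrom, x, y in loops_b:
        by_chrom.setdefault(chrom, []).append((x, y))
    for k, (chrom, x, y) in enumerate(loops_a):
        cand = np.array(by_chrom.get(chrom, []), dtype=float).reshape(-1, 2)
        if len(cand):
            close = (np.abs(cand[:, 0] - x) <= tol) & (np.abs(cand[:, 1] - y) <= tol)
            matched[k] = close.any()
    return matched


def venn_summary(loops_a, loops_b, tol=10_000):
    in_b = match_loops(loops_a, loops_b, tol)
    in_a = match_loops(loops_b, loops_a, tol)
    return {
        "a_specific": int((~in_b).sum()),
        "a_shared": int(in_b.sum()),
        "b_shared": int(in_a.sum()),
        "b_specific": int((~in_a).sum()),
    }


# Usage with toy coordinates.
ice_loops = [("chr1", 100_000, 300_000), ("chr1", 500_000, 900_000), ("chr2", 50_000, 250_000)]
raichu_loops = [("chr1", 105_000, 300_000), ("chr2", 50_000, 250_000),
                ("chr2", 60_000, 120_000), ("chr3", 0, 40_000)]
summary = venn_summary(ice_loops, raichu_loops, tol=10_000)
# {'a_specific': 1, 'a_shared': 2, 'b_shared': 2, 'b_specific': 2}
```

Under tolerance-based matching, one loop can match several nearby loops. As a result, the shared count seen from one set can differ slightly from the shared count seen from the other, which is why both are reported. Applied to the GM12878 counts above, the shares follow directly. About 13,997 / 15,446 ≈ 90.6% of ICE loops are recovered, and 14,989 / 28,986 ≈ 51.7% of Raichu loops are Raichu-specific.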

Fig. 2: Raichu identifies thousands of transcription-related loops missed by ICE.

HiCCUPS was used for loop detection. a Venn diagram showing the overlap of loops detected by ICE and Raichu in GM12878 cells. b Violin plots comparing the sizes of ICE-specific (n = 1449), Raichu-specific (n = 14,989), and shared (n = 13,997) loops detected by both methods. For box plots overlaid on each violin, the center line indicates the median, the box limits represent the upper and lower quartiles, and the whiskers extend to 1.5 times the interquartile range. c Comparison of contact signals in an example genomic region normalized by ICE or Raichu. Contact heatmaps are shown alongside gene annotations, RNA-seq profiles, and ChIP-seq signals for selected transcription factors and histone modifications. Black circles indicate detected loops, yellow bars mark loop anchors detected by both ICE and Raichu, and blue bars denote Raichu-specific anchors. d Proportions of promoter–promoter (P–P), enhancer–enhancer (E–E), and promoter–enhancer (P–E) loops among ICE-specific, Raichu-specific, and shared loop sets. e Fraction of loop anchors bound versus fold enrichment for 132 transcription factors and 10 histone modifications, with colors indicating loop categories. f ChIP-seq profiles of selected transcription factors and histone modifications centered on both anchors of each loop, grouped by loop category. Each row represents one loop. g Overlap of each loop category with orthogonal ChIA-PET and HiChIP interactions. h APA plots for ICE-specific and Raichu-specific loops in ICE- and Raichu-normalized Hi-C maps. Source data are provided as a Source Data file.

We grouped the detected loops into three categories: ICE-specific, Raichu-specific, and common loops identified by both methods. While the average genomic distance between loop anchors was similar across categories, Raichu-specific loops displayed a distinct bimodal size distribution (Fig. 2b)—a pattern consistent with the idea that transcription-related loops tend to span shorter distances than structural CTCF-mediated loops5,15,27,28.

To evaluate the functional relevance of the detected loops, we analyzed the overlap between loop anchors and ChIP-seq peaks for selected TFs and histone modifications. Fig. 2c shows a representative genomic region where ICE identified only two loops, both associated with CTCF and RAD21 peaks. In contrast, Raichu not only recovered these two loops but also uncovered seven additional loops in the same region. Most of these additional loops overlapped with either H3K27ac peaks (a mark of active enhancers and promoters) or H3K4me3 peaks (a promoter mark) at both anchors, suggesting their involvement in transcriptional regulation.

At the genome-wide level, Raichu-specific loops were more frequently classified as enhancer–promoter (E–P), enhancer–enhancer (E–E), and promoter–promoter (P–P) interactions (40.8, 46.3, and 19.7%, respectively) compared to ICE-specific loops (19.3, 25.5, and 7.3%) and common loops (23.2, 28.0, and 8.9%) (Fig. 2d). To systematically investigate regulatory factor associations, we computed the fold enrichment of 132 TFs and 10 histone modifications at loop anchors using ENCODE ChIP-Seq data (Fig. 2e). As expected, common loops showed the strongest enrichment for CTCF and RAD21. While ICE-specific and Raichu-specific loops exhibited comparable enrichment for these structural factors, Raichu-specific loops showed substantially greater enrichment for a broader set of transcription-associated factors, including RNA polymerase II (POLR2A), CREB1, RELB, H3K4me3, and H3K27ac (Fig. 2e, f and Supplementary Fig. 11).
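
The two axes in Fig. 2e could be computed as follows. The fraction of anchors bound is the share of loop anchors whose bin contains at least one ChIP-seq peak. Fold enrichment is that share divided by the genome-wide fraction of bins containing a peak. This background definition is our assumption, and the paper may use a different null, such as matched random anchors. The function name is hypothetical.

```python
import numpy as np


def anchor_enrichment(anchor_bins, peak_bins, n_bins):
    """Fraction of loop anchors bound by a factor, and fold enrichment over the
    genome-wide fraction of bins that contain a peak (assumed background)."""
    has_peak = np.zeros(n_bins, dtype=bool)
    has_peak[np.asarray(peak_bins, dtype=int)] = True
    frac_bound = has_peak[np.asarray(anchor_bins, dtype=int)].mean()
    return frac_bound, frac_bound / has_peak.mean()


# Usage: peaks in 10 of 100 bins; 4 of 8 anchors fall in peak bins.
frac, fold = anchor_enrichment([0, 1, 2, 3, 50, 60, 70, 80], range(10), n_bins=100)
# frac == 0.5, and fold is approximately 5.0
```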

To further explore the nature of loops detected by ICE and Raichu, we validated each set against four orthogonal 3D genomic datasets (Supplementary Data 1 and 5): CTCF ChIA-PET (targeting CTCF-mediated interactions), Pol2 ChIA-PET (targeting RNA polymerase II-mediated interactions), H3K27ac HiChIP (targeting H3K27ac interactions), and H3K4me3 HiChIP (targeting H3K4me3 interactions). As expected, common loops exhibited the highest validation rate across these datasets (72.8%; 10,191 out of 13,997), followed by Raichu-specific loops (52.2%; 7823 out of 14,989) and ICE-specific loops (44.2%; 640 out of 1449) (Fig. 2g). Notably, among the validated Raichu-specific loops, 74.6% (5837 out of 7823) overlapped with transcription-related interactions captured by Pol2 ChIA-PET, H3K27ac HiChIP, or H3K4me3 HiChIP—substantially higher than the corresponding proportions for ICE-specific loops (60.5%) and common loops (57.8%). Even for Raichu-specific loops that did not overlap with any orthogonal datasets (7166 out of 14,989), Aggregate Peak Analysis (APA) using raw contact maps from ChIA-PET and HiChIP revealed clear enrichment signals, whereas ICE-specific loops in the unsupported category (809 out of 1449) exhibited much weaker enrichment (Supplementary Fig. 12a). Furthermore, ChIP-Seq profiles demonstrated that anchors of Raichu-specific loops unique to Hi-C were enriched for CTCF as well as multiple transcription-related factors, whereas ICE-specific loops unique to Hi-C were enriched only for CTCF and RAD21, with even those signals weaker than in Raichu-specific loops (Supplementary Fig. 12b).

It is worth noting that the absence of a loop from a call set does not necessarily imply a lack of enrichment, because the loop may simply fall below the statistical threshold. To assess whether ICE-specific and Raichu-specific loops were enriched under each normalization, we again performed APA (Fig. 2h). Interestingly, ICE-specific loops showed even greater enrichment in Raichu-normalized maps than in ICE-normalized ones. This indicates that these loops retain signal after Raichu normalization but did not rank highly enough to be called as loops. In contrast, Raichu-specific loops displayed much weaker enrichment in ICE-normalized data. This suggests that they are genuinely missed by ICE and would not be recovered simply by relaxing the loop-calling threshold.
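
Aggregate Peak Analysis stacks the submatrices centered on each loop pixel and averages them. A widely used summary is the peak-to-lower-left (P2LL) ratio: the aggregate center pixel divided by the mean of a small square in the lower-left corner of the window. The window half-width and corner size below are arbitrary illustrative values, not the settings of Juicer's APA tool or of this study.

```python
import numpy as np


def apa(mat, loops, width=5, corner=2):
    """Aggregate peak analysis on a dense contact matrix (illustrative).

    loops: (i, j) bin indices of loop pixels with i < j. Loops whose window
    would cross the matrix edge are skipped. Returns the mean
    (2*width+1) x (2*width+1) submatrix and its peak-to-lower-left ratio.
    """
    n = mat.shape[0]
    stack = [
        mat[i - width:i + width + 1, j - width:j + width + 1]
        for i, j in loops
        if i - width >= 0 and j + width < n
    ]
    agg = np.mean(stack, axis=0)
    center = agg[width, width]
    # Lower-left corner: the `corner` x `corner` block at the window's bottom-left.
    lower_left = agg[-corner:, :corner].mean()
    return agg, center / lower_left


# Usage: two loop pixels five times brighter than a flat background.
toy = np.ones((50, 50))
loops = [(10, 30), (15, 40)]
for i, j in loops:
    toy[i, j] = toy[j, i] = 5.0
agg, p2ll = apa(toy, loops)
```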

We also benchmarked Raichu against KR normalization and assessed its performance using loop-calling algorithms other than HiCCUPS29, with consistent results observed across all comparisons (Supplementary Figs. 13 and 14).

Raichu-detected loops are supported by ligation-free and imaging-based methods

We next extended our benchmark analyses to mouse embryonic stem cells (mESCs), a system with both Hi-C30 and multiple orthogonal 3D genomic datasets31,32,33 available. Consistent with our findings in GM12878 cells, Raichu substantially increased loop detection sensitivity in mESCs, identifying 27,410 loops compared to 11,333 detected by ICE (Fig. 3a). While the majority of ICE-detected loops were also recovered by Raichu, 65.4% of Raichu-detected loops were not detected by ICE. As in GM12878, Raichu-specific loops in mESCs exhibited a distinct bimodal size distribution (Fig. 3b), showed stronger enrichment for active histone modifications and stem cell-specific transcription factors (e.g., POU5F1 and NANOG; Fig. 3c), and were more likely to span E–E, E–P, and P–P interactions (Fig. 3d) than either ICE-specific loops or those shared between both methods. Similar results were also obtained from comparisons between Raichu and KR on the same dataset (Supplementary Fig. 15a).

Fig. 3: Validation of Raichu-specific loops using DNA SPRITE, GAM, and DNA seqFISH+ in mouse embryonic stem cells (mESCs).

a Venn diagram showing the overlap of chromatin loops detected by ICE and Raichu from mESC Hi-C data. b Violin plots comparing loop sizes among ICE-specific (n = 1852), Raichu-specific (n = 17,929), and shared (n = 9481) loops. For box plots overlaid on each violin, the center line indicates the median, the box limits represent the upper and lower quartiles, and the whiskers extend to 1.5 times the interquartile range. c Fraction of loop anchors bound and fold enrichment for selected transcription factors and histone modifications, with different colors indicating different loop categories. d Proportions of promoter–promoter (P–P), enhancer–enhancer (E–E), and promoter–enhancer (P–E) loops among ICE-specific, Raichu-specific, and shared loops. e APA plots of Raichu-specific loops using contact signals from DNA SPRITE (10-kb resolution) and GAM (30-kb resolution). f Spatial distances measured by DNA seqFISH+ between anchors of Raichu-specific loops and matched control loops. Three loop categories are shown: CTCF loops (with CTCF binding peaks at either anchor), enhancer–promoter (E–P) loops, and other loops that are neither CTCF nor enhancer–promoter. In each box plot, the center line indicates the median, the box limits represent the upper and lower quartiles, and the whiskers extend to 1.5 times the interquartile range. P values were calculated using a two-sided Wilcoxon signed-rank test. Source data are provided as a Source Data file.

To determine whether Raichu-specific loops represent bona fide spatial interactions, we performed APA using contact signals from two ligation-free 3D genome mapping methods: GAM31 and DNA SPRITE32. We observed strong signal enrichment at Raichu-specific loop positions in both datasets (Fig. 3e), supporting their physical proximity in the nucleus. To directly measure spatial distances between loop anchors, we further analyzed high-resolution DNA seqFISH+ imaging data33. We examined three categories of Raichu-specific loops: (1) CTCF-mediated loops (with CTCF binding peaks at either anchor), (2) E–P loops, and (3) other loops that are neither CTCF-mediated nor E–P. For each category, we constructed matched control loops by fixing one anchor and placing the other at the same genomic distance but in the opposite direction. Across all three categories, Raichu-specific loops exhibited significantly shorter spatial distances between anchors than their matched controls (Fig. 3f), indicating that these loci are brought into closer proximity in 3D nuclear space.
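
The matched-control construction and the paired test reported in Fig. 3f can be sketched as follows. Mirroring the second anchor about the fixed first anchor yields a control of identical genomic span in the opposite direction. Summarizing each loop by a single per-loop distance (for example, the median across imaged cells) before applying SciPy's `scipy.stats.wilcoxon` is our assumption, not a detail taken from the Methods. The helper names are hypothetical.

```python
import numpy as np
from scipy.stats import wilcoxon


def matched_control(anchor1, anchor2):
    """Keep anchor1 fixed and mirror anchor2 across it, giving a control of the
    same genomic span in the opposite direction."""
    return anchor1, anchor1 - (anchor2 - anchor1)


def compare_to_controls(loop_dist, control_dist):
    """Paired two-sided Wilcoxon signed-rank test on per-loop 3D distances.

    Returns the median paired difference (loop minus control) and the P value.
    """
    loop_dist = np.asarray(loop_dist, dtype=float)
    control_dist = np.asarray(control_dist, dtype=float)
    _stat, p_value = wilcoxon(loop_dist, control_dist, alternative="two-sided")
    return np.median(loop_dist - control_dist), p_value


# Usage with simulated distances (nm): loops are consistently closer than controls.
rng = np.random.default_rng(2)
control = rng.uniform(400, 800, 30)
loop = control - rng.uniform(50, 150, 30)
median_diff, p = compare_to_controls(loop, control)
```

In practice, a mirrored control that falls before the chromosome start or outside the imaged regions would need to be discarded together with its paired loop, so that the test remains paired.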

Together, these orthogonal validations—using both ligation-free contact data and direct imaging—provide independent support for the biological relevance of Raichu-specific loops and demonstrate that Raichu accurately captures higher-order chromatin organization.

Raichu is robust across different sequencing depths

To evaluate the impact of sequencing depth on Raichu’s performance, we computationally downsampled the GM12878 dataset to 23 different levels, ranging from 100 million to 2 billion intra-chromosomal reads (Methods). As expected, the number of detected loops decreased with reduced sequencing depth for both ICE and Raichu. Notably, Raichu consistently identified approximately twice as many loops as ICE across all tested depths (Fig. 4a, b). Consistent with observations from the full-depth dataset, the additional loops detected by Raichu were enriched for transcription-related interactions, as indicated by a higher fraction of anchors overlapping H3K27ac peaks (Fig. 4c).
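
The Methods describe the exact downsampling procedure. A common approach, sketched here as an assumption, is binomial thinning. Each contact is retained independently with probability equal to the target depth divided by the current depth. Thinning the upper triangle and mirroring it keeps the matrix symmetric. The function name is hypothetical.

```python
import numpy as np


def downsample_contacts(counts, target_total, seed=0):
    """Binomially thin a symmetric integer contact matrix so the expected number
    of contacts in the upper triangle (diagonal included) equals target_total."""
    counts = np.asarray(counts, dtype=np.int64)
    upper = np.triu(counts)
    total = upper.sum()
    if target_total >= total:
        return counts.copy()
    rng = np.random.default_rng(seed)
    thinned = rng.binomial(upper, target_total / total)
    # Mirror the thinned upper triangle to restore symmetry.
    return thinned + np.triu(thinned, k=1).T


# Usage: thin a symmetric toy count matrix to about a quarter of its contacts.
rng = np.random.default_rng(3)
a = rng.integers(0, 200, size=(60, 60))
counts = np.triu(a) + np.triu(a, k=1).T
target = int(np.triu(counts).sum() // 4)
thinned = downsample_contacts(counts, target, seed=1)
```

Thinning binned counts this way is equivalent to keeping each contact independently with a fixed probability. It therefore reaches the target depth in expectation rather than exactly.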

Fig. 4: Raichu performs well across various sequencing depths.

a Number of loops detected by ICE and Raichu across a series of downsampled GM12878 Hi-C maps. b Counts of ICE-specific and Raichu-specific loops for the same series of downsampled GM12878 Hi-C maps. c Fraction of loop anchors overlapping H3K27ac versus fraction overlapping CTCF for ICE-specific and Raichu-specific loops across different down-sampling levels. Dot sizes represent sequencing depths. d Comparison of loops identified by ICE using 900 million usable reads and Raichu using 350 million usable reads. The panel includes contact heatmaps, gene annotations, RNA-Seq profiles, and ChIP-Seq signals for selected transcription factors and histone modifications. Black circles indicate identified loops. e Overlap of ICE-specific, Raichu-specific, and shared loops with orthogonal ChIA-PET and HiChIP interactions, comparing loops detected by ICE (900 million reads) and Raichu (350 million reads). Source data are provided as a Source Data file.

Interestingly, the bimodal size distribution characteristic of Raichu-specific loops (Fig. 2b) became progressively less pronounced as sequencing depth decreased, disappearing entirely below ~500 million intra-chromosomal contacts (Supplementary Fig. 16). This trend may be partially attributed to changes in the regulatory composition of the detected loops: as depth declines, the proportion of anchors overlapping CTCF increases (Fig. 4c), while a subset of short-range, non-CTCF-associated loops—typically weaker in contact enrichment—becomes increasingly difficult to detect without deep coverage.

Raichu’s greater power in detecting chromatin loops enables it to reach, at lower sequencing depths, loop counts comparable to those of ICE at higher depths (Fig. 4a). For example, ICE detected 10,589 loops at ~900 million usable reads, whereas Raichu identified a similar number (10,900 loops) with only ~350 million reads. The two sets shared 6307 loops, corresponding to 59.6% of ICE-detected loops and 57.9% of Raichu-detected loops. Importantly, validation rates based on orthogonal ChIA-PET and HiChIP datasets were comparable between ICE-specific and Raichu-specific loops (74.3 vs. 69.6%, respectively). However, among the validated loops, 82.2% of Raichu-specific loops overlapped transcription-related interactions identified by Pol2 ChIA-PET, H3K27ac HiChIP or H3K4me3 HiChIP, compared to only 53.0% for ICE-specific loops (Fig. 4d, e). Similar trends were observed across additional sequencing depths (Supplementary Fig. 17).

Together, these findings demonstrate that Raichu consistently detects thousands of additional transcription-related chromatin loops, even under reduced sequencing coverage.

Raichu detects unique differential loops involved in transcriptional regulation

During cell development or drug treatment, chromatin looping structures can undergo dramatic changes that lead to gene activation or repression34,35,36,37. Accurately detecting these changes between conditions is critical for understanding the molecular mechanisms of gene regulation underlying specific cell states.

To demonstrate the power of Raichu in detecting such changes, we analyzed a cellular system described in a previous study38. Briefly, through targeted sequencing of 5008 patients, that study identified a germline regulatory variant in the GATA3 enhancer associated with Philadelphia chromosome-like acute lymphoblastic leukemia. To investigate its role in 3D genome organization and gene regulation, GM12878 cells were genetically engineered by replacing the wild-type C/C allele with the risk A/A allele. In wild-type cells, GATA3 is only moderately expressed; in engineered cells, however, enhancer activity is significantly increased, leading to elevated GATA3 expression. This, in turn, enhances GATA3 binding at thousands of enhancers, further driving transcription of downstream genes.

In the original study, ICE-normalized Hi-C maps revealed only a few instances in which differential chromatin loops were associated with gene regulation. Here, we tested whether the improved sensitivity of Raichu in detecting transcription-related loops could enhance the detection of meaningful chromatin looping changes between wild-type (C/C) and engineered (A/A) cells.

Raichu indeed revealed clearer differences at specific loci. At the SUPT16H gene locus, ICE-normalized data failed to show any meaningful differences in chromatin looping between the two conditions (Fig. 5a, left). In contrast, Raichu identified an additional loop unique to the engineered cells, linking SUPT16H—which encodes SPT16, a subunit of the FACT complex involved in chromatin remodeling, transcriptional regulation, and genomic stability—to a downstream enhancer with specific GATA3 binding in the engineered cells (Fig. 5a, right). This example, together with others (Supplementary Fig. 18), highlights the ability of Raichu to uncover previously unrecognized regulatory mechanisms involved in cell development and disease.

Fig. 5: Raichu detects unique differential loops associated with transcriptional regulation.

a Comparison of Hi-C contact maps, RNA-Seq, GATA3 ChIP-Seq, and H3K27ac ChIP-Seq signals between wild-type (C/C) and engineered (A/A) GM12878 cells. The left panel shows ICE-normalized Hi-C maps, and the right panel shows Raichu-normalized maps. Black circles indicate detected loops, and the anchors of a Raichu-specific A/A loop are highlighted in yellow. b Venn diagram showing the overlap of loops detected by ICE and Raichu in C/C and A/A cells. c Loop size distributions for C/C-specific and A/A-specific loops unique to ICE or Raichu. The number of loops in each group is: ICE-specific C/C (n = 245), Raichu-specific C/C (n = 937), ICE-specific A/A (n = 65), and Raichu-specific A/A (n = 249). For box plots overlaid on each violin, the center line indicates the median, the box limits represent the upper and lower quartiles, and the whiskers extend to 1.5 times the interquartile range. d APA plots for C/C-specific and A/A-specific loops unique to ICE or Raichu, shown in both C/C and A/A Hi-C maps. e Counts of promoter–enhancer (P–E), enhancer–enhancer (E–E), and promoter–promoter (P–P) loops within the indicated loop categories. f H3K27ac and GATA3 binding profiles centered on loop anchors for the indicated loop categories. Gray and black lines represent signal profiles from C/C and A/A cells, respectively. g Quantile-normalized transcription levels in C/C versus A/A cells for genes linked to GATA3-bound enhancers via the indicated loop categories. In each box plot, the center line indicates the median, the box limits represent the upper and lower quartiles, and the whiskers extend to 1.5 times the interquartile range. P values were calculated using a two-sided Wilcoxon signed-rank test. n denotes the number of genes in each comparison. Source data are provided as a Source Data file.

Using Hi-C data normalized by both ICE and Raichu, we identified 4504 loops specific to wild-type (C/C) cells. Of these, 937 were unique to Raichu, while only 245 were unique to ICE. Similarly, we found 964 loops specific to engineered cells, with 249 unique to Raichu and only 65 unique to ICE (Fig. 5b–d). APA plots confirmed the expected pattern. C/C-specific loops, whether identified by Raichu or ICE, were enriched in wild-type cells but depleted in engineered cells, and the reverse held for A/A-specific loops (Fig. 5d; Methods). Although the size distributions of differential loops overlapped broadly between methods, Raichu-specific loops tended to span shorter genomic distances than ICE-specific loops. The averages were 203 kb vs. 298 kb for C/C-specific loops and 202 kb vs. 306 kb for A/A-specific loops, consistent with Raichu’s improved sensitivity for short-range interactions (Fig. 5c).

To assess the functional relevance of these differential loops, we categorized them into P–E, E–E, and P–P types. Strikingly, Raichu-specific loops contained 5.75 to 16.5 times more transcription-related interactions than ICE-specific loops (Fig. 5e). Moreover, Raichu-specific differential loops were significantly enriched for H3K27ac and GATA3 binding signals at their anchors—a pattern absent from ICE-specific loops (Fig. 5f). Finally, only Raichu-unique A/A-specific loops were associated with significant upregulation of gene expression in engineered cells (Fig. 5g).
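
Loop classification of this kind reduces to labeling each anchor. The sketch below uses a rule we assume for illustration, with all coordinates on a single chromosome. An anchor is labeled P if it contains a transcription start site and E if it overlaps an enhancer peak such as an H3K27ac peak. Because a 5–10 kb anchor can carry both labels, one loop may fall into several categories. This is one possible reason the P–P, E–E and E–P percentages reported for Raichu-specific loops in Fig. 2d sum to more than 100%. The helper names are hypothetical, and ASCII hyphens are used in the category labels.

```python
from itertools import product


def anchor_labels(anchor, tss_positions, enhancer_peaks):
    """Label an anchor (start, end): 'P' if it contains a TSS, 'E' if it overlaps
    an enhancer peak (start, end). An anchor may carry both labels."""
    start, end = anchor
    labels = set()
    if any(start <= t < end for t in tss_positions):
        labels.add("P")
    if any(s < end and start < e for s, e in enhancer_peaks):
        labels.add("E")
    return labels


def loop_categories(loop, tss_positions, enhancer_peaks):
    """Every category (P-P, E-E, P-E) for which the loop qualifies."""
    left = anchor_labels(loop[0], tss_positions, enhancer_peaks)
    right = anchor_labels(loop[1], tss_positions, enhancer_peaks)
    names = {("P", "P"): "P-P", ("E", "E"): "E-E", ("P", "E"): "P-E", ("E", "P"): "P-E"}
    return {names[pair] for pair in product(left, right)}


# Usage with toy coordinates (bp).
tss = [12_000]
enhancers = [(30_000, 31_000), (5_000, 6_000)]
simple = loop_categories(((10_000, 15_000), (28_000, 33_000)), tss, enhancers)
mixed = loop_categories(((0, 15_000), (28_000, 33_000)), tss, enhancers)
# simple == {"P-E"}; mixed == {"P-E", "E-E"} because the left anchor is both P and E
```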

Collectively, these results demonstrate that Raichu improves the detection of biologically meaningful differential loops between conditions, revealing interactions closely associated with transcriptional regulation and cell identity that are frequently missed by ICE.

Raichu detects conserved enhancer–promoter loops across species

The unique power of Raichu in detecting transcription-related loops prompted us to investigate whether it can also uncover conserved regulatory interactions across species. To this end, we focused on neural progenitor cells (NPCs), for which high-quality Hi-C data are available in both mouse30 and human39.

In mouse NPCs (mNPCs), Raichu identified 32,639 loops—more than twice as many as ICE (14,671) (Fig. 6a). The vast majority (85.7%) of ICE-detected loops were also recovered by Raichu, while Raichu detected 20,068 additional loops not identified by ICE. As observed in previous cell types, Raichu-specific loops in mNPCs exhibited a distinct bimodal size distribution and were more frequently anchored at regions marked by active histone modifications (Fig. 6b, c). Similar patterns were observed in human NPCs (hNPCs), where Raichu identified 20,930 unique loops, compared to only 2009 uniquely detected by ICE (Fig. 6d–f). Comparable results were obtained when comparing Raichu and KR (Supplementary Fig. 15b), underscoring Raichu’s robustness across species and its sensitivity to regulatory interactions that may be missed by conventional normalization methods.

Fig. 6: Raichu detects conserved enhancer–promoter loops across species.

For all box plots overlaid on violins in this figure, the center line indicates the median, the box limits represent the upper and lower quartiles, and the whiskers extend to 1.5 times the interquartile range. a Venn diagram showing the overlap of chromatin loops detected by ICE and Raichu in mouse neural progenitor cells (mNPCs). b Violin plots comparing the loop sizes of ICE-specific (n = 2100), Raichu-specific (n = 20,068), and shared (n = 12,571) loops from (a). c ChIP-Seq profiles of CTCF and selected histone modifications centered on both anchors of each loop, grouped by loop category. Each row represents one loop. d–f Corresponding analyses for human neural progenitor cells (hNPCs), with loop counts of n = 2009 (ICE-specific), n = 20,930 (Raichu-specific), and n = 29,676 (shared). g, h Examples of enhancer–promoter loops uniquely detected by Raichu and conserved between mouse and human NPCs. Blue circles indicate detected loops; bold black circles highlight conserved enhancer–promoter loops unique to Raichu. Yellow shading marks the anchor regions of these conserved loops. Only genes involved in the conserved loops are shown. Source data are provided as a Source Data file.

To assess evolutionary conservation, we compared loop calls between mouse and human NPCs (Methods). Raichu recovered 3089 conserved loops across species—more than fivefold the number detected by ICE (540). Notably, many of the conserved loops uniquely identified by Raichu connected distal enhancers to key neurodevelopmental genes, including SETD1A and SOX4 (Fig. 6g, h), underscoring their potential functional relevance.

These findings demonstrate that Raichu not only improves the detection of transcription-related loops within individual species but also enhances the discovery of conserved enhancer–promoter interactions across evolutionary distances.

Raichu improves the detection of transcription-related loops in Micro-C and region capture Micro-C

In addition to Hi-C, a number of alternative methods have been developed to study 3D genome organization. To assess whether Raichu can be applied beyond Hi-C, we evaluated its performance on two related platforms: Micro-C40 and region capture Micro-C (RCMC)41. Micro-C uses micrococcal nuclease digestion to achieve nucleosome-level resolution of chromatin contacts, while RCMC further enriches for specific genomic regions of interest, enabling ultra-high-resolution contact maps at targeted loci.

We first applied Raichu to a Micro-C dataset generated in H1 embryonic stem cells (H1ESCs)40. Visual inspection revealed improved loop detection with Raichu in Micro-C. For example, in a representative genomic region near the NANOG gene, an essential regulator of ESC pluripotency, Raichu identified a loop at 2-kb resolution, missed by ICE, that connects NANOG to an upstream enhancer cluster (Fig. 7a). Genome-wide, Raichu detected 99,256 loops, more than twice as many as ICE (45,030) (Fig. 7b). While 96.5% of ICE-detected loops were also recovered by Raichu, 56.2% of Raichu-detected loops were unique to Raichu. Moreover, Raichu-specific loops in Micro-C tended to span shorter genomic distances than ICE-specific or shared loops (Fig. 7c). Consistent with the Hi-C analyses above, Raichu-specific loops were more enriched for transcription-related interactions, with higher proportions of E–E, P–E, and P–P contacts (Fig. 7d), and exhibited stronger signals for active histone marks at their anchors (Fig. 7e). Comparable results were also observed in comparisons with KR (Supplementary Fig. 15c).

Fig. 7: Application of Raichu to Micro-C and Region Capture Micro-C datasets.

Panels a–e use a Micro-C dataset generated in H1ESC cells (PMID: 32213324), and panel f uses a Region Capture Micro-C dataset from mouse embryonic stem cells (PMID: 37157000). a Comparison of contact signals and detected loops between ICE and Raichu in a representative genomic region. Contact maps are shown at 2-kb resolution, with detected loops marked by black circles. Black arrows highlight Raichu-specific loops connecting the NANOG gene to an upstream enhancer cluster (highlighted in yellow). b Venn diagram showing the overlap between loops detected by ICE and Raichu. c Violin plots comparing loop sizes among ICE-specific (n = 1581), Raichu-specific (n = 55,807), and shared (n = 43,449) loops. For box plots overlaid on each violin, the center line indicates the median, the box limits represent the upper and lower quartiles, and the whiskers extend to 1.5 times the interquartile range. d Proportions of promoter–promoter (P–P), enhancer–enhancer (E–E), and promoter–enhancer (P–E) loops within each loop category. e ChIP-Seq profiles of selected transcription factors and histone modifications centered on both anchors of each loop, grouped by loop category. Each row represents one loop. f Comparison of ICE- and Raichu-normalized contact maps at 500-bp resolution from a different genomic region in the Region Capture Micro-C dataset. Black arrows highlight Raichu-specific loops connecting distal enhancers to each other or to the Sox2 gene. Source data are provided as a Source Data file.

To further evaluate the performance of Raichu in ultra-high-resolution contexts, we applied it to an RCMC dataset generated in mouse embryonic stem cells (mESCs)41. For this dataset, both ICE and Raichu normalization were performed strictly within the targeted regions of interest to avoid potential artifacts. In a representative region encompassing the Sox2 locus, Raichu revealed numerous additional loops at 500-bp resolution—both between distal enhancers and Sox2, and among enhancer elements themselves—that were not detected by ICE (Fig. 7f). These results demonstrate that Raichu remains effective even at sub-kilobase resolution and enhances the detection of regulatory loops in targeted, high-resolution datasets.

Raichu enhances the detection of chromatin loops in single-cell Hi-C data

We next applied Raichu to single-cell Hi-C data, which typically suffer from limited contact coverage, making it challenging to reliably detect chromatin loops. To evaluate whether Raichu improves loop detection in this context, we analyzed a single-cell Hi-C dataset generated in GM12878 cells, comprising 221 single cells with a median of approximately 1.08 million contacts per cell42.

To compare the performance of Raichu and ICE, we ranked all 221 cells by sequencing depth and constructed a series of pseudo-bulk datasets by pooling up to 130 cells with the highest contact counts (Supplementary Fig. 19a). For each pooled dataset, loops were identified using HiCCUPS on contact maps normalized with either ICE or Raichu, and loop calls were merged across 5-kb, 10-kb, and 25-kb resolutions (Methods). As shown in Fig. 8a, Raichu consistently detected 1.24 to 1.75 times more loops than ICE. Even with as few as five cells (~10 million contacts), Raichu identified 1697 loops, whereas ICE detected only 970 on the same dataset. Using a reference set of interactions compiled from orthogonal 3D genome mapping datasets (Supplementary Data 1 and 6)—including Hi-C, CTCF ChIA-PET, RNAPII ChIA-PET, CTCF HiChIP, H3K27ac HiChIP, SMC1A HiChIP, H3K4me3 PLAC-Seq, TrAC-Loop, and HiCAR—we evaluated the precision and recall of Raichu and ICE across all pseudo-bulk datasets (Supplementary Fig. 19b, c). Notably, Raichu achieved higher F1 scores than ICE in all tested cases (Fig. 8b).

Fig. 8: Raichu enhances the detection of chromatin loops in single-cell Hi-C data.

a Number of loops detected by ICE and Raichu as a function of the number of merged GM12878 cells. b Comparison of F1 scores for loops detected by ICE and Raichu across the same merged datasets. c Counts of ICE-specific and Raichu-specific loops across increasing numbers of merged GM12878 cells. d Fraction of loop anchors overlapping H3K27ac versus those overlapping CTCF for ICE-specific and Raichu-specific loops. Dot sizes indicate the number of merged cells. e Comparison of contact signals and loop calls between ICE- and Raichu-normalized maps merged from 9 GM12878 cells. Blue circles indicate detected loops, and yellow bars mark loop anchors uniquely detected by Raichu. Source data are provided as a Source Data file.

Across pooled maps, the number of Raichu-specific loops was 2.45 to 4.59 times greater than that of ICE-specific loops (Fig. 8c). More importantly, loop anchors identified by Raichu showed a higher overlap with CTCF and H3K27ac binding compared to those identified by ICE, indicating that Raichu more effectively detects both CTCF-associated and transcription-related loops from single-cell Hi-C data (Fig. 8d, e). Results were consistent when comparing Raichu and KR (Supplementary Fig. 15d), further supporting Raichu’s robustness in single-cell Hi-C analysis.

Discussion

Recent years have underscored the critical role of 3D genome organization in transcriptional regulation, yet existing normalization methods often fail to preserve transcription-related chromatin loops in Hi-C data. Here, we present Raichu, a computational method for normalizing chromatin contact data. Unlike matrix balancing approaches that assume uniform visibility across all loci, Raichu employs an efficient optimization framework to infer a bias vector that best captures locus-specific variability. This design avoids the over-correction artifacts inherent to matrix balancing and substantially improves the detection of transcription-related loops. We show that Raichu is robust across a wide range of sequencing depths, biological systems, and species, and is compatible with multiple 3D genome mapping platforms, including Hi-C, Micro-C, region capture Micro-C (RCMC), and single-cell Hi-C. Notably, Raichu improves loop detection even at sub-kilobase resolution, enabling the identification of fine-scale regulatory interactions that are missed by other methods.

In this study, we compared Raichu with two widely used implementations of matrix balancing, ICE and KR. While other methods, such as HiCNorm11, OneD12, and HiCorr14, normalize Hi-C data by explicitly modeling known genomic biases, we did not include them in our comparison because they require external reference data, which are typically available only for a limited number of genomes (Supplementary Table 1). Built on the cooler Python package16, Raichu does not require any external input and stores the calculated bias vector in the same format as ICE, ensuring seamless compatibility with downstream analyses of compartments, TADs, loops, and other features. Raichu is also computationally efficient (Supplementary Tables 2 and 3): on a Hi-C dataset with 4 billion usable reads at 5-kb resolution, it completed the calculation in ~4 h using 4 CPU cores and required only 17 GB of memory. Together, these features make Raichu a practical and scalable replacement for ICE and KR.

Recent advances in single-cell Hi-C and related techniques have significantly enhanced our ability to study chromatin organization at the resolution of individual cells. These innovations now enable the simultaneous profiling of chromatin contacts alongside additional layers of epigenomic and transcriptomic information, such as DNA methylation37,43,44, RNA expression42,45,46, and chromatin accessibility (ATAC)47. Such integrated approaches allow researchers to identify cell groups or types and investigate cell-type-specific chromatin contacts and their regulatory roles. Typically, the number of cells in each group ranges from a few dozen to a few hundred. Using a recently published single-cell Hi-C dataset in GM12878 cells, we showed that Raichu is effective on pseudo-bulk contact maps derived from as few as five cells—corresponding to ~10 million chromatin contacts. Even under such sparse conditions, Raichu outperformed existing normalization methods in detecting both CTCF-related and transcription-related loops (Fig. 8 and Supplementary Fig. 15d). This result highlights Raichu’s robustness and utility in handling the sparsity typical of single-cell Hi-C data. Future applications of Raichu to pseudo-bulk maps derived from single-cell multi-omics data could provide insights into the relationship between gene regulation and cell-type-specific 3D genome architecture.

The close association between 3D genome organization and transcriptional regulation has long been recognized, with disruptions in chromatin organization implicated in developmental diseases and cancer. However, the dynamic behavior of chromatin loops underlying transcriptional regulation remains incompletely understood, particularly under conditions involving minimal changes in cell state, such as drug treatments or heat shock. In these settings, studies using ICE often report negligible loop changes, reinforcing the notion that most chromatin loops are pre-established48,49. Our results suggest that this conclusion may be due, in part, to ICE’s tendency to attenuate transcription-related signals. By applying Raichu to an engineered GM12878 cell system, we identified differential loops closely associated with gene transcription that ICE failed to detect. These findings highlight Raichu’s ability to reveal previously unrecognized regulatory changes in chromatin organization.

Looking forward, Raichu may help redefine how 3D genomic changes are studied, providing a more accurate perspective on their roles in transcriptional regulation and cellular function. This capability is particularly promising for elucidating subtle chromatin dynamics during development, disease progression, and therapeutic interventions. By addressing key limitations of existing normalization methods, Raichu opens new avenues for exploring chromatin architecture and its impact on gene expression.

Methods

Workflow of Raichu

The core rationale underlying Raichu is that each interaction in a chromatin contact matrix can be decomposed into three components: (1) a global background term (E), which accounts for the distance-dependent decay of contact frequencies (i.e., two loci that are closer together in the linear genome tend to interact more frequently); (2) locus-specific biases (B), which reflect systematic variations in interaction frequency attributable to individual loci (e.g., due to GC content, mappability, or restriction fragment density); and (3) a residual component specific to the interaction itself. Raichu aims to estimate a genome-wide bias vector that, in conjunction with the distance-decay function, best explains the observed raw contact map. Once this bias vector is obtained, contact counts are normalized in a manner similar to ICE and other implicit normalization methods—by dividing each observed contact by the product of the bias values for the two interacting loci.

Given an intra-chromosomal contact matrix M, the global background is calculated as:

$${E}_{d}=\frac{1}{{n}_{d}}{\sum }_{\left|j-i\right|=d}{M}_{{ij}}$$
(1)

where \({n}_{d}\) represents the number of bin pairs separated by distance \(d\), and \({M}_{{ij}}\) represents the observed interaction frequency between bin \(i\) and bin \(j\).
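A minimal NumPy sketch of Eq. (1), assuming a dense, symmetric intra-chromosomal matrix in which masked bins are stored as NaN. Because the matrix is symmetric, averaging the upper diagonal at offset \(d\) gives the same \(E_d\) as summing over both \(j-i=d\) and \(i-j=d\). The function name is illustrative and is not taken from Raichu's code.

```python
import numpy as np


def expected_by_distance(M: np.ndarray) -> np.ndarray:
    """Global background E_d (Eq. 1): mean contact count over bin pairs at distance d.

    Assumes M is a dense, symmetric intra-chromosomal matrix; NaN entries
    (e.g., masked bins) are excluded from both the sum and the count n_d.
    """
    n = M.shape[0]
    E = np.full(n, np.nan)
    for d in range(n):
        diag = np.diagonal(M, offset=d)  # all bin pairs (i, i + d)
        valid = ~np.isnan(diag)
        if valid.any():
            E[d] = diag[valid].mean()  # sum over valid pairs divided by n_d
    return E


M = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 5.0],
              [3.0, 5.0, 6.0]])
E = expected_by_distance(M)  # -> [11/3, 3.5, 3.0]
```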

Assuming locus-specific biases are multiplicative, each contact \({M}_{{ij}}\) is modeled as:

$${M}_{{ij}}\approx {B}_{i}{B}_{j}{E}_{{|\,j}-{i|}}$$
(2)

The objective is to estimate B such that the product \({B}_{i}{B}_{j}{E}_{{|\,j}-{i|}}\) approximates the observed contact count \({M}_{{ij}}\). This is formulated as the following optimization problem:

$${{{\rm{minimize}}}}_{{{\bf{B}}}}{\sum }_{i,j}{({M}_{{ij}}-{B}_{i}{B}_{j}{E}_{{|\,j}-{i|}})}^{2}$$
$${{\rm{subject\; to}}}:l < {{\bf{B}}} < u$$
(3)

After obtaining the bias vector B, the normalized contact frequencies \({M}_{{ij}}^{{\prime} }\) are calculated as \({M}_{{ij}}^{{\prime} }=\frac{{M}_{{ij}}}{{B}_{i}{B}_{j}}\).
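To make Eqs. (2) and (3) and the final normalization step concrete, the sketch below evaluates the multiplicative model, the least-squares objective, and the division by \(B_iB_j\) on a dense matrix. It assumes the background E is indexed by bin distance (as produced by a per-diagonal average such as Eq. 1) and that NaN pixels should be skipped; the helper names are hypothetical.

```python
import numpy as np


def model_matrix(B: np.ndarray, E: np.ndarray) -> np.ndarray:
    """Model of Eq. 2: B_i * B_j * E_{|j - i|}, with E indexed by bin distance."""
    idx = np.arange(len(B))
    dist = np.abs(idx[:, None] - idx[None, :])
    return np.outer(B, B) * E[dist]


def objective(B: np.ndarray, M: np.ndarray, E: np.ndarray) -> float:
    """Least-squares objective of Eq. 3; NaN pixels are skipped."""
    resid = M - model_matrix(B, E)
    return float(np.nansum(resid ** 2))


def normalize(M: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Normalized contacts M'_ij = M_ij / (B_i * B_j)."""
    return M / np.outer(B, B)


# Toy example: a matrix generated exactly from the model.
E = np.array([4.0, 2.0, 1.0])
B_true = np.array([1.0, 2.0, 0.5])
M = model_matrix(B_true, E)
N = normalize(M, B_true)  # recovers E_{|j - i|} at every pixel
```

For a matrix generated exactly from the model, the objective is zero at the true bias vector and normalization leaves only the distance-dependent background.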

The bias vector B is initialized as the square root of the normalized one-dimensional coverage of the contact matrix:

$${B}_{i}^{(0)}=\sqrt{\frac{{S}_{i}}{\bar{S}}}$$
(4)

where \({S}_{i}={\sum }_{j}{M}_{{ij}}\) represents the coverage of bin \(i\), and \(\bar{S}\) is the mean coverage across all loci.
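Eq. (4) in code, as a hedged sketch: coverage is taken as the NaN-ignoring row sum of a dense matrix. Bins with zero coverage receive an initial bias of 0, which lies outside the default search bounds, so a practical implementation would presumably mask or clip them; that handling is not specified in the text and is not shown here.

```python
import numpy as np


def initial_bias(M: np.ndarray) -> np.ndarray:
    """Initial bias B^(0) (Eq. 4): sqrt of each bin's coverage over the mean coverage."""
    S = np.nansum(M, axis=1)  # S_i = sum_j M_ij
    return np.sqrt(S / S.mean())  # zero-coverage bins get 0 (not handled here)


B0 = initial_bias(np.array([[1.0, 1.0], [1.0, 3.0]]))  # -> sqrt([2/3, 4/3])
```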

Raichu employs the dual annealing optimization algorithm, with the L-BFGS-B algorithm as the local minimizer. To manage the computational demands of evaluating the objective function across millions or billions of data points, Raichu incorporates two performance-optimization strategies: (1) A sliding window \([{s}_{i},{s}_{i}+w]\) of size \(w\) is applied to each chromosome, with a 10% overlap between consecutive windows (\({s}_{i+1}={s}_{i}+0.9w\)). Bias vectors are computed independently for each window, and overlapping regions are averaged to produce a single bias vector for the chromosome. (2) The Numba library is used to enhance computational efficiency when evaluating the objective function.
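The sliding-window strategy can be sketched as below. `solve_window` is a hypothetical callback standing in for the per-window optimization. Window starts advance by 0.9w as described; as an assumption not stated in the text, the final window is right-aligned to the chromosome end so that every bin is covered. Overlapping estimates are averaged bin by bin.

```python
from typing import Callable, List

import numpy as np


def window_starts(n: int, w: int, overlap: float = 0.1) -> List[int]:
    """Window starts with s_{k+1} = s_k + (1 - overlap) * w over n bins.

    Assumption: the last window is right-aligned to the chromosome end.
    """
    if n <= w:
        return [0]
    step = max(1, int(round((1 - overlap) * w)))
    starts = list(range(0, n - w, step))
    starts.append(n - w)
    return starts


def merge_window_biases(n: int, w: int,
                        solve_window: Callable[[int, int], np.ndarray],
                        overlap: float = 0.1) -> np.ndarray:
    """Average per-window bias estimates wherever windows overlap.

    solve_window(start, end) is a hypothetical callback returning the bias
    vector (length end - start) fitted on bins [start, end).
    """
    total = np.zeros(n)
    counts = np.zeros(n)
    for s in window_starts(n, w, overlap):
        e = min(s + w, n)
        total[s:e] += solve_window(s, e)
        counts[s:e] += 1
    return total / counts


print(window_starts(400, 200))  # [0, 180, 200]
```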

Raichu provides several tunable parameters to enhance flexibility: (1) sliding window size (\(w\)), set to 200 bins by default; (2) the maximum number of global search iterations for dual annealing (\(m\)), set to 100; and (3) bias vector search bounds (\(l\), \(u\)), set to 0.001 and 1000, respectively. These default values were chosen empirically to balance computational efficiency and normalization quality. A systematic evaluation revealed that Raichu is robust to variation in these key parameters, producing normalized contact maps that are highly similar and loop detection results that are highly stable across a broad range of settings (Supplementary Fig. 20).
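A per-window fit with SciPy's `scipy.optimize.dual_annealing` might look like the sketch below, using the default settings listed above (at most 100 global iterations, bounds of 0.001 and 1000) and L-BFGS-B as the local minimizer. This is an illustrative reimplementation under stated assumptions, not Raichu's code: it omits the Numba acceleration, ignores NaN pixels, and starts from the Eq. (4) initialization clipped into the bounds. `fit_window_bias` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import dual_annealing


def fit_window_bias(M, E, maxiter=100, lower=1e-3, upper=1e3):
    """Fit one window's bias vector by minimizing Eq. 3 with dual annealing.

    Illustrative only (not Raichu's implementation). M: dense symmetric window
    matrix (NaN = masked); E: background indexed by bin distance.
    Returns (fitted bias, final loss, loss at the initial guess).
    """
    n = M.shape[0]
    idx = np.arange(n)
    Ed = E[np.abs(idx[:, None] - idx[None, :])]
    mask = ~np.isnan(M) & ~np.isnan(Ed)

    def loss(B):
        resid = M - np.outer(B, B) * Ed
        return float(np.sum(resid[mask] ** 2))

    # Eq. 4 initialization, clipped into the search bounds [l, u].
    S = np.nansum(M, axis=1)
    x0 = np.clip(np.sqrt(S / S.mean()), lower, upper)
    bounds = [(lower, upper)] * n
    res = dual_annealing(
        loss,
        bounds=bounds,
        maxiter=maxiter,  # maximum number of global search iterations (m)
        x0=x0,
        minimizer_kwargs={"method": "L-BFGS-B", "bounds": bounds},
    )
    return res.x, res.fun, loss(x0)


# Toy window generated from known biases and a power-law-like background.
idx = np.arange(12)
E_true = 1.0 / (1.0 + idx)
B_true = 1.0 + 0.3 * np.sin(idx)
M = np.outer(B_true, B_true) * E_true[np.abs(idx[:, None] - idx[None, :])]
B_hat, final_loss, start_loss = fit_window_bias(M, E_true, maxiter=10)
```

Because `dual_annealing` starts from `x0` and reports the best point it visited, the returned loss should not exceed the loss of the initialization.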

Hi-C and ChIP-seq data processing

For human datasets, the hg38 reference genome was used. For mouse datasets, the mm10 reference genome was used, except for the RCMC datasets, which used mm39. For Drosophila datasets, the dm3 reference genome was used.

For most datasets—including Hi-C, Micro-C, RCMC, GAM, DNA SPRITE, DNA seqFISH+, ChIP-Seq, and RNA-Seq—the processed data were downloaded and used directly in our analysis (Supplementary Data 1). In the following cases, we processed the data either from raw reads or partially processed formats:

Reproducibility analysis of contact maps between biological replicates

A Python implementation of the original HiCRep algorithm (v0.1.0; https://github.com/cmdoret/hicreppy) was used to assess the reproducibility of contact maps normalized with different methods. Specifically, for each pair of contact maps from biological replicates on a given chromosome, the stratum-adjusted correlation coefficient (SCC) was calculated using HiCRep, with higher SCC values indicating greater reproducibility. Genome-wide reproducibility was then summarized by computing a weighted average of SCC values across all chromosomes, using chromosome length as the weight. At 10-kb resolution, chromatin contacts up to 1 Mb apart were considered, while at 50-kb resolution, contacts up to 5 Mb apart were included. The smoothing parameter \(h\) was set to 3 for both resolutions.
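The genome-wide summary is a length-weighted mean of per-chromosome SCC values; a small sketch follows. Computing SCC itself is left to hicreppy, and skipping chromosomes with an undefined (NaN) SCC is an assumption of this sketch.

```python
import math
from typing import Dict


def genome_wide_scc(scc_by_chrom: Dict[str, float], chrom_lengths: Dict[str, int]) -> float:
    """Chromosome-length-weighted mean of per-chromosome SCC values.

    Assumption: chromosomes with an undefined (NaN) SCC are skipped.
    """
    pairs = [(chrom_lengths[c], s) for c, s in scc_by_chrom.items() if not math.isnan(s)]
    total_length = sum(length for length, _ in pairs)
    return sum(length * s for length, s in pairs) / total_length


score = genome_wide_scc({"chr1": 0.9, "chr2": 0.6}, {"chr1": 200, "chr2": 100})
```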

Calculation of visibility and distance-decay curves from contact maps

The per-locus sum of interactions (i.e., visibility) after ICE, KR, and Raichu normalization was calculated using cooltools (v0.7.1; https://github.com/open2c/cooltools) via the command “cooltools coverage”. The “--clr_weight_name” parameter was used to specify the bias vector stored in the input .cool file, and all other parameters were left at their default values. Distance-decay curves (i.e., average contact frequencies as a function of genomic distance) were computed for raw, ICE-, KR-, and Raichu-normalized contact maps using the command “cooltools expected-cis --smooth --aggregate-smoothed”, again with “--clr_weight_name” selecting the appropriate bias vector. To calculate distance-decay curves for specific compartments or domain types, genomic regions were provided in BED format (https://genome.ucsc.edu/FAQ/FAQformat.html) via the “--view” parameter.
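Visibility is simply the row sum of the normalized matrix. The sketch below computes it for any bias vector and, for contrast, includes a bare-bones ICE-style iterative correction to show why matrix balancing drives visibility toward a constant. This loop is a simplification, not cooler's ICE, which additionally filters low-coverage bins and ignores near-diagonal pixels; the function names are hypothetical.

```python
import numpy as np


def visibility(M: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Per-bin visibility: row sums of the normalized matrix M_ij / (B_i * B_j)."""
    return np.nansum(M / np.outer(B, B), axis=1)


def ice_bias(M: np.ndarray, n_iter: int = 200) -> np.ndarray:
    """Bare-bones ICE-style iterative correction that equalizes row sums.

    Simplified: no bin filtering and no ignored diagonals, unlike cooler's ICE.
    """
    W = M.astype(float).copy()
    B = np.ones(M.shape[0])
    for _ in range(n_iter):
        s = np.nansum(W, axis=1)
        s = s / s[s > 0].mean()
        s[s == 0] = 1.0  # leave empty rows untouched
        W /= np.outer(s, s)  # W stays equal to M / outer(B, B)
        B *= s
    return B


rng = np.random.default_rng(0)
A = rng.random((8, 8)) + 0.1
M = A + A.T                           # toy symmetric "contact" matrix
raw_vis = visibility(M, np.ones(8))   # raw coverage varies across bins
ice_vis = visibility(M, ice_bias(M))  # near-constant after balancing
```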

Calculation of the insulation score and compartment analysis

Both insulation scores and chromatin compartments were calculated using cooltools. For insulation scores, the “cooltools insulation” command was applied with window sizes of 100 kb, 250 kb, and 500 kb at 10-kb, 25-kb, and 50-kb resolutions, respectively. TAD boundaries were defined as loci marked as “True” in the “is_boundary” column with a prominence score (“boundary_strength” column) greater than 0.2. For compartment analysis, the “cooltools eigs-cis” command was applied, using a BedGraph (https://genome.ucsc.edu/FAQ/FAQformat.html) file containing average H3K27ac ChIP-Seq signal per bin as the phasing track. Eigenvalue decomposition was performed on the contact matrix, and the first eigenvector (PC1) was extracted to capture the plaid pattern of chromatin compartments. The sign of PC1 was oriented using the H3K27ac track, such that positive values correspond to A compartments and negative values correspond to B compartments. Saddle plots were generated using the command “cooltools saddle -t cis --qrange 0.01 0.99 --strength”.
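As a simplified illustration of how PC1 is phased and how boundaries are filtered, the sketch below takes the leading eigenvector of a centered cis O/E matrix and flips its sign so that it correlates positively with an H3K27ac track. It is an assumption-laden stand-in for “cooltools eigs-cis”, which, among other differences, chooses among several leading eigenvectors. The boundary filter uses the column semantics given above; cooltools may suffix those column names with the window size.

```python
import numpy as np


def oriented_pc1(oe: np.ndarray, phasing_track: np.ndarray) -> np.ndarray:
    """Leading eigenvector of a centered cis O/E matrix, phased by a track.

    Simplified stand-in for cooltools eigs-cis: only the eigenvector with the
    largest |eigenvalue| is considered, and NaN pixels are set to zero.
    """
    A = np.nan_to_num(oe - 1.0)     # center O/E around 0
    vals, vecs = np.linalg.eigh(A)  # A is symmetric
    pc1 = vecs[:, np.argmax(np.abs(vals))]
    if np.corrcoef(pc1, phasing_track)[0, 1] < 0:
        pc1 = -pc1                  # positive PC1 = A compartment
    return pc1


def call_boundaries(is_boundary, boundary_strength, min_strength=0.2):
    """Keep boundaries flagged True whose strength exceeds the threshold."""
    return np.asarray(is_boundary, dtype=bool) & (np.asarray(boundary_strength) > min_strength)


# Toy plaid pattern: bins labeled +1 (A) or -1 (B); H3K27ac is high in A bins.
s = np.array([1, 1, -1, -1, 1, -1], dtype=float)
oe = 1.0 + 0.5 * np.outer(s, s)
h3k27ac = np.where(s > 0, 2.0, 0.1)
pc1 = oriented_pc1(oe, h3k27ac)  # sign(pc1) matches s
```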

Calculation of average contact signals between ChIP-seq peaks of varying binding strength

For this analysis, the following procedures were followed:

1. TADs were identified using HiTAD51 (v0.4.5-r1; https://xiaotaowang.github.io/TADLib/hitad.html) on GM12878 Hi-C contact maps at 25-kb resolution. In the HiTAD output, only regions marked with “0” in the last column were classified as TADs, while all other regions were treated as sub-TADs.

2. ChIP-Seq peaks for CTCF, RAD21, POLR2A, and H3K27ac in GM12878 cells were downloaded from ENCODE (see Supplementary Data 1 for accession codes). Peaks for each factor were sorted and classified into eight groups based on binding strength, using the value in the 7th column of the downloaded peak file.

3. Average contact signals were calculated at 5-kb resolution between each pair of peak groups. For bins containing multiple peaks, only the peak with the highest binding strength was considered. Only intra-TAD contacts were included, and contact signals were normalized by dividing the observed signal by the expected signal at the corresponding genomic distance (observed/expected).
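Steps 2 and 3 can be sketched on a dense O/E matrix as follows. The sketch assumes that classification into eight groups means eight equal-size groups after sorting by signal, represents TAD membership as one integer ID per bin (-1 for bins outside TADs), and excludes self-contacts; all function names are hypothetical.

```python
import numpy as np


def strength_groups(signal, n_groups=8):
    """Label peaks 0..n_groups-1 by binding strength (0 = weakest), equal-size groups."""
    order = np.argsort(signal, kind="stable")
    groups = np.empty(len(signal), dtype=int)
    for g, chunk in enumerate(np.array_split(order, n_groups)):
        groups[chunk] = g
    return groups


def bin_groups(peak_bins, peak_signal, peak_group, n_bins):
    """Per bin, keep the group of its strongest peak (-1 = no peak)."""
    best = np.full(n_bins, -np.inf)
    out = np.full(n_bins, -1, dtype=int)
    for b, sig, g in zip(peak_bins, peak_signal, peak_group):
        if sig > best[b]:
            best[b], out[b] = sig, g
    return out


def group_pair_oe(oe, bin_group, tad_id, n_groups=8):
    """Mean intra-TAD O/E between every pair of peak groups (NaN if no pixels)."""
    same_tad = (tad_id[:, None] == tad_id[None, :]) & (tad_id[:, None] >= 0)
    np.fill_diagonal(same_tad, False)  # assumption: exclude self-contacts
    result = np.full((n_groups, n_groups), np.nan)
    for a in range(n_groups):
        for b in range(n_groups):
            sel = same_tad & (bin_group[:, None] == a) & (bin_group[None, :] == b)
            vals = oe[sel]
            vals = vals[~np.isnan(vals)]
            if vals.size:
                result[a, b] = vals.mean()
    return result


oe = np.ones((4, 4))
oe[0, 2] = oe[2, 0] = 2.0
groups = bin_groups([0, 0, 2], [1.0, 5.0, 3.0], [0, 7, 3], n_bins=4)  # -> [7, -1, 3, -1]
table = group_pair_oe(oe, groups, tad_id=np.array([0, 0, 0, 1]))
```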

Loop detection in Hi-C data using HiCCUPS and Mustache

To benchmark the performance of Raichu in loop detection, we tested two loop-calling tools on the GM12878 Hi-C dataset: (1) A Python implementation of the original HiCCUPS algorithm3 (v0.3.9; https://github.com/XiaoTaoWang/HiCPeaks), and (2) Mustache29 (https://github.com/ay-lab/mustache). Our goal was to demonstrate that Raichu consistently improves chromatin loop detection compared to existing normalization methods, regardless of the loop detection software used.

For HiCCUPS, we applied the following parameters: “--pw 1 2 4 --ww 3 5 7 --only-anchors --maxapart 4000000”. For Mustache, we used the parameter “-pt 0.05”. Both tools were run separately on contact matrices at 5-kb and 10-kb resolution. To generate a non-redundant list of detected loops for each tool, results from both resolutions were merged using the “combine-resolutions” command from the HiCPeaks Python package (v0.3.9; https://github.com/XiaoTaoWang/HiCPeaks) with the following parameters: “-R 5000 10000 -G 10000 -M 100000 --max-res 10000”.

For other datasets, loops were detected using HiCCUPS only.

For Hi-C, the same HiCCUPS parameters as used for GM12878 were applied.

For Micro-C, we applied the following parameters to identify loops at 2-kb, 5-kb, and 10-kb resolutions: “--pw 2 3 4 --ww 5 6 7 --maxww 7 --only-anchors --min-local-reads 100”. Loops detected at the three resolutions were merged using “combine-resolutions -R 2000 5000 10000 -G 10000 -M 100000 --max-res 10000”. We restricted the analysis to interactions within 1 Mb at 2-kb resolution (“--maxapart 1000000”), 2 Mb at 5-kb resolution, and 4 Mb at 10-kb resolution.

For RCMC, we applied the following parameters to identify loops at 250-bp, 500-bp, and 1-kb resolutions: “-C 5 8 18 3 6 --pw 2 3 4 --ww 5 6 7 --maxww 7 --only-anchors --min-local-reads 100”. Results were merged using “combine-resolutions -R 250 500 1000 -G 500 -M 5000 --max-res 500”. We restricted the analysis to interactions within 100 kb at 250-bp resolution, 200 kb at 500-bp resolution, and 400 kb at 1-kb resolution.

Calculation of the overlap between two loop sets

For each loop coordinate \((i, j)\) in loop set A, if there exists a loop \((i', j')\) in loop set B such that the Euclidean distance between \((i, j)\) and \((i', j')\) is less than \(\min(0.2\times|i-j|,\ 50\,\mathrm{kb})\), we define \((i, j)\) as preserved in B. Using this definition, we evaluated whether loops detected by Hi-C could be supported by orthogonal ChIA-PET/HiChIP interactions (Supplementary Data 1 and 5).
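The preservation criterion reduces to a distance test in the 2D plane of anchor coordinates. The sketch assumes both loop sets lie on the same chromosome and that coordinates are anchor positions in base pairs; `preserved` is an illustrative name.

```python
import numpy as np


def preserved(loops_a, loops_b, cap=50_000):
    """Flag each loop (i, j) in A that has a loop in B within min(0.2*|i - j|, cap).

    Assumes both sets are on one chromosome, with anchor coordinates in bp.
    """
    B = np.asarray(loops_b, dtype=float).reshape(-1, 2)
    flags = []
    for i, j in loops_a:
        dist = np.hypot(B[:, 0] - i, B[:, 1] - j)  # Euclidean distance to every loop in B
        flags.append(bool(np.any(dist < min(0.2 * abs(i - j), cap))))
    return flags


print(preserved([(100_000, 500_000), (1_000_000, 1_100_000)], [(120_000, 510_000)]))
# [True, False]
```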

When generating the Venn diagram of loop sets obtained from different normalization methods, we applied an additional matching criterion: since a loop in one set might match multiple loops in another set based on the distance threshold, we ensured each loop matched only once by searching for the closest loop coordinate in the other set. This approach guaranteed a unique match for each loop meeting the distance criterion. The custom matching script is available at https://github.com/XiaoTaoWang/Raichu/tree/main/benchmark-analysis.
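One way to enforce one-to-one matching is a greedy, closest-first assignment over all candidate pairs that pass the distance threshold, sketched below. Whether the published matching script resolves conflicts and ties exactly this way is an assumption; the script linked above is authoritative.

```python
import numpy as np


def unique_matches(loops_a, loops_b, cap=50_000):
    """Greedy one-to-one matching of loops within min(0.2*|i - j|, cap), closest pairs first.

    Returns sorted (index_in_A, index_in_B) pairs; len(result) = number of shared loops.
    """
    A = np.asarray(loops_a, dtype=float).reshape(-1, 2)
    B = np.asarray(loops_b, dtype=float).reshape(-1, 2)
    candidates = []
    for a, (i, j) in enumerate(A):
        dist = np.hypot(B[:, 0] - i, B[:, 1] - j)
        for b in np.flatnonzero(dist < min(0.2 * abs(i - j), cap)):
            candidates.append((float(dist[b]), a, int(b)))
    candidates.sort()  # closest candidate pairs are assigned first
    used_a, used_b, pairs = set(), set(), []
    for _, a, b in candidates:
        if a not in used_a and b not in used_b:
            used_a.add(a)
            used_b.add(b)
            pairs.append((a, b))
    return sorted(pairs)


# Both A loops fall within tolerance of the single B loop; only the closer one matches.
print(unique_matches([(100_000, 500_000), (110_000, 500_000)], [(108_000, 500_000)]))
# [(1, 0)]
```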

Definition of promoter and enhancer regions

Promoter and enhancer regions were defined based on ChromHMM chromatin state annotations downloaded from the UCSC Genome Browser (https://genome.ucsc.edu/cgi-bin/hgTables).

For GM12878 and H1ESC, the annotation file was based on a 15-state ChromHMM model52. Regions annotated as “1_Active_Promoter”, “2_Weak_Promoter”, or “3_Poised_Promoter” were classified as promoters, while regions annotated as “4_Strong_Enhancer”, “5_Strong_Enhancer”, “6_Weak_Enhancer”, or “7_Weak_Enhancer” were classified as enhancers. Since the original annotations were in hg19 coordinates, they were converted to hg38 using HiCLift50 (v1.0; conversion script available at https://github.com/XiaoTaoWang/Raichu/tree/main/benchmark-analysis).

For mESC, the annotation file was based on an 18-state ChromHMM model53. Regions annotated as “Tss” were defined as promoters, and regions annotated as “Enh” were defined as enhancers.
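The state-to-element mapping above can be written as a lookup, sketched here together with a hypothetical helper that labels a loop as P–P, E–E, or P–E once each anchor carries a single element label. How anchors overlapping both element types are resolved is not specified in the text, so the helper simply assumes one label per anchor.

```python
from typing import Optional

PROMOTER_STATES_15 = {"1_Active_Promoter", "2_Weak_Promoter", "3_Poised_Promoter"}
ENHANCER_STATES_15 = {"4_Strong_Enhancer", "5_Strong_Enhancer",
                      "6_Weak_Enhancer", "7_Weak_Enhancer"}


def classify_state(state: str, model: str = "15-state") -> Optional[str]:
    """Map a ChromHMM state name to 'promoter', 'enhancer', or None."""
    if model == "15-state":  # GM12878 and H1ESC
        if state in PROMOTER_STATES_15:
            return "promoter"
        if state in ENHANCER_STATES_15:
            return "enhancer"
    elif model == "18-state":  # mESC
        if state == "Tss":
            return "promoter"
        if state == "Enh":
            return "enhancer"
    return None


def loop_type(anchor1: Optional[str], anchor2: Optional[str]) -> Optional[str]:
    """Hypothetical helper: P-P, E-E, or P-E from two single-label anchors."""
    labels = {anchor1, anchor2}
    if labels == {"promoter"}:
        return "P-P"
    if labels == {"enhancer"}:
        return "E-E"
    if labels == {"promoter", "enhancer"}:
        return "P-E"
    return None
```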

Enrichment analysis of TFs and histone modifications

To explore the association of transcription factors (TFs) and histone modifications with different loop categories, we calculated both the fraction of loop anchors overlapping ChIP-Seq peaks and a fold-enrichment score.

Briefly, a non-redundant list of loop anchors was compiled for each loop category. For each TF or histone modification, we iterated through this list and counted the number of anchors that overlapped at least one ChIP-Seq peak. To calculate the fold-enrichment score, we generated 100 random control sets by shuffling the loops and repeated the overlap calculation for each control set. Each random loop set preserved the original genomic distance distribution between loop anchors and the number of loops per chromosome, while ensuring that no loop intervals overlapped known genomic gaps. The fold-enrichment score was computed as the ratio of the observed number of overlaps to the mean number of overlaps across the control sets.
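A single-chromosome sketch of the fold-enrichment procedure follows. It assumes anchors are represented by their bin start positions at resolution `res`, that peaks and genomic gaps are (n, 2) arrays of [start, end) intervals, and that shuffled loop positions are drawn uniformly on the bin grid; per-chromosome loop counts are preserved by running it separately for each chromosome. Loops that cannot be placed after `max_tries` attempts are dropped, a simplification the original procedure may not share.

```python
import numpy as np


def count_anchor_overlaps(loops, peaks, res):
    """Count unique anchors [a, a + res) overlapping at least one [start, end) peak."""
    anchors = {pos for loop in loops for pos in loop}
    starts, ends = peaks[:, 0], peaks[:, 1]
    return sum(bool(np.any((starts < a + res) & (ends > a))) for a in anchors)


def shuffle_loops(loops, chrom_len, gaps, res, rng, max_tries=1000):
    """Relocate each loop uniformly on the bin grid, keeping its anchor distance and avoiding gaps."""
    shuffled = []
    for i, j in loops:
        span = j - i
        n_positions = (chrom_len - span - res) // res + 1
        for _ in range(max_tries):
            start = int(rng.integers(0, n_positions)) * res
            end = start + span + res  # loop interval covers both anchors
            if not np.any((gaps[:, 0] < end) & (gaps[:, 1] > start)):
                shuffled.append((start, start + span))
                break
        # Simplification: a loop that cannot be placed after max_tries is dropped.
    return shuffled


def fold_enrichment(loops, peaks, chrom_len, gaps, res, n_controls=100, seed=0):
    """Observed anchor-peak overlaps divided by the mean over shuffled control sets."""
    rng = np.random.default_rng(seed)
    observed = count_anchor_overlaps(loops, peaks, res)
    controls = [count_anchor_overlaps(shuffle_loops(loops, chrom_len, gaps, res, rng), peaks, res)
                for _ in range(n_controls)]
    return observed / np.mean(controls)
```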

For selected TFs and histone modifications, we also generated ChIP-Seq binding profiles around loop anchors. Specifically, the ±50 kb region surrounding each anchor was divided into 50 bins of 2 kb each. The number of ChIP-Seq peaks falling within each bin was counted, producing a 50-element array for each anchor. Each array was then normalized by dividing all values by its mean. Finally, the average binding profile for each loop category was computed by averaging the normalized values across all loop anchors at each bin position.

When computing the binding profiles for GATA3 and H3K27ac ChIP-Seq peaks around C/C-specific and A/A-specific loops unique to ICE or Raichu (Fig. 5f), we skipped the normalization step (i.e., dividing each anchor’s profile by its mean), and instead calculated the average profile directly from the raw peak overlap counts. This approach facilitated comparisons both between loop categories and between biological conditions (C/C vs. A/A).
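The binding-profile computation, including the unnormalized variant used for Fig. 5f, can be sketched as follows. Representing each peak by a single center coordinate and keeping all-zero anchor profiles as zeros when normalizing are assumptions of this sketch; `anchor_profiles` is a hypothetical name.

```python
import numpy as np


def anchor_profiles(anchor_centers, peak_centers, flank=50_000, bin_size=2_000, normalize=True):
    """Average peak-count profile in 2-kb bins across +/-50 kb around loop anchors.

    normalize=True divides each anchor's 50-bin array by its mean (all-zero arrays
    are kept as zeros, an assumption); normalize=False averages raw counts.
    """
    edges = np.arange(-flank, flank + bin_size, bin_size)  # 51 edges -> 50 bins
    peaks = np.asarray(peak_centers)
    profiles = []
    for c in anchor_centers:
        counts, _ = np.histogram(peaks - c, bins=edges)  # peaks outside the window are ignored
        counts = counts.astype(float)
        if normalize and counts.mean() > 0:
            counts /= counts.mean()
        profiles.append(counts)
    return np.mean(profiles, axis=0)


raw = anchor_profiles([100_000], [101_000, 51_000, 400_000], normalize=False)
norm = anchor_profiles([100_000], [101_000, 51_000, 400_000], normalize=True)
```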

Down-sampling of Hi-C data

To perform down-sampling at a rate of \(\alpha\) (\(0<\alpha<1\)), we used a binomial sampling approach that does not require re-mapping of the raw reads. Specifically, for each non-zero pixel in the full contact matrix at 5-kb resolution, the down-sampled contact count was drawn as a random integer from a binomial distribution with parameters \({M}_{{ij}}\) (number of trials) and \(\alpha\) (success probability), where \({M}_{{ij}}\) represents the contact count in the full Hi-C matrix between bin \(i\) and bin \(j\). As both 5-kb and 10-kb contact matrices are required for loop detection (as described above), the “cooler coarsen” command was applied to the down-sampled 5-kb contact matrices to generate the corresponding 10-kb contact matrices.
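A minimal dense-matrix sketch of binomial down-sampling and 2× coarsening is shown below. It samples only the upper triangle and mirrors it so that the result stays symmetric, as an upper-triangular pixel table would. `coarsen` assumes the bin count is divisible by the factor, whereas “cooler coarsen” operates on the cooler's sparse pixel table; both function names are hypothetical.

```python
import numpy as np


def downsample_contacts(M, alpha, seed=None):
    """Binomially thin a symmetric count matrix: each pixel keeps Binomial(M_ij, alpha) contacts.

    The upper triangle is sampled once and mirrored so the result stays symmetric.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    rng = np.random.default_rng(seed)
    upper = np.triu(np.asarray(M, dtype=np.int64))
    out = np.zeros_like(upper)
    nonzero = upper > 0
    out[nonzero] = rng.binomial(upper[nonzero], alpha)
    return out + np.triu(out, 1).T


def coarsen(M, factor=2):
    """Sum factor x factor blocks (e.g., 5-kb -> 10-kb); assumes the size is divisible by factor."""
    n = M.shape[0]
    if n % factor:
        raise ValueError("matrix size must be divisible by factor")
    k = n // factor
    return M.reshape(k, factor, k, factor).sum(axis=(1, 3))
```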

Detection of differential loops

To detect differential loops between wild-type (C/C) and engineered (A/A) GM12878 cells, we applied the following approach to minimize instances where a loop is enriched in a contact map but not detected due to stringent p value or fold-enrichment cutoffs (script available on GitHub at https://github.com/XiaoTaoWang/Raichu/tree/main/benchmark-analysis).

  1. 1.

    For each of the four contact maps (ICE-normalized and Raichu-normalized for wild-type and engineered GM12878 cells), we ran HiCCUPS using two parameter settings. For both settings, loops detected at 5-kb and 10-kb resolutions were merged using the “combine-resolutions” command, as described in the “Loop detection in Hi-C data using HiCCUPS and Mustache” section. The first setting used the default p-value and fold-enrichment cutoffs with parameters “--pw 1 2 4 --ww 3 5 7 --only-anchors --maxapart 4000000”, which resulted in 8803, 5467, 11,114, and 6562 loops for ICE-normalized maps in wild-type cells (DICE, CC), ICE-normalized maps in engineered cells (DICE, AA), Raichu-normalized maps in wild-type cells (DRaichu, CC), and Raichu-normalized maps in engineered cells (DRaichu, AA), respectively. The second setting used a more relaxed significance threshold with parameters “--pw 1 2 4 --ww 3 5 7 --only-anchors --maxapart 4000000 --siglevel 0.1 --sumq 0.05 --double-fold 1.5 --single-fold 1.75”, which resulted in 13,433, 8688, 17,481, and 10,697 loops for ICE-normalized maps in wild-type cells (LICE, CC), ICE-normalized maps in engineered cells (LICE, AA), Raichu-normalized maps in wild-type cells (LRaichu, CC), and Raichu-normalized maps in engineered cells (LRaichu, AA), respectively.

  2. The loop sets \({D}_{{\rm{ICE}},{\rm{CC}}}\), \({D}_{{\rm{ICE}},{\rm{AA}}}\), \({D}_{{\rm{Raichu}},{\rm{CC}}}\), and \({D}_{{\rm{Raichu}},{\rm{AA}}}\) were defined as the final loops detected in the corresponding contact maps, whereas the loop sets \({L}_{{\rm{ICE}},{\rm{CC}}}\), \({L}_{{\rm{ICE}},{\rm{AA}}}\), \({L}_{{\rm{Raichu}},{\rm{CC}}}\), and \({L}_{{\rm{Raichu}},{\rm{AA}}}\) were used to determine whether a loop from one set was potentially detectable in another contact map. For example, to check whether a loop \((i,\,j)\) from \({D}_{{\rm{ICE}},{\rm{CC}}}\) was detectable in the Raichu-normalized contact map of wild-type GM12878 cells, we searched \({L}_{{\rm{Raichu}},{\rm{CC}}}\) for a loop \((i',\,j')\) such that \(|i-i'| < \min (0.2\times |i-j|,\,50\,{\rm{kb}})\) and \(|j-j'| < \min (0.2\times |i-j|,\,50\,{\rm{kb}})\).

Identification of chromatin loops conserved between mouse and human

For each loop (\(i\), \(j\)) identified from mNPC Hi-C data, we extracted the genomic coordinates of its two anchors and used HiCLift (v1.0) to convert the coordinates from the mouse genome (mm10) to the human genome (hg38). A loop was considered conserved if it met the following criteria (script available on GitHub at https://github.com/XiaoTaoWang/Raichu/tree/main/benchmark-analysis):

  1. Both anchors were successfully mapped to the same chromosome in the target genome (hg38).

  2. The genomic distance between the two anchors remained relatively stable after conversion. Specifically, if the converted coordinates were \((i',\,j')\), the following condition had to be satisfied: \(0.8\times |i-j| < |i'-j'| < 1.25\times |i-j|\).

  3. A matching loop existed in the hNPC Hi-C data; that is, there was a loop \((x,\,y)\) in hNPC such that the Euclidean distance between \((x,\,y)\) and \((i',\,j')\) was less than \(\min (0.2\times |i'-j'|,\,50\,{\rm{kb}})\).

Single-cell Hi-C data analysis

The GM12878 single-cell Hi-C dataset in .pairs format was downloaded from GEO (accession code: GSE240128). Contact pairs were processed using HiCLift (v1.0; https://github.com/XiaoTaoWang/HiCLift) to generate an .mcool file for each single cell, with the parameters “--output-format cool --in-assembly hg38 --out-assembly hg38”. Pseudo-bulk Hi-C maps were generated using the “cooler merge” command.

Loops were identified separately at 5-kb, 10-kb, and 25-kb resolutions using pyHICCUPS (v0.3.9; https://github.com/XiaoTaoWang/HiCPeaks) with the parameters “--pw 1 2 4 --ww 3 5 7 --only-anchors --maxapart 4000000”. Detected loops were then combined across resolutions using the “combine-resolutions” command from the HiCPeaks Python package (v0.3.9; https://github.com/XiaoTaoWang/HiCPeaks) with the parameters “-G 10000 -M 100000 --max-res 25000”.

To evaluate the precision of the detected loops, we used as the reference the union set of interactions compiled from nine orthogonal 3D genomic datasets (Hi-C, CTCF ChIA-PET, RNAPII ChIA-PET, CTCF HiChIP, H3K27ac HiChIP, SMC1A HiChIP, H3K4me3 PLAC-Seq, TrAC-Loop, and HiCAR; Supplementary Data 1 and 6). For recall evaluation, the reference was a high-confidence interaction set comprising interactions detectable in at least three of the nine orthogonal datasets. F1 scores were calculated using the following equation:

$${F}_{1}=\frac{2\cdot ({{\rm{precision}}}\cdot {{\rm{recall}}})}{{{\rm{precision}}}+{{\rm{recall}}}}$$
(5)
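The evaluation scheme (precision against the union set, recall against the high-confidence set, then Eq. (5)) can be sketched as follows. The criterion for deciding that a detected loop matches a reference interaction is not specified in this section, so it is passed in as a caller-supplied `match` function; `high_confidence` further assumes, as a hypothetical simplification, that interactions have already been harmonized to identical keys across datasets. All function names are illustrative.

```python
from collections import Counter


def high_confidence(datasets, min_support=3):
    """Interactions present in at least `min_support` of the given datasets.

    Assumes interactions are already harmonized to identical keys across
    datasets (hypothetical simplification of cross-dataset matching).
    """
    support = Counter(x for ds in datasets for x in set(ds))
    return {x for x, n in support.items() if n >= min_support}


def precision_recall_f1(detected, union_ref, high_conf_ref, match):
    """Precision against the union set, recall against the high-confidence set."""
    if not detected or not high_conf_ref:
        return 0.0, 0.0, 0.0
    precision = sum(any(match(d, r) for r in union_ref) for d in detected) / len(detected)
    recall = sum(any(match(r, d) for d in detected) for r in high_conf_ref) / len(high_conf_ref)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * (precision * recall) / (precision + recall)  # Eq. (5)
    return precision, recall, f1
```

For instance, if 3 of 4 detected loops are in the union reference (precision 0.75) and 1 of 2 high-confidence interactions is recovered (recall 0.5), Eq. (5) gives \(F_1 = 2(0.75\times 0.5)/(0.75+0.5) = 0.6\).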

Aggregate peak analysis

To evaluate the overall enrichment of chromatin loop signals in contact maps, we performed Aggregate Peak Analysis (APA). For a given list of chromatin loops or interactions, contact frequencies were extracted from \(w\times w\) submatrices centered at the two-dimensional coordinates of each loop. Each submatrix was normalized by dividing all values by its mean. To reduce the influence of outliers, submatrices with average signal intensities above the 99th percentile or below the 1st percentile were excluded. The average signal at each position was then computed across all loops and visualized as a heatmap. In Fig. 2h, APA was performed on contact matrices at 5-kb resolution with a window size (\(w\)) of 15. In Fig. 5d, APA was performed on 10-kb resolution contact maps with a window size of 11. In Fig. 3e, APA was applied to GAM matrices at 30-kb resolution and DNA SPRITE matrices at 10-kb resolution, both with a window size of 11.
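A compact NumPy version of these APA steps is sketched below. It assumes a dense, square contact matrix for one chromosome at one resolution and loop positions given as `(row, col)` bin indices; the function name is hypothetical, and skipping loops whose window would run off the matrix is our own choice. Because every submatrix averages exactly 1 after mean normalization, the sketch applies the 1st/99th-percentile filter to the raw submatrix means; this is our reading of the procedure, not a detail stated in the text.

```python
import numpy as np


def aggregate_peak_analysis(contacts, loops, w):
    """Average mean-normalized w x w windows centred on loop pixels.

    contacts: dense, square 2-D contact matrix (one chromosome, one resolution).
    loops: iterable of (row, col) bin indices of loop pixels.
    w: odd window size (e.g. 15 at 5 kb, 11 at 10 kb).
    """
    half = w // 2
    n = contacts.shape[0]
    windows = [contacts[r - half:r + half + 1, c - half:c + half + 1].astype(float)
               for r, c in loops
               if half <= r < n - half and half <= c < n - half]  # skip edge loops
    if not windows:
        raise ValueError("no loop has a complete window inside the matrix")
    windows = np.stack(windows)
    means = windows.mean(axis=(1, 2))
    # Outlier filter on the raw window means (after normalization every
    # window averages 1, so the filter must use the un-normalized values).
    lo, hi = np.percentile(means, [1, 99])
    keep = (means >= lo) & (means <= hi) & (means > 0)
    normalized = windows[keep] / means[keep][:, None, None]
    return normalized.mean(axis=0)
```

Normalizing each window by its own mean before averaging prevents a handful of very strong loops from dominating the aggregate.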

The APA z-score was calculated by comparing the center pixel value with the average signal in the lower-left \(3\times 3\) corner of the plot.
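The exact z-score formula is not given here. A common convention, used to our understanding by Juicer's APA tool, divides the difference between the center pixel and the lower-left-corner mean by the standard deviation of that corner; the sketch below assumes this convention and also returns the plain center-to-corner ratio. It further assumes the APA matrix is plotted with row 0 at the top, so the lower-left corner is the last three rows and first three columns. The function name is hypothetical.

```python
import numpy as np


def apa_scores(apa, corner=3):
    """Return (center / lower-left mean, z-score) for an APA matrix.

    Assumes row 0 is drawn at the top, so the lower-left corner is the
    last `corner` rows and the first `corner` columns.
    """
    center = apa[apa.shape[0] // 2, apa.shape[1] // 2]
    lower_left = apa[-corner:, :corner]
    mu, sd = lower_left.mean(), lower_left.std()
    ratio = center / mu
    # Assumed convention: z = (center - mean(LL)) / sd(LL); undefined if sd == 0.
    z = (center - mu) / sd if sd > 0 else float("nan")
    return ratio, z
```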

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.