Extended Data Fig. 2: Evaluating the performance of CNV normalization of Hi-C data by simulation.

The inputs to our simulation are Hi-C data from GM12878 (normal lymphoblastic cells) and K562 (chronic myeloid leukemia) cells, and the CNV profile of K562 cells. a, Our algorithm learns the trans- and cis-scaling factors separately for every possible copy number pair from the K562 Hi-C data. The CNV effects of K562 are then imposed on the GM12878 Hi-C matrix by linearly scaling each signal by the factor for the corresponding copy number pair. The resulting simulated matrix (GM12878 Hi-C with K562 CNVs) is highly similar to the original K562 Hi-C matrix. b, We applied ICE and the CNV normalization method developed in this project to the simulated matrix from Supplementary Fig. 2a (GM12878 Hi-C matrix with K562 CNVs). By visual inspection, the CNV-normalized matrix is more similar to the original GM12878 matrix than the ICE-normalized matrix is. c, We used HiCRep to calculate the stratum-adjusted correlation coefficients (SCCs) between the original GM12878 Hi-C data and each of the ICE-normalized and CNV-normalized matrices. The distributions of SCC scores are presented as box-and-whisker plots, where the box represents the interquartile range (IQR, Q3 - Q1), the thick horizontal line represents the median, the upper whisker extends to the last datum less than Q3 + 1.5×IQR, and the lower whisker extends to the first datum greater than Q1 - 1.5×IQR. Each dot represents an individual chromosome.
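
The imposition step in a can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `impose_cnv`, the toy bin layout, and every scaling factor below are hypothetical placeholders chosen for illustration, whereas in the simulation the factors are learned from the K562 Hi-C data. The sketch only shows the idea of multiplying each contact by a factor looked up from the copy number pair of its two bins, using one table for cis (same chromosome) contacts and another for trans contacts.

```python
import numpy as np

def impose_cnv(matrix, copy_numbers, chroms, cis_factors, trans_factors):
    """Scale each contact by the factor for its bins' copy number pair.

    Hypothetical helper: factor tables are keyed by the sorted
    (copy number, copy number) pair of the two interacting bins.
    """
    matrix = np.asarray(matrix, dtype=float)
    out = matrix.copy()
    n = matrix.shape[0]
    for i in range(n):
        for j in range(n):
            pair = tuple(sorted((int(copy_numbers[i]), int(copy_numbers[j]))))
            # Cis and trans contacts use separately learned factors.
            table = cis_factors if chroms[i] == chroms[j] else trans_factors
            out[i, j] = matrix[i, j] * table[pair]
    return out

# Toy input: four bins on two chromosomes, uniform "normal" signal.
chroms = ["chr1", "chr1", "chr2", "chr2"]
copy_numbers = [2, 3, 2, 4]           # made-up CNV profile
normal = np.full((4, 4), 10.0)        # made-up contact matrix

# Made-up scaling factors, for illustration only.
cis_factors = {(2, 2): 1.0, (2, 3): 1.4, (3, 3): 2.0, (2, 4): 1.8, (4, 4): 3.5}
trans_factors = {(2, 2): 1.0, (2, 3): 1.5, (2, 4): 2.0, (3, 4): 3.0}

simulated = impose_cnv(normal, copy_numbers, chroms, cis_factors, trans_factors)
print(simulated)
```

Because the lookup key is the sorted copy number pair, the simulated matrix stays symmetric whenever the input matrix is symmetric.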