Extended Data Fig. 5: Design and benchmarking of scDeepLUCIA for scHiCAR loop calling.

a. Bar plot showing the number of open chromatin (R2) peaks overlapping the top 50,000 chromatin loops identified by each indicated loop caller in high-depth (148M reads) and downsampled (15M reads) GM12878 scHiCAR data. b. Bar plots showing, among the top 50,000 loops called by each method, the number of loops with CTCF peaks at both anchors (left) and the number with convergent CTCF motifs (right). If a method detected fewer than 50,000 loops, all of its detected loops were used. c. Receiver operating characteristic (ROC) curves comparing the performance of scDeepLUCIA (blue), Peakachu (green), and CovNorm (magenta) on loops identified in a downsampled (30M reads) mouse astrocyte bulk HiCAR dataset. Methods that detected no loops (HiCCUPS, HiCExplorer, MAPS) are assigned AUROC = 0. d. Left: bar plots showing the percentage of 5-kb loops called by scDeepLUCIA and Peakachu in H1-hESC (226M reads) and GM12878 (227M reads) bulk HiCAR data that are also recovered at 10-kb and 50-kb resolution (Peakachu does not support 50-kb resolution). Middle and right: fraction of multi-resolution loops in GM12878 that overlap open chromatin peaks (middle) or CTCF peaks with motif orientation (right). Input sequencing depth was adjusted to match each method's optimal range. e. ROC curves and AUROC scores for scDeepLUCIA models trained on either mouse (7 brain cell types) or human (5 cell lines) bulk HiCAR datasets and evaluated on downsampled mouse astrocyte (30M reads) and human HepG2 (30M reads) bulk HiCAR datasets. f. ROC curve and AUROC for a scDeepLUCIA model trained on mouse brain scHiCAR data (the six most abundant cell types: L23IT-1, L45IT, L6CT, L6b, Oligo, and Astrocyte) and evaluated on the human HepG2 bulk HiCAR dataset. g. ROC curve and AUROC for the scDeepLUCIA model trained on mouse bulk HiCAR data and evaluated on downsampled GM12878 H3K27ac HiChIP data (3.91M reads). h. Venn diagram showing the overlap between loops identified by scDeepLUCIA and by Peakachu in full-depth GM12878 H3K27ac HiChIP data.
Overlap significance was assessed with an empirical P-value (P < 2.2e-6).
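The legend does not state how the empirical P-value in panel h was obtained. One common approach, sketched below purely as an illustration, compares the observed number of shared loops with a null distribution built from randomly relocated loops, using P = (1 + number of null overlaps >= observed) / (1 + number of permutations). The helper names (count_overlap, shuffle_loops, empirical_p), the representation of loops as (anchor1_bin, anchor2_bin) pairs on a single chromosome, the +/- tol bin matching rule, and the distance-preserving null model are all hypothetical assumptions, not the authors' procedure. Because the smallest attainable value is 1/(N + 1), reporting P < 2.2e-6 this way would require on the order of 4.5 x 10^5 permutations.

```python
import random


def count_overlap(loops_a, loops_b, tol=1):
    """Count loops in loops_a that have a partner in loops_b whose two
    anchors each lie within `tol` bins (hypothetical matching rule)."""
    index = set(loops_b)
    hits = 0
    for i, j in loops_a:
        if any((i + di, j + dj) in index
               for di in range(-tol, tol + 1)
               for dj in range(-tol, tol + 1)):
            hits += 1
    return hits


def shuffle_loops(loops, n_bins, rng):
    """Hypothetical null model: place each loop at a uniformly random start
    bin while keeping its anchor-to-anchor distance (span) unchanged."""
    shuffled = []
    for i, j in loops:
        span = j - i
        if not 0 <= span < n_bins:
            raise ValueError("loop span must fit within the chromosome")
        start = rng.randrange(n_bins - span)
        shuffled.append((start, start + span))
    return shuffled


def empirical_p(loops_a, loops_b, n_bins, n_perm=1000, tol=1, seed=0):
    """Return (observed overlap, empirical P-value); the minimum possible
    P-value is 1 / (n_perm + 1)."""
    rng = random.Random(seed)
    observed = count_overlap(loops_a, loops_b, tol)
    exceed = sum(
        count_overlap(shuffle_loops(loops_a, n_bins, rng), loops_b, tol) >= observed
        for _ in range(n_perm)
    )
    return observed, (1 + exceed) / (1 + n_perm)
```

For example, two identical sets of 50 loops on a 10,000-bin chromosome share all 50 loops, and with 200 permutations the result is the floor value 1/201, which shows how the permutation count, not the strength of the overlap, limits how small an empirical P-value can be.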