Fig. 5: Benchmarking of IQ and leading deep learning models.
From: IceQream: Quantitative chromosome accessibility analysis using physical TF models

A Model performance (R²) for IQ trajectory models using the full set of gastrulation manifold motifs, without interactions (blue) and with interactions (green), compared to an expanded 180-motif IQ model with interactions (red) trained on our gastrulation data.
B Scatter plots comparing observed versus predicted differential accessibility (dAP) scores using IQ (left), Borzoi (middle), and DeepTopic (right) models. Points are colored by local point density from low (gray) to high (blue to red).
C R² values for the three models on mouse (blue) and human (red) trajectories (n = 6 mouse trajectories, n = 3 human trajectories). Gray lines connect trajectories across models. Box plots show median (center line), first and third quartiles (box bounds), and whiskers extending to 1.5× the interquartile range. Paired two-sided Wilcoxon signed-rank tests with Benjamini-Hochberg correction: Borzoi vs IQ V(8) = 19, p = 0.813, r = 0.118, 95% CI = [−0.054, 0.025]; Borzoi vs DeepTopic V(8) = 14, p = 0.813, r = 0.316, 95% CI = [−0.046, 0.033]; IQ vs DeepTopic V(8) = 20, p = 0.813, r = −0.079, 95% CI = [−0.034, 0.031]. No significant differences between models were observed.
D R² values for individual models and ensembles, ranked by mean performance from best (left) to worst (right) (n = 6 mouse trajectories, n = 3 human trajectories). Points and colors as in panel (C). Gray lines connect trajectories across models. Box colors indicate model class. Box plots show median (center line), first and third quartiles (box bounds), and whiskers extending to 1.5× the interquartile range.
E Kolmogorov-Smirnov test D statistics comparing motif energies in model-specific errors versus all correct predictions (n = 141, RMS < 0.05 in all models). Red bars: Borzoi errors (n = 680, RMS > 0.1 in Borzoi, <0.05 in IQ and ensemble). Green bars: IQ errors (n = 304, RMS > 0.1 in IQ, <0.05 in Borzoi and ensemble). Asterisks indicate FDR < 0.05. Significant FDR-adjusted q-values for Borzoi errors: Eomes/Tbox (3.5 × 10⁻¹⁰), Tcf (9.8 × 10⁻⁵), Hoxb2_Tbx (4.4 × 10⁻⁵), Nanog/Sox (2.0 × 10⁻⁴), Dmrta2 (8.1 × 10⁻⁴), Snai (8.1 × 10⁻⁴), Meis (3.6 × 10⁻³), Smad4 (3.6 × 10⁻³), Mesp2 (1.4 × 10⁻²), A__TGAAT (4.3 × 10⁻²); for IQ errors: CAAA (1.5 × 10⁻²).
Source data are provided as a Source Data file.
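
Note on the corrected p-values in panel C: the three pairwise comparisons all report p = 0.813 after Benjamini-Hochberg correction. Identical adjusted values are a common outcome of this procedure, because each adjusted p-value is capped by the adjusted value of the next-larger raw p-value. The sketch below is a minimal pure-Python illustration of that step, not the authors' analysis code. The caption does not give the raw p-values, so the inputs here are hypothetical and chosen only to show the effect; the function name `benjamini_hochberg` is also an assumption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Visit the p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # ascending rank of this p-value (1 = smallest)
        # Scale by m / rank, then cap at the smallest value seen so far
        # so that adjusted p-values never decrease with increasing rank.
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for three pairwise comparisons (NOT from the paper).
raw = [0.60, 0.813, 0.70]
adjusted = benjamini_hochberg(raw)
print([round(p, 3) for p in adjusted])
```

With these hypothetical inputs, the largest raw p-value (0.813) is unchanged by the scaling and then caps the other two, so all three adjusted values coincide.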