Fig. 1: IceQream overview.
From: IceQream: Quantitative chromatin accessibility analysis using physical TF models

A Schematic of the IceQream (IQ) workflow: (i) Single-cell ATAC raw counts are transformed into estimated access probabilities (AP). (ii) The IQ model incorporates transcription factor (TF) models, epigenomic context variables, and pairwise interactions of TF models. Each TF model integrates contributions from strong and weak affinity sequences, weighted by spatial preferences around the accessible hotspot, which are transformed into dose-response-like spatial binding preference curves using pre-defined non-linear functions. (iii) Model initialization involves scanning candidate TF models from PSSM (position-specific scoring matrix) databases and de novo motif regression. (iv) An integrated IQ model predicts differential AP (dAP) across a selected manifold trajectory. (v) IQ models from multiple trajectories are fused to create a manifold-wide set of common TF motif models. B Normalization steps from raw ATAC-seq data on peaks to access probabilities (AP, left to right): raw counts, region-normalized counts, constitutive-loci-normalized counts, and final APs for mouse gastrulation (top) and human hematopoiesis (bottom) datasets. Black points represent the constitutive loci. The red dashed line indicates the threshold for loci with AP = 1 (−15.3 for mouse gastrulation, −12.4 for human hematopoiesis). C AP for various cell types compared to epiblast in mouse gastrulation (top) and compared to HSC in human hematopoiesis (bottom) manifolds. Red and blue dots represent loci that opened or closed during the trajectory; gray dots show loci that did not change; and orange dots represent loci with a small change (dAP ≤ 0.4). D Examples of scATAC-seq signal at specific genomic loci before and after region normalization in mouse (left) and human (right) genomes. Top panels show the raw scATAC-seq signal (total number of reads). Bottom panels show the signal after region normalization, calculated as the raw signal divided by the mean ATAC signal in a 20 kbp window around it, excluding the center 1 kbp window. Dashed horizontal lines indicate the threshold for peak calling. Red shaded areas denote called peaks. Blue dashed vertical lines represent transcription start sites (TSS). Source data are provided as a Source Data file.
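
The region normalization described for panel D (raw signal divided by the mean signal in a surrounding 20 kbp window, excluding the central 1 kbp) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `region_normalize`, the per-base signal layout, the truncation of windows at the ends of the array, and the `eps` floor for empty flanks are all assumptions made for illustration.

```python
from itertools import accumulate


def region_normalize(signal, window=20_000, exclude=1_000, eps=1e-9):
    """Divide each position by the mean of its flanking background.

    Assumptions (not taken from the source): `signal` holds one value per
    base pair, windows are centered on the position and truncated at the
    array ends, and `eps` guards against division by a zero background.
    """
    n = len(signal)
    # prefix[k] is the sum of signal[0:k], so any window sum is O(1).
    prefix = [0.0, *accumulate(float(x) for x in signal)]

    def span(i, half):
        # Sum and length of the window [i - half, i + half], clipped to bounds.
        lo, hi = max(i - half, 0), min(i + half + 1, n)
        return prefix[hi] - prefix[lo], hi - lo

    normalized = []
    for i, value in enumerate(signal):
        outer_sum, outer_len = span(i, window // 2)
        inner_sum, inner_len = span(i, exclude // 2)
        # Flank = outer window minus the excluded center.
        flank_len = outer_len - inner_len
        flank_mean = (outer_sum - inner_sum) / flank_len if flank_len else 0.0
        normalized.append(value / max(flank_mean, eps))
    return normalized


# Toy usage with small windows: a single sharp peak on a flat background.
toy = [1.0] * 50
toy[25] = 10.0
toy_norm = region_normalize(toy, window=10, exclude=2)
```

Excluding the central window keeps a peak from inflating its own background estimate, so in the toy example the peak at index 25 is compared only against the flat flanks around it.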