Fig. 2: S2F models trained on atlases with varying cell count and read depth.
From: Evaluating single-cell ATAC-seq atlasing technologies using sequence-to-function modeling

a t-distributed stochastic neighbor embedding (tSNE) of all 92,363 cells across 44 experiments (2 10x v1 experiments with a total of 9756 cells, 2 10x v2 experiments with a total of 9722 cells, 5 HyDrop v1 experiments with a total of 5805 cells, and 35 HyDrop v2 experiments with a total of 67,080 cells), colored by cell type and batch corrected for the technique used. Data points were randomly shuffled before plotting. b Computational design: the 10x v1 and v2 datasets were combined into the 10x Genomics training dataset for comparison with the HyDrop v2-based dataset; each dataset served as training data for S2F deep learning (DL) models in k-fold cross-validation (k = 10). Model performance was evaluated on standard DL metrics, accessibility predictions, and mouse cortex enhancers previously validated in vivo by Ben-Simon et al. (2024). c, d Data are presented as mean values ± SD (standard deviation) across n = 10 cross-validation folds. Individual fold values are shown as black dots. c Model performance comparison (test set Pearson correlation) of the models shown in (b). d Model performance comparison (test set Pearson correlation) of the S2F model trained on the 10x-based dataset at full cell count and sequencing depth versus S2F models trained on different amounts of HyDrop v2-based data at full sequencing depth. Source data are provided as a Source Data file. e Heatmap of in vivo validated enhancers (Ben-Simon et al., 2024) identified as true positives by the 10x-based and HyDrop v2-based models. Source data are provided as a Source Data file.
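
As an illustration of the summary statistic reported in panels c and d, the following minimal sketch computes a test set Pearson correlation for each of k = 10 cross-validation folds and summarizes the folds as mean ± SD. It is not the authors' pipeline: the least-squares linear model is a stand-in for the S2F deep learning model, the data are synthetic, the function names (`pearson`, `kfold_indices`, `cross_validate`, `summarize_folds`) are hypothetical, and the use of the sample SD (`ddof=1`) is an assumption, since the caption does not state which SD estimator was used.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k disjoint test folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)


def cross_validate(X, y, k=10, seed=0):
    """Return one test set Pearson correlation per fold.

    A least-squares linear fit stands in for the S2F model (assumption).
    """
    folds = kfold_indices(len(y), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the held-out one.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        scores.append(pearson(X[test_idx] @ w, y[test_idx]))
    return scores


def summarize_folds(scores):
    """Mean and sample SD (ddof=1, an assumption) across folds."""
    scores = np.asarray(scores, dtype=float)
    return float(scores.mean()), float(scores.std(ddof=1))


if __name__ == "__main__":
    # Synthetic regression data, for illustration only.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(500, 8))
    y = X @ rng.normal(size=8) + 0.5 * rng.normal(size=500)

    fold_scores = cross_validate(X, y, k=10)
    mean, sd = summarize_folds(fold_scores)
    print(f"Test set Pearson r: {mean:.3f} ± {sd:.3f} (n = {len(fold_scores)} folds)")
```

The per-fold values in `fold_scores` correspond to the individual black dots in the panels, and the `(mean, sd)` pair to the bar height and error bar.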