Fig. 2: Addition of intergenic peak information improves integration of RNA and ATAC datasets. | Nature Communications

From: UINMF performs mosaic integration of single-cell multi-omic datasets using nonnegative matrix factorization

a Schematic illustrating how the UINMF algorithm incorporates intergenic peaks when separately integrating the RNA and ATAC measurements from a SNARE-seq dataset. We treat each data type as if it came from an independent source, and perform an integration using regular iNMF and our proposed UINMF method, which incorporates intergenic peaks. b Average alignment and FOSCTTM (Fraction of Samples Closer Than True Match) scores for iNMF, Seurat v3, Harmony, and UINMF. iNMF and UINMF are both initialized 5 different times over ten random seeds, with UINMF incorporating an additional 7,000 intergenic features into the analysis. For nondeterministic algorithms, data are presented as mean values ± SEM. To compare algorithm performance, we used a paired, one-sided Wilcoxon test to compare UINMF's alignment and FOSCTTM scores, respectively, to those of iNMF (P = 1.953 × 10⁻³, P = 1.953 × 10⁻³), Seurat (P = 1.953 × 10⁻³, P = 0.01855), and Harmony (P = 9.766 × 10⁻⁴, P = 9.766 × 10⁻⁴), with Seurat exhibiting a significantly lower FOSCTTM score. For each algorithm, we compare 10 pairs of data points (n = 20). We factorize and cluster the cells using their RNA transcripts (c) and chromatin accessibility measures (d) separately. After integration, we use the known cell correspondences to separately plot the gene expression (e) and chromatin accessibility (f) datasets from SNARE-seq, colored by the same cell type labels. We assess the contribution of the information contained within the intergenic peaks by computing the alignment (g) and FOSCTTM (h) scores across a range of included peaks, from 0 unshared features (iNMF) to 7,000 unshared intergenic bins, in increments of 1,000 unshared features. The bold line indicates the median data value, and the boundaries of each box are defined by the first and third data quartiles (25 and 75%, respectively). The upper (lower) whiskers extend to the highest (lowest) point within 1.5 times the interquartile range. Outliers beyond the whiskers are plotted as points.
We calculate FOSCTTM and alignment scores for ten random seeds for each number of unshared features (n = 10).
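
The P values reported in panel b are all multiples of 1/1024 (1/1024 ≈ 9.766 × 10⁻⁴, 2/1024 ≈ 1.953 × 10⁻³, 19/1024 ≈ 0.01855). This is what an exact paired, one-sided Wilcoxon signed-rank test on 10 pairs without ties would produce, although the caption does not state which variant of the test was used. The sketch below illustrates that arithmetic under this assumption. The function names and the example differences are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def signed_rank_null_counts(n):
    """counts[s] = number of subsets of the ranks {1..n} whose sum equals s.

    Under the null hypothesis each rank is positive or negative with equal
    probability, so counts[s] / 2**n is the probability that W+ == s.
    """
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        # Iterate downwards so each rank is used at most once per subset.
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    return counts


def exact_one_sided_p(diffs):
    """Exact P(W+ >= observed W+) for paired differences.

    Assumes no zero differences and no tied absolute values (hypothetical
    helper for illustration only).
    """
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    # W+ is the sum of the ranks (by absolute value) of the positive differences.
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)
    counts = signed_rank_null_counts(n)
    return Fraction(sum(counts[w_plus:]), 2 ** n)


# Hypothetical paired differences (method A minus method B) for 10 seeds.
all_better = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smallest_worse = [-1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(float(exact_one_sided_p(all_better)))      # 0.0009765625 (= 1/1024)
print(float(exact_one_sided_p(smallest_worse)))  # 0.001953125  (= 2/1024)
```

With 10 pairs, 1/1024 is the smallest P value such a test can return, which corresponds to one method scoring better on every seed.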
