Fig. 8: The inclusion of non-orthologous genes improves the integration of cross-species data.

We use UINMF to include both orthologous and non-orthologous genes when integrating the datasets (a), and demonstrate the alignment between the two datasets (b). We also confirm cell type correspondence by examining only the mouse cells (c) and only the lizard cells (d), each labeled with its published cell labels. To show the advantage of including the non-orthologous genes, we show the difference in ARI (e) and purity (f) scores using the originally published mouse labels, comparing UINMF performance to iNMF (P = 3.626 × 10⁻⁹, P = 6.258 × 10⁻⁴), Seurat (P < 2.2 × 10⁻¹⁶, P = 3.047 × 10⁻¹¹), and Harmony (P < 2.2 × 10⁻¹⁶, P = 2.815 × 10⁻¹²). We compare algorithm performance using a paired, one-sided Wilcoxon test on n = 200 ARI (or purity) measurements. We also confirm a similar trend in the ARI (g) and purity (h) scores using the original lizard labels, comparing UINMF performance to iNMF (P = 1.145 × 10⁻⁶, P = 0.07157), Seurat (P < 2.2 × 10⁻¹⁶, P = 0.8148), and Harmony (P = 1.674 × 10⁻⁵, P < 2.2 × 10⁻¹⁶). For nondeterministic algorithms, data are presented as mean values ± SEM.
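
As an illustration of the statistics named above, the following is a minimal sketch of a paired, one-sided Wilcoxon signed-rank test and of the SEM, written with the Python standard library only. It is not the authors' analysis code. The function names `paired_wilcoxon_greater` and `sem`, the use of a normal approximation with tie and continuity corrections, and the synthetic scores are all assumptions made for illustration; none of the values below come from the study.

```python
import math
import random
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def paired_wilcoxon_greater(x, y):
    """Paired, one-sided Wilcoxon signed-rank test (H1: x tends to exceed y).

    Hypothetical helper for illustration. Uses the normal approximation
    with tie and continuity corrections, which is reasonable for large
    samples such as n = 200. Returns (W_plus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average rank shared by tied |differences|
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mu - 0.5) / math.sqrt(var)  # continuity-corrected statistic
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return w_plus, p_value


# Synthetic, made-up scores standing in for 200 paired ARI measurements.
rng = random.Random(0)
scores_a = [rng.gauss(0.70, 0.05) for _ in range(200)]
scores_b = [a - rng.gauss(0.02, 0.03) for a in scores_a]

w_plus, p_value = paired_wilcoxon_greater(scores_a, scores_b)
print(f"W+ = {w_plus:.1f}, one-sided P = {p_value:.3g}")
print(f"method A: mean = {mean(scores_a):.3f}, SEM = {sem(scores_a):.4f}")
```

The test is paired because each of the n measurements is taken under both methods, so only the within-pair differences are ranked. The one-sided alternative asks specifically whether the first method scores higher than the second.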