Fig. 2: GEDI captures sample-to-sample variability. | Nature Communications


From: A unified model for interpretable latent embedding of multi-sample, multi-condition single-cell data


a UMAP embedding of the sample-specific manifold distortions learned by GEDI for the PBMC dataset. Each sample was encoded using the set of sample-specific manifold parameters learned by GEDI (excluding sample-specific translation vectors Δo_i), followed by regressing out the effect of technology from sample-specific parameters post-hoc, selection of the top 20 most variable parameters, PCA, and UMAP. Each dot represents one sample, labeled by donor (left) or single-cell technology (right). Only technologies with more than one sample are displayed. See Supplementary Fig. 1a for details and results when the effect of donor is regressed out, and Supplementary Fig. 2a for other choices of top variable features. b UMAP embedding of the cells in the PBMC dataset after integration with GEDI (K = 40). Each dot represents one cell, colored by the cell type labels from ref. 15 (left) or by sample (right). Also see Supplementary Fig. 1b–f. c Overall ranking score comparing the performance of various integration methods over a range of latent factors (K), applied to the PBMC, Pancreas and Tabula Muris datasets. The score reflects the ability to remove technical effects while preserving biological variability, similar to ref. 16 (see Methods and Supplementary Data 1 for details, and Supplementary Fig. 3 for additional comparisons). d PCA embedding of the sample-specific manifold distortions learned by GEDI for the COVID-19 dataset. Samples were first encoded using the sample-specific parameters, similar to (a), followed by regressing out the effect of cohort and selection of the top 20 most variable parameters for PCA. Each dot represents a sample, labeled by the disease group (left) or the cohort of origin (right). e Receiver operating characteristic (ROC) curves assessing the classification between COVID and control cases in the COVID-19 dataset. For the classification task, a Support Vector Machine (SVM) was trained using the top 20 most variable parameters learned by GEDI. Left: SVM was trained with data from cohort 2 and tested on cohort 1. Right: SVM was trained with cohort 1 data and tested on cohort 2. See also Supplementary Fig. 2b. Source data are provided as a Source Data file.
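
The sample-level processing described for panels a and d (regress out a nuisance covariate, keep the 20 most variable parameters, then run PCA) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the parameter matrix here is synthetic, the function names (`regress_out`, `top_variable`, `pca`, `embed_samples`) are hypothetical, and the regression is assumed to be an ordinary least-squares fit on a one-hot encoding of the covariate, which the caption does not specify. The final UMAP step of panel a and the SVM of panel e are omitted because they would need additional libraries.

```python
import numpy as np

def regress_out(params, labels):
    """Remove a categorical covariate's effect from every parameter column.

    Assumption: ordinary least squares on a one-hot design matrix, which
    amounts to subtracting each category's mean from its samples.
    """
    categories, idx = np.unique(labels, return_inverse=True)
    design = np.eye(len(categories))[idx]          # samples x categories
    coef, *_ = np.linalg.lstsq(design, params, rcond=None)
    return params - design @ coef

def top_variable(params, n_top=20):
    """Keep the n_top columns with the largest variance across samples."""
    variances = params.var(axis=0)
    keep = np.argsort(variances)[::-1][:n_top]
    return params[:, keep]

def pca(x, n_components=2):
    """Project mean-centered rows onto the leading principal components."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def embed_samples(params, covariate, n_top=20, n_components=2):
    """Regress out the covariate, select variable parameters, run PCA."""
    residual = regress_out(params, covariate)
    return pca(top_variable(residual, n_top), n_components)

# Synthetic stand-in for per-sample manifold parameters (hypothetical data):
# 30 samples x 200 parameters, with a technology-specific shift added.
rng = np.random.default_rng(0)
technology = np.repeat(["A", "B", "C"], 10)
shift = np.array([{"A": 0.0, "B": 3.0, "C": -3.0}[t] for t in technology])
params = rng.normal(size=(30, 200)) + shift[:, None]

embedding = embed_samples(params, technology)      # one 2-D point per sample
```

In this sketch each row of `embedding` corresponds to one sample, so it could be plotted and colored by donor, disease group, or cohort in the way the panels describe.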
