Fig. 2: Semi-synthetic experiment. | Nature Methods

Fig. 2: Semi-synthetic experiment.

From: Deep generative modeling of sample-level heterogeneity in single-cell genomics

a, Experimental design. We created a semi-synthetic dataset with 5 subsets of cells and 32 study subjects (subj.), containing sample-specific differential-expression (exp.) and differential-abundance effects. In cell subset A, cells have differences in gene expression, on the basis of the value of a study subject covariate, covariate 1. These DE effects stratify synthetic samples according to a known hierarchy. In subsets B and C, cells have differences in abundance between study subjects, on the basis of the value of a second study subject covariate, covariate 2 (four categories (cat.) in total). Cells in subset B or C were over- or under-sampled, respectively, according to fixed rates in each category, such that the sum of cells from B and C remained constant. Stars indicate categories with strong resampling. There are no DE or DA effects across study subjects in other cell subsets. b, Minimum distortion embeddings (MDEs) of MrVI u and z latent spaces, colored by cell subset assignments and covariate 1 categories. c, MrVI’s distance matrices aggregated over cell subsets A and B. d,e, DA analysis using MrVI and Milo for the comparison of synthetic samples based on covariate 2 categories (categories with strong DA in cell subsets B and C versus rest (see a)). d, A u latent space MDE, colored by log density ratios comparing the subset population (star) with the remaining (rest) population (Equation (4)). Enr., enriched. e, Precision–recall curves with areas under the curve (AUCs, higher is better) for identifying DA cells. Cells in subset B or C are true positives; other cells are true negatives. We used the absolute value of the log density ratio for MrVI and the absolute value of the LFC produced by Milo as scores to estimate precision–recall. f,g, DE analysis using MrVI and miloDE comparing group 1 of the simulation (4 blue figures) against all other samples. 
We inferred which cells showed DE effects for the comparison of synthetic samples based on their covariate 1 assignment (blue versus rest). f, u latent space MDE colored by the squared norm of βn, appearing in Equation (3), which quantifies the overall sample covariate effect on gene expression. g, Comparison of miloDE and MrVI LFCs versus DESeq2 reference, reporting Pearson’s r for each method.
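
The scoring described for panels e and g can be illustrated with a minimal sketch. It is not the authors' code: the toy values, the variable names (`log_density_ratio`, `is_da_cell`, `reference_lfc`, `method_lfc`) and the choice of the step-wise average-precision estimator for the precision–recall AUC are all assumptions made for illustration. The sketch shows why the absolute value is used as the score: DA cells may be enriched or depleted, so a strongly negative log ratio should rank as highly as a strongly positive one.

```python
import math


def average_precision(scores, labels):
    """Step-wise estimate of the area under the precision-recall curve.

    Cells are ranked by decreasing score; precision is accumulated at each
    rank where a true positive is recovered. Ties keep their input order.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    true_pos = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-cell log density ratios (toy values, not from the paper).
log_density_ratio = [2.1, -1.8, 0.1, -0.2, 1.5, 0.05]
# 1 = cell belongs to a subset with a DA effect (true positive), 0 = otherwise.
is_da_cell = [1, 1, 0, 0, 1, 0]

# Score by magnitude so enriched and depleted cells both rank highly.
auc_abs = average_precision([abs(v) for v in log_density_ratio], is_da_cell)
# Scoring by the signed value pushes the depleted DA cell to the bottom.
auc_signed = average_precision(log_density_ratio, is_da_cell)

# Hypothetical per-gene LFCs from a reference and from a method under test.
reference_lfc = [1.0, -2.0, 0.5, 0.0]
method_lfc = [2.0, -4.0, 1.0, 0.0]
r = pearson_r(method_lfc, reference_lfc)
```

In this toy example the magnitude-based score ranks all three DA cells first, whereas the signed score places the depleted DA cell last and lowers the estimated AUC. Pearson's r is insensitive to a constant rescaling of the LFCs, so it measures agreement in pattern rather than in absolute effect size.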
