Fig. 2: COMPOSITE model fitting performance on single-omics data.
From: A unified model-based framework for doublet or multiplet detection in single-cell multiomics data

The data for a–e are from the RNA modality of the PB-1 dataset. a–c One of the stable RNA features (RPL11) is used to demonstrate how the compound Poisson-gamma distribution captures the ground truth distribution of the recorded expression level for different underlying multiplet statuses. a Observed overall distribution. b Observed distribution stratified by ground truth multiplet status. c Fitted gamma distributions associated with each multiplet status. The parameters of the gamma distributions were estimated by the compound Poisson-gamma model. d Histogram comparing the ground truth multiplet status distribution with the multiplet status distribution simulated using Poisson(0.20), which is the fitted Poisson component for this dataset in the COMPOSITE framework. e Contingency table comparing ground truth vs. predicted multiplet status from COMPOSITE. The numbers at each grid intersection give the number and proportion of droplets that belong to the corresponding category, and the size of each dot reflects the magnitude of the corresponding number. f Scatter plot displaying the relationship between the goodness-of-fit and the prediction performance in terms of F1 score. Each dot represents the prediction made using one modality of a specific dataset from the 17 in-house DOGMA-seq datasets. Source data are provided as a Source Data file.
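
The kind of simulation described for panels c and d can be illustrated with a minimal sketch. It is not the COMPOSITE implementation. It assumes that the Poisson(0.20) count is the number of extra cells in a droplet beyond the first, and that a stable feature in a droplet with k cells follows a gamma distribution whose shape is k times a per-cell shape. The function names and the values of `alpha` and `theta` are hypothetical placeholders, not fitted parameters from the paper.

```python
import math
import numpy as np


def poisson_pmf(k, lam):
    """Probability of k events under Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def simulate_droplets(n, lam=0.20, alpha=5.0, theta=2.0, seed=0):
    """Simulate cell counts and one stable feature for n droplets.

    Assumptions (illustrative only):
      - extra cells per droplet ~ Poisson(lam), so cells = extra + 1
      - expression | cells ~ Gamma(shape=alpha * cells, scale=theta)
    alpha and theta are hypothetical values, not estimates from the paper.
    """
    rng = np.random.default_rng(seed)
    extra = rng.poisson(lam, size=n)
    cells = extra + 1
    expr = rng.gamma(shape=alpha * cells, scale=theta)
    return cells, expr


cells, expr = simulate_droplets(50_000)

# Compare the theoretical Poisson proportions with the simulated ones
# for singlets (1 cell), doublets (2 cells) and triplets (3 cells).
for k in range(3):
    theoretical = poisson_pmf(k, 0.20)
    simulated = float(np.mean(cells == k + 1))
    print(f"{k + 1} cell(s): theoretical={theoretical:.4f} simulated={simulated:.4f}")
```

Under these assumptions, Poisson(0.20) implies roughly 81.9% singlets, 16.4% doublets and 1.6% triplets, and the simulated expression shifts upward with the number of cells, which is the separation that panel c shows for the fitted distributions.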