Table 1 Performance of different simulators in generating realistic scRNA-seq data using the PBMC-CTL dataset

Simulator	Cosine distance	Euclidean distance	MMD	RF AUROC	miLISI
GRouNdGAN	0.00057	182	0.026	0.54	1.891
scGAN²⁴	0.00095	222	0.031	0.59	1.888
scDESIGN2²⁵	0.00100	229	0.065	0.76	1.736
SPARsim²⁶	0.00104	235	0.309	0.95	1.625
Control	0.00019	99	0.012	0.50	1.909

The metrics are calculated between a simulated dataset of 1000 cells and the held-out test set of 1000 real cells (see Supplementary Data 1 – Sheet 2 for training set performance). The GRN imposed on GRouNdGAN was constructed using GRNBoost2 from the experimental training set, with each gene regulated by 15 TFs. For the first three metrics, a value closer to zero is preferred; for RF AUROC, a value closer to 0.5 is preferred; and for miLISI, a value closer to 2 is preferred. The first two metrics are the distances between the centroids (mean expression profiles) of the real and simulated cells. The control RF AUROC is the value expected from a random classifier, which represents ideal performance. The other control metrics are calculated between the two halves of the real test dataset. Best performance values (excluding control) are in boldface.
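As an illustration of the first two metrics, the following minimal sketch computes the cosine and Euclidean distances between the centroids of two sets of cells. It assumes each cell is a list of per-gene expression values; the function names and the toy data are hypothetical and are not taken from the GRouNdGAN code base or from the datasets in the table.

```python
import math

def centroid(cells):
    """Mean expression vector (one value per gene) over a list of cells."""
    n = len(cells)
    return [sum(gene_values) / n for gene_values in zip(*cells)]

def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """One minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Toy data (hypothetical): 2 cells x 3 genes per set.
real_cells = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
simulated_cells = [[2.0, 1.0, 2.0], [2.0, 1.0, 2.0]]

real_centroid = centroid(real_cells)            # [2.0, 0.0, 2.0]
simulated_centroid = centroid(simulated_cells)  # [2.0, 1.0, 2.0]

euclid = euclidean_distance(real_centroid, simulated_centroid)
cosine = cosine_distance(real_centroid, simulated_centroid)
print(f"Euclidean distance: {euclid:.4f}")
print(f"Cosine distance:    {cosine:.4f}")
```

Both distances are zero when the two centroids coincide, which is why values closer to zero indicate simulated cells whose average expression profile better matches the real data.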
