Fig. 2: Benchmark of SCALPEL using synthetic data.
From: Quantification of transcript isoforms at the single-cell level using SCALPEL

a UMAP plots of the two cell populations simulated with Splatter: population A (orange) and population B (green). b Schematic of the gene types included in the simulated data. c Violin plots showing the relative number of UMIs per isoform in a reference 10x dataset (gray) and in the simulated datasets with different dropout rates (high, mid and low, shown in dark, medium and light purple, respectively). The real dataset contains 40,750 isoforms across 2042 cells; each of the high, mid and low simulated datasets contains 12,320 isoforms in 6000 cells. Differences in the distribution of UMIs per isoform were tested with a two-sided Mann–Whitney–Wilcoxon test. In the box plots, the center line marks the median, and the box bounds mark the lower quartile (25th percentile) and the upper quartile (75th percentile). The whiskers extend to the most extreme data points within 1.5 times the interquartile range (IQR) of the lower and upper quartiles; points beyond this range are considered outliers and plotted individually. Bonferroni-adjusted p-values for the pairwise comparisons are: high vs. real, p = 0.86; high vs. mid, p = 4.40e−259; high vs. low, p = 2.22e−308; mid vs. low, p = 2.6e−174. d–f Correlation between simulated isoform abundances (y-axis) and predicted isoform abundances (x-axis) for the high (d), medium (e) and low (f) sequencing depth simulated datasets. g–i Number of correctly identified genes (g), isoforms (h) and differential isoform usage (DIU) genes (i) by each benchmarked tool in the high (dark purple), medium (medium purple) and low (light purple) simulated datasets. For reference, the numbers of simulated genes, isoforms and DIU genes are shown as gray bars. j Summary of the performance of the different tools benchmarked on the three synthetic datasets. Source data are provided as a Source Data file.
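The box-plot convention (median, quartiles, 1.5 × IQR whiskers, individually plotted outliers) and the Bonferroni adjustment described in panel c can be sketched with the Python standard library. This is an illustrative sketch, not the authors' analysis code: the function names `box_plot_stats` and `bonferroni` are hypothetical, and computing quartiles by linear interpolation (`statistics.quantiles(..., method="inclusive")`) is an assumption, since plotting libraries differ in how they compute quartiles. The two-sided Mann–Whitney–Wilcoxon test itself is not reimplemented; raw p-values are assumed to come from a standard implementation such as `scipy.stats.mannwhitneyu(x, y, alternative="two-sided")`.

```python
import statistics
from typing import Sequence


def box_plot_stats(values: Sequence[float]) -> dict:
    """Tukey-style box-plot summary: quartiles, whisker ends and outliers.

    Hypothetical helper for illustration; quartiles use linear
    interpolation, which is an assumed convention.
    """
    data = sorted(values)
    # With n=4, quantiles() returns the three cut points [Q1, median, Q3].
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        # Whiskers stop at the most extreme observations inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Anything beyond the fences is drawn as an individual outlier point.
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


def bonferroni(p_values: Sequence[float]) -> list[float]:
    """Multiply each raw p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
    # Four pairwise comparisons, as in panel c (values here are made up).
    print(bonferroni([0.1, 0.2, 0.5, 0.004]))
```

With four comparisons, as in panel c, each raw p-value is multiplied by 4 before being reported; very small values such as those in the caption stay far below any usual significance threshold after this correction.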