Fig. 3: Evaluation of the doublet detection methods on DEG analysis and pseudotime analysis using the realistic synthetic datasets.

a The dataset with labelled DEGs was processed by each doublet detection method, and the top 40% of cells ranked by doublet score were excluded. DEGs were then detected using MAST19 and Wilcoxon rank-sum tests18. Taking the labelled DEGs as positives, three measures (the TPR, TNR and accuracy) were calculated. b, c After the pseudotime dataset was processed by each doublet detection method, the top 20% of cells ranked by doublet score were excluded. Monocle (b) and Slingshot (c) were then used to construct trajectories from the remaining cells. d UMAP embeddings of the two realistic synthetic datasets (DataDEG and DataPSE), with doublets shown in red and singlets in grey. DataDEG is a simulated dataset of 1667 cells from two synthetic cell types, 40% of which are labelled doublets. DataPSE contains a bifurcating trajectory and consists of 600 cells, 20% of which are labelled synthetic doublets. e ROC curves and AUC of each method on DataDEG and DataPSE.
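
The filtering and scoring steps in panel a can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `drop_top_scoring` and `deg_metrics` are hypothetical, the DEG calls are assumed to come from a separate test such as MAST or a Wilcoxon rank-sum test, and how ties in the doublet score are broken is an assumption.

```python
import numpy as np

def drop_top_scoring(scores, fraction):
    """Boolean mask that keeps cells outside the top `fraction` by doublet score.

    Hypothetical helper: ties are broken by original cell order (assumption).
    """
    scores = np.asarray(scores, dtype=float)
    n_drop = int(round(fraction * scores.size))
    order = np.argsort(-scores, kind="stable")  # highest score first
    keep = np.ones(scores.size, dtype=bool)
    keep[order[:n_drop]] = False
    return keep

def deg_metrics(is_true_deg, is_called_deg):
    """TPR, TNR and accuracy, treating the labelled DEGs as positives.

    Assumes at least one true DEG and one true non-DEG, so that neither
    denominator is zero.
    """
    truth = np.asarray(is_true_deg, dtype=bool)
    called = np.asarray(is_called_deg, dtype=bool)
    tp = np.sum(truth & called)
    tn = np.sum(~truth & ~called)
    fp = np.sum(~truth & called)
    fn = np.sum(truth & ~called)
    return {
        "TPR": float(tp / (tp + fn)),
        "TNR": float(tn / (tn + fp)),
        "accuracy": float((tp + tn) / truth.size),
    }
```

For panel a, the mask would be computed with `fraction=0.4` and applied to the cells before differential expression testing; for panels b and c, the same mask with `fraction=0.2` would precede trajectory inference.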