Fig. 4: Performance evaluation of driver regulator identification.

a–h Performance on the mESC (a–d) and hESC (e–h) datasets, measured as the precision of the top-k predicted genes among all known genes in each of the four ground-truth gene sets. Results are reported for k ranging from 1 to 20. The shaded area represents the variation (mean ± s.d.) in precision over 20 repeats. i Venn diagrams of the MDS-derived driver genes, the MFVS-derived driver genes, and the top-ranked genes according to the influence scores derived by CEFCON. j The top-20 predicted driver regulators on the hESC dataset, sorted in descending order of influence score. The genes belonging to each ground-truth gene set are indicated below the bar chart. k UMAP visualization of the gene embeddings output by CEFCON on the hESC dataset. Gene embeddings were clustered by the Leiden method⁵⁹ at a low resolution. The top-ranked driver regulators were located mainly in cluster 1, marked with a black circle. Cluster 1 is further zoomed in and clustered at a higher resolution, and the top-20 driver regulators falling within individual sub-clusters are also marked. l The top enriched GO terms of the genes in sub-clusters 7 (yellow), 8 (green), and 9 (blue) in k, respectively. P-values were computed by a one-sided Fisher's exact test and adjusted for multiple hypothesis testing using the Benjamini–Hochberg false discovery rate method. Source data are provided as a Source Data file.
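The precision curves in panels a–h can be outlined with a short computation. This is a minimal Python sketch, not the CEFCON evaluation code. It assumes that precision@k is |top-k ∩ ground truth| / k, that each repeat yields one ranked gene list already restricted to the genes considered in the evaluation, and that "s.d." means the sample standard deviation. The function names are hypothetical.

```python
import statistics
from typing import Sequence


def precision_at_k(ranked_genes: Sequence[str], ground_truth: set[str], k: int) -> float:
    """Fraction of the top-k ranked genes found in the ground-truth set (assumed definition)."""
    if k < 1 or k > len(ranked_genes):
        raise ValueError("k must be between 1 and the length of the ranking")
    hits = sum(1 for gene in ranked_genes[:k] if gene in ground_truth)
    return hits / k


def precision_curve(ranked_genes: Sequence[str], ground_truth: set[str], max_k: int = 20) -> list[float]:
    """Precision@k for k = 1..max_k (k = 1..20 in panels a-h)."""
    return [precision_at_k(ranked_genes, ground_truth, k) for k in range(1, max_k + 1)]


def summarize_repeats(rankings: Sequence[Sequence[str]], ground_truth: set[str], max_k: int = 20):
    """Mean and standard deviation of precision@k across repeated runs.

    Each element of `rankings` is one run's ranked gene list (for example, one
    of the 20 repeats). Using the sample s.d. is an assumption.
    """
    curves = [precision_curve(r, ground_truth, max_k) for r in rankings]
    means, sds = [], []
    for per_k in zip(*curves):  # per_k holds precision@k across all runs for one k
        means.append(statistics.mean(per_k))
        sds.append(statistics.stdev(per_k) if len(per_k) > 1 else 0.0)
    return means, sds
```

For example, with the ranking `["A", "B", "C", "D"]` and ground truth `{"A", "C"}`, precision@2 is 0.5. The mean ± s.d. band then comes from `summarize_repeats` applied to the 20 per-repeat rankings.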
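The enrichment statistics in panel l follow a standard recipe. First, a one-sided Fisher's exact test is run on the 2×2 table (in cluster vs. not, annotated with the GO term vs. not) for each term. Second, the resulting p-values are adjusted with the Benjamini–Hochberg procedure. The sketch below implements both steps with the Python standard library only. The caption does not specify the background gene universe or the GO annotation sets, so both are caller-supplied inputs here. The function names are hypothetical and are not CEFCON's or any enrichment tool's API.

```python
from math import comb


def fisher_one_sided_greater(overlap: int, cluster_size: int, term_size: int, background_size: int) -> float:
    """One-sided (enrichment) Fisher's exact p-value, i.e. the hypergeometric upper tail.

    Returns P(X >= overlap), where X counts term-annotated genes among `cluster_size`
    genes drawn without replacement from `background_size` genes, `term_size` of
    which carry the GO term.
    """
    upper = min(cluster_size, term_size)
    total = comb(background_size, cluster_size)
    # math.comb returns 0 when k > n, so impossible configurations contribute nothing
    tail = sum(
        comb(term_size, x) * comb(background_size - term_size, cluster_size - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def go_enrichment(cluster_genes: set[str], go_terms: dict[str, set[str]],
                  background: set[str]) -> dict[str, tuple[float, float]]:
    """Map each GO term name to (raw p-value, BH-adjusted p-value) for one sub-cluster."""
    cluster = set(cluster_genes) & background
    names = list(go_terms)
    pvals = []
    for name in names:
        term = go_terms[name] & background
        overlap = len(cluster & term)
        pvals.append(fisher_one_sided_greater(overlap, len(cluster), len(term), len(background)))
    adjusted = benjamini_hochberg(pvals)
    return {name: (p, q) for name, p, q in zip(names, pvals, adjusted)}
```

If SciPy is available, `scipy.stats.fisher_exact(table, alternative="greater")` should give the same one-sided p-value for the table `[[overlap, cluster_size - overlap], [term_size - overlap, background_size - cluster_size - term_size + overlap]]`.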