Figure 3

Comparison of feature selection performance on 500 simulated datasets. The median number of true positive variables as a function of the total number of selected genes, together with the median and standard deviation of the PR-AUC, is shown for ECAR, CAR, SIS, ridge, lasso and stability selection under five \(R^{2}\) scenarios. The 30 influential genes are randomly selected from the first 300 genes (the first block). Parameter \({\upalpha }\) of ECAR is estimated as described in the Methods section. The regularization parameters of ridge and lasso are estimated using fivefold cross-validation and generalized cross-validation, respectively. As lasso cannot select more variables than the sample size, the remaining genes are chosen at random once all genes in its selected set have been used. (a) \(R^{2}\) is controlled at 0.95 for the 100 simulated datasets. (b) Same as (a), with \(R^{2}\) controlled at 0.8. (c) Same as (a), with \(R^{2}\) controlled at 0.6. (d) Same as (a), with \(R^{2}\) controlled at 0.4. (e) Same as (a), with \(R^{2}\) controlled at 0.2.
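
The two quantities plotted in the figure can be illustrated with a minimal sketch. It assumes each method returns one importance score per gene and that genes are selected in decreasing order of score. The function names, the toy scores, the total of 1000 genes and the use of average precision as the PR-AUC estimate are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def true_positives_at_k(scores, truth):
    """Number of truly influential genes among the top-k ranked genes, for k = 1..p."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    return np.cumsum(np.asarray(truth, dtype=int)[order])

def pr_auc(scores, truth):
    """Average precision, a step-wise estimate of the area under the PR curve."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = np.asarray(truth, dtype=int)[order]
    precision = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Average the precision over the ranks at which a true gene is recovered.
    return float((precision * hits).sum() / hits.sum())

def simulate_scores(rng, p=1000, block=300, n_true=30, signal=2.0):
    """Toy stand-in for a method's output: noise plus a shift for influential genes."""
    truth = np.zeros(p, dtype=int)
    truth[rng.choice(block, size=n_true, replace=False)] = 1
    scores = rng.normal(size=p) + signal * truth
    return scores, truth

rng = np.random.default_rng(0)
curves, aucs = [], []
for _ in range(5):  # a handful of toy datasets, not the paper's simulation
    scores, truth = simulate_scores(rng)
    curves.append(true_positives_at_k(scores, truth))
    aucs.append(pr_auc(scores, truth))

median_curve = np.median(np.stack(curves), axis=0)  # median true positives per k
median_auc, sd_auc = float(np.median(aucs)), float(np.std(aucs, ddof=1))
```

Tied scores are broken by original gene order here; a real analysis would need to decide how ties are handled.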