Figure 3

Feature selection using recursive cABC analysis in the genomics dataset for leukemia (https://bioconductor.org/packages/golubEsets)45. The data set consisted of expression data of d = 150 genes, queried from n = 47 patients with acute lymphoblastic leukemia (ALL) and n = 25 patients with acute myeloid leukemia (AML)45,46. (A) Variable importance according to a 5 × 20 nested cross-validation feature selection using random forests and the generic permutation importance provided by the "permutation_importance" function of the "sklearn.inspection" module. Bar colors indicate the selection of informative variables in successive repetitions of the cABC analysis, from light blue = "not selected" to dark blue and black for features selected in deeper repetitions, up to the last repetition of the cABC analysis. (B)–(D) Results of the cABC analysis of the mean variable importance. The ABC plots (blue lines) show the cumulative distribution function of the variable importance values together with the identity distribution, x_i = constant (magenta line), and the uniform distribution, which is used as a stopping criterion for the repetitions of the cABC analysis. The red lines show the boundaries between the ABC subsets "A", "B" and "C". The figure was created using Python version 3.8.13 for Linux (https://www.python.org) with the seaborn statistical data visualization package (https://seaborn.pydata.org)22 and our Python package "cABCanalysis" (https://pypi.org/project/cABCanalysis/).
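
The curves in panels (B)–(D) can be illustrated with a minimal sketch in plain Python. It is not the API of the "cABCanalysis" package: the function names `abc_curve`, `uniform_reference` and `set_a_size` are hypothetical. The set "A" limit is simplified here to the curve point closest to the ideal point (0, 1), which may differ from the limits computed by the published method. The uniform reference assumes values uniformly distributed on [0, b], for which the ABC curve is y = 2x − x².

```python
from itertools import accumulate
from math import hypot


def abc_curve(importances):
    """Return (effort, yield) points of the ABC curve.

    Assumes non-negative values with a positive sum.
    """
    ordered = sorted(importances, reverse=True)  # largest contribution first
    total = sum(ordered)
    n = len(ordered)
    effort = [(i + 1) / n for i in range(n)]              # fraction of items
    cum_yield = [c / total for c in accumulate(ordered)]  # fraction of total
    return effort, cum_yield


def uniform_reference(effort):
    """ABC curve of values uniformly distributed on [0, b]: y = 2x - x^2."""
    return [2 * e - e * e for e in effort]


def set_a_size(importances):
    """Number of top-ranked items up to the curve point closest to (0, 1).

    Simplifying assumption for illustration only; not the full cABC rule.
    """
    effort, cum_yield = abc_curve(importances)
    distances = [hypot(e, 1.0 - y) for e, y in zip(effort, cum_yield)]
    return distances.index(min(distances)) + 1


if __name__ == "__main__":
    # Hypothetical importance values, not taken from the leukemia data set.
    values = [50, 30, 10, 5, 5]
    effort, cum_yield = abc_curve(values)
    print(list(zip(effort, cum_yield)))
    print("items in simplified set A:", set_a_size(values))
```

When all items have the same value, `abc_curve` returns the diagonal, which corresponds to the identity distribution (magenta line) in the ABC plots.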