Table 8. Feature selection using recursive cABC analysis in sensory and genomic data for pain, with classification performance quantified as median balanced accuracy (with 95% nonparametric confidence interval, CI).

From: Recursive computed ABC (cABC) analysis as a precise method for reducing machine learning based feature sets to their minimum informative size

| cABC times | KS-test p-value for item list | Number of features (% of all) | Median balanced accuracy (95% CI) (validation data) | Features |
| --- | --- | --- | --- | --- |
| 0 | 1.61·10⁻²⁸ | 53 (100%) | 0.64 (0.53–0.75) | All d = 52 sensorics variables and genetic markers |
| 1 | 0.044 | 6 (11.3%) | 0.69 (0.59–0.8) | “von Frey hairs plus capsaicin”, “blunt pressure”, “electrical”, “COMT_G472A”, “COMT4_1” |
| 2 | 0.132 | 2 (3.8%) | 0.7 (0.57–0.8) | “blunt pressure”, “electrical” |
| 2 | | 2 | 0.55 (0.3–0.72) | As above but target assignment permuted |

  1. The dataset includes subject sex, pain thresholds for heat, cold, blunt pressure, punctate pressure (von Frey hairs), and electrical stimuli with and without prior sensitization by topical application of capsaicin or menthol cream, and genetic information on 29 common variants in eight human genes reported to modulate pain, including single nucleotide variants and haplotypes, obtained from n = 125 healthy young volunteers [48]. Classification accuracy refers to the 20% validation sample not used for feature selection and classifier training. The cABC analysis was applied recursively ("recursive cABC analysis") to the items assigned to ABC subset "A" in the previous run, starting with the full feature set. Recursive subsets are named "A", "AA", etc. In addition, the p-values of a Kolmogorov–Smirnov test [60] of the distribution of the values subjected to cABC analysis against the uniform distribution are reported.
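
The recursive selection described in the note can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes that set "A" ends at the point of the ABC curve (cumulative fraction of items against cumulative fraction of total importance) that lies closest to the ideal point (0, 1), and it evaluates only the discrete curve points without interpolation. The function names `abc_set_a` and `recursive_abc` and the toy importance values are hypothetical.

```python
import math


def abc_set_a(values):
    """Return the indices of the items assigned to set "A".

    Simplifying assumption: set "A" ends at the discrete point of the
    ABC curve closest to the ideal point (effort 0, yield 1).
    Values are assumed to be non-negative with a positive sum.
    """
    # Item indices sorted by value, largest first.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    total = sum(values)
    n = len(values)

    best_k, best_dist = 1, math.inf
    cumulative = 0.0
    for k, idx in enumerate(order, start=1):
        cumulative += values[idx]
        effort = k / n                  # fraction of items used so far
        gain = cumulative / total       # fraction of total importance covered
        dist = math.hypot(effort, 1.0 - gain)
        if dist < best_dist:
            best_k, best_dist = k, dist
    return order[:best_k]


def recursive_abc(importance, times):
    """Apply the set-"A" selection repeatedly ("A", "AA", ...).

    `importance` maps feature names to importance values (hypothetical
    input format). Returns the list of feature subsets, starting with
    the full feature set.
    """
    names = list(importance)
    history = [names]
    for _ in range(times):
        kept = abc_set_a([importance[name] for name in names])
        names = [names[i] for i in kept]
        history.append(names)
        if len(names) <= 1:             # nothing left to reduce
            break
    return history


# Toy importance values, invented for illustration only.
toy_importance = {"a": 10.0, "b": 8.0, "c": 1.0, "d": 0.5, "e": 0.5}
for depth, subset in enumerate(recursive_abc(toy_importance, times=2)):
    print(depth, subset)
```

Each pass keeps only the items of the previous set "A", which mirrors the shrinking feature counts in the rows of the table. Whether a further pass is meaningful is what the reported Kolmogorov–Smirnov p-values address; that test is not part of this sketch.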