Fig. 6: Descriptor performance for adsorbate-modified MgO surfaces.

Box plots show the overall training and validation RMSE of the predictive models obtained with each feature selection method: a LASSO, b Horseshoe prior, and c Dirichlet–Laplace prior. For each method, the full set of identified descriptors was refined with the ℓ0-norm method to the model containing the number of descriptors that yields the lowest validation error. Box plots reflect data from 50 trials of randomly split training and validation sets. Parity plots of the models generated with d LASSO, e Horseshoe prior, and f Dirichlet–Laplace prior, where training and validation data are shown as solid and hollow points, respectively. The training and validation RMSE of the models in d–f are marked in a–c as orange diamonds. 13D, 3D, and 5D denote the number of descriptors in each model (i.e., the number of free coefficients used to fit the linear regression). In the box plots, the center line, upper and lower box bounds, and upper and lower whiskers represent the median, the 75th and 25th percentiles, and the maximum and minimum, respectively.
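
The refinement and resampling procedure summarized in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the ℓ0-norm step is an exhaustive search over subsets of the pre-selected descriptors, that the best subset of each size is chosen on the training data, and that the model size is then chosen by validation RMSE. The 80/20 split, the intercept term, the synthetic data, and all function names (`l0_refine`, `run_trials`, `box_stats`) are hypothetical choices made for illustration; the upstream LASSO, Horseshoe, or Dirichlet–Laplace selection is not reproduced and is represented only by the `candidates` list.

```python
from itertools import combinations

import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error between two 1-D arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def fit_predict(X_fit, y_fit, X_new):
    """Least-squares linear fit on (X_fit, y_fit), evaluated at X_new.
    Including an intercept column is an assumption of this sketch."""
    A = np.column_stack([np.ones(len(X_fit)), X_fit])
    coef, *_ = np.linalg.lstsq(A, y_fit, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def l0_refine(X_tr, y_tr, X_va, y_va, candidates):
    """Assumed l0 refinement: for each size k, take the k-descriptor subset
    with the lowest training RMSE, then keep the size whose model gives the
    lowest validation RMSE. Returns (train_rmse, val_rmse, subset)."""
    best = None
    for k in range(1, len(candidates) + 1):
        subset = min(
            combinations(candidates, k),
            key=lambda s: rmse(
                y_tr, fit_predict(X_tr[:, list(s)], y_tr, X_tr[:, list(s)])
            ),
        )
        cols = list(subset)
        tr = rmse(y_tr, fit_predict(X_tr[:, cols], y_tr, X_tr[:, cols]))
        va = rmse(y_va, fit_predict(X_tr[:, cols], y_tr, X_va[:, cols]))
        if best is None or va < best[1]:
            best = (tr, va, subset)
    return best


def run_trials(X, y, candidates, n_trials=50, val_fraction=0.2, seed=0):
    """Repeat the refinement over random training/validation splits."""
    rng = np.random.default_rng(seed)
    candidates = list(candidates)
    n_val = max(1, int(round(val_fraction * len(y))))
    results = []
    for _ in range(n_trials):
        idx = rng.permutation(len(y))
        va, tr = idx[:n_val], idx[n_val:]
        results.append(l0_refine(X[tr], y[tr], X[va], y[va], candidates))
    return results


def box_stats(values):
    """Box-plot summary with whiskers at the minimum and maximum."""
    q25, med, q75 = np.percentile(values, [25, 50, 75])
    return {
        "min": float(np.min(values)),
        "q25": float(q25),
        "median": float(med),
        "q75": float(q75),
        "max": float(np.max(values)),
    }


# Synthetic stand-in data (not the MgO dataset): 6 candidate descriptors,
# of which only columns 0 and 3 actually determine the target.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 6))
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + 0.1 * rng.normal(size=60)

results = run_trials(X, y, candidates=range(6), n_trials=50)
train_stats = box_stats([r[0] for r in results])
val_stats = box_stats([r[1] for r in results])
print("training RMSE:", train_stats)
print("validation RMSE:", val_stats)
```

The exhaustive search scales combinatorially with the number of candidate descriptors, which is one reason a sparsity-promoting selection step would precede it in a workflow of this kind.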