Fig. 1: Overview of the multi-attribute subset selection (MASS) approach.

MASS takes as input an n × m matrix X of observations (samples by attributes), in this case the phenotypes of n organisms under m different environmental conditions. For each fixed number of predictor variables, p, MASS outputs a binary vector z marking the p attributes selected as predictors, namely those that predict the remaining m − p response variables with the highest accuracy. The data, now partitioned into predictor and response variables, can then be used to build predictive models with supervised learning methods, such as the random forest models used in this study.
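
To make the input and output concrete, the following is a minimal sketch of the selection step for a toy matrix. It is not the MASS algorithm itself. It assumes an exhaustive search over all size-p subsets and a hypothetical proxy score (for each response column, the best squared Pearson correlation with any single chosen predictor), and the names `pearson` and `select_predictors` are introduced here for illustration only. It requires 1 ≤ p < m.

```python
from itertools import combinations
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def select_predictors(X, p):
    """Return a binary list z with exactly p ones marking the chosen columns of X.

    X is a list of n rows, each holding m attribute values.
    Assumption: the score below is an illustrative stand-in for prediction
    accuracy, not the objective used by MASS.
    """
    m = len(X[0])
    cols = [[row[j] for row in X] for j in range(m)]
    # Squared correlation between every pair of attributes.
    r2 = [[pearson(cols[i], cols[j]) ** 2 for j in range(m)] for i in range(m)]

    best_score, best_subset = -1.0, None
    for subset in combinations(range(m), p):
        responses = [j for j in range(m) if j not in subset]
        # Each response is scored by its best single chosen predictor.
        score = sum(max(r2[i][j] for i in subset) for j in responses)
        if score > best_score:
            best_score, best_subset = score, subset

    z = [0] * m
    for j in best_subset:
        z[j] = 1
    return z


# Toy data: 6 samples by 4 attributes. Column 1 is a linear function of
# column 0, and column 3 is the negation of column 2.
X = [
    [1, 3, 1, -1],
    [2, 5, -1, 1],
    [3, 7, 2, -2],
    [4, 9, -2, 2],
    [5, 11, 0, 0],
    [6, 13, 3, -3],
]
z = select_predictors(X, p=2)
print(z)
```

In this toy matrix the search picks one column from each redundant pair, since either member of a pair fully determines the other. The columns flagged by z would then serve as inputs, and the remaining columns as targets, for a supervised model such as a random forest. Exhaustive enumeration is only practical for small m, because the number of candidate subsets grows combinatorially.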