Fig. 4: MASS is applicable to categorical data.

a Growth phenotypes of 492 yeast species (rows) grown on 39 different carbon sources (columns)27. Positive growth on a given carbon source is shown in yellow; variable, weak, or delayed growth in teal; and negative (no) growth in purple. Marginal bar charts summarize the phenotype frequencies for each species and each carbon source, respectively. b Matrix of MASS results showing whether each carbon source was used as a predictor (gray) or as a response (black) as a function of the total number of predictors allowed (parameter p). Growth on glucose was excluded from the MASS analysis. c Shannon entropy of each carbon source. b, c Media are arranged in descending order of how frequently they were used as predictors. d Average Matthews correlation coefficient (MCC) of random forest classifiers for each number of predictors, p. The classifiers were trained using either the MASS selection of predictors (blue), predictor sets selected based on maximum Shannon entropy (red), or 300 random draws of conditions used as predictors (green). Each point represents the mean MCC obtained via threefold cross-validation; the thick lines show the average of these per-point means at each value of p. Source Data for Fig. 4c, d is available in Supplementary Data 4.
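
As an illustration of the entropy-based baseline in panels c and d, the following minimal sketch computes the Shannon entropy of each categorical growth column and ranks the columns from highest to lowest entropy. It is not the authors' code. The toy data, the names `sourceA` to `sourceC`, the phenotype codes (`+`, `-`, `v`), and the choice of base-2 logarithms are all assumptions made for this example.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits, an assumed unit) of a list of categorical labels."""
    counts = Counter(labels)
    n = len(labels)
    # H = sum_k p_k * log2(1 / p_k), summed over the observed categories
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical toy data: one column per carbon source, one entry per species.
# "+" = growth, "-" = no growth, "v" = variable/weak/delayed growth.
growth = {
    "sourceA": ["+", "+", "+", "+"],
    "sourceB": ["+", "-", "+", "-"],
    "sourceC": ["+", "-", "v", "-"],
}

# Entropy of each carbon source (cf. panel c)
entropies = {name: shannon_entropy(col) for name, col in growth.items()}

# Rank carbon sources by decreasing entropy; taking the first p entries
# would give a maximum-entropy predictor set of size p.
ranked = sorted(entropies, key=entropies.get, reverse=True)

for name in ranked:
    print(f"{name}: {entropies[name]:.3f} bits")
```

A column in which every species shows the same phenotype has zero entropy and carries no information as a predictor, whereas a column with evenly split phenotypes has the highest entropy.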