Fig. 2: MASS successfully identifies predictor features for a small microbial growth dataset.

a Sixty-five different marine heterotrophic bacterial strains (rows) were grown individually on 11 different media (columns) (see ref. 21 for details): Difco Marine Broth (difcoMB), eight engineered media each containing a single class of carbon source (HMBpep = peptides; HMBaa = amino acids; HMBlips = lipids; HMBoligo = oligosaccharides; HMBorg = organic acids; HMBntrl = neutral sugars; HMBamisug = amino sugars; HMBacdsug = acidic sugars), a defined medium containing all eight carbon classes (HMBcmpt), and a medium with no added carbon sources (HMB–). The names of the strains are available in Supplementary Data 1. b Matrix showing which media were used as predictors (gray) or as responses (black) as a function of the total number of predictors allowed (parameter p). c Shannon entropy of each medium. b, c Media are arranged in descending order of how frequently they were used as predictors. d Average Matthews correlation coefficient (MCC) of random forest classifiers for each number of predictors, p. The classifiers were trained using either the MASS selection of predictors (blue), predictor sets selected based on maximum Shannon entropy (red), or 300 random draws of conditions used as predictors (green). Each point represents the mean MCC obtained via fivefold cross-validation; the thick lines show, for each p, the mean of these points. Source Data for Fig. 2c, d is available in Supplementary Data 4.
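The two quantities reported in panels c and d can be illustrated with a minimal sketch. It assumes that growth on each medium is encoded as a binary outcome (1 = growth, 0 = no growth) and that entropy is measured in bits; both are assumptions made for illustration, and the function names `shannon_entropy` and `mcc` are hypothetical rather than taken from the authors' code.

```python
import math


def shannon_entropy(outcomes):
    """Shannon entropy (in bits) of a sequence of binary growth outcomes.

    Assumption: each entry is truthy for growth and falsy for no growth.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("outcomes must not be empty")
    p_growth = sum(1 for x in outcomes if x) / n
    entropy = 0.0
    for q in (p_growth, 1.0 - p_growth):
        if q > 0:  # the term for q = 0 is taken as 0
            entropy -= q * math.log2(q)
    return entropy


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when the denominator vanishes.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy data (not from the study): one medium on which half the strains grow.
medium_column = [1, 0, 1, 0]
print(shannon_entropy(medium_column))
print(mcc([1, 0, 1, 0], [1, 0, 1, 0]))
```

Under this encoding, a medium on which all strains grow (or none do) has zero entropy, whereas a medium that splits the strains evenly has the maximum entropy of 1 bit. MCC ranges from -1 (every prediction wrong) to 1 (every prediction correct).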