Fig. 4: Prediction of small molecule synthesis in Saccharomyces cerevisiae.
From: Accurate prediction of gene deletion phenotypes with Flux Cone Learning

a High-throughput deletion screening data for S. cerevisiae strains engineered to produce betaxanthin (ref. 37); fitness scores were determined from betaxanthin autofluorescence, normalized and binned into three classes for model training. Most deletions are medium producers, and ~15% are high producers. b Accuracy of several FCL models using different algorithms for multiclass classification of deletion strains (HGB Histogram-based Gradient Boosting, SVC Support Vector Classifier, LogReg Logistic Regression, RF Random Forest) and different strategies for rebalancing the three classes. Accuracy was computed across all three classes and is shown for N = 5 training repeats. Balanced: class labels were weighted to account for class imbalance; Resampled: the majority class was subsampled to be the same size as the minority classes. Class rebalancing can increase accuracy for high producers, often at the expense of overall accuracy. For full class balancing results on a held-out 20% test set (N = 659 deletions), see Supplementary Table S6. c Ternary plots of model predictions on the test set for representative models with varying predictive accuracy across the three classes. Each vertex corresponds to a class predicted with probability one (full confidence), whereas central points are deletions predicted as equally likely to belong to any of the three classes. Each sample is color-coded according to its ground-truth class label. Source data are provided as a Source data file.
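
The two rebalancing strategies named in panel b can be illustrated with a minimal scikit-learn sketch. This is not the authors' pipeline: the data are synthetic, the class proportions other than the ~15% high-producer fraction are invented for illustration, the helper `undersample_majority` is hypothetical, and the choice to shrink the majority class to the size of the second-largest class is an assumption about what "the same size as the minority classes" means. Logistic regression stands in for any of the classifiers listed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the betaxanthin screen).
# Classes: 0 = low, 1 = medium (majority), 2 = high (~15%).
n = 1000
y = rng.choice([0, 1, 2], size=n, p=[0.25, 0.60, 0.15])
X = rng.normal(size=(n, 5)) + 0.5 * y[:, None]

# "Balanced": weight each class inversely to its frequency,
# w_c = n_samples / (n_classes * count_c).
classes = np.unique(y)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
balanced_model = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)


def undersample_majority(X, y, rng):
    """Hypothetical helper: subsample the largest class down to the size
    of the second-largest class (an assumed reading of 'Resampled')."""
    labels, counts = np.unique(y, return_counts=True)
    majority = labels[np.argmax(counts)]
    target = int(np.sort(counts)[-2])
    majority_idx = np.flatnonzero(y == majority)
    kept_majority = rng.choice(majority_idx, size=target, replace=False)
    keep = np.sort(np.concatenate([np.flatnonzero(y != majority), kept_majority]))
    return X[keep], y[keep]


# "Resampled": train an unweighted model on the undersampled data.
X_rs, y_rs = undersample_majority(X, y, rng)
resampled_model = LogisticRegression(max_iter=1000).fit(X_rs, y_rs)

# Each row of class probabilities sums to one, so a sample can be drawn
# as a single point in a ternary plot like those in panel c.
proba = balanced_model.predict_proba(X)
```

Class weighting keeps every sample and changes only the loss, whereas undersampling discards majority-class samples; either can shift accuracy toward the rarer classes at some cost to overall accuracy, which is the trade-off the caption describes.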