Accurate prediction of gene deletion phenotypes with Flux Cone Learning

Merzbacher, Charlotte; Mac Aodha, Oisin; Oyarzún, Diego A.

doi:10.1038/s41467-025-63436-9


Article
Open access
Published: 26 September 2025


Charlotte Merzbacher¹, Oisin Mac Aodha¹ & Diego A. Oyarzún¹,² (ORCID: orcid.org/0000-0002-0381-5278)

Nature Communications volume 16, Article number: 8492 (2025)

Abstract

Understanding the impact of gene deletions is crucial for biological discovery, biomedicine, and biotechnology. Due to the complexity of genome-wide deletion screens, there is growing interest in computational methods that leverage existing screening data for predictive modeling. Here, we present Flux Cone Learning, a general framework designed to predict the effects of metabolic gene deletions on cellular phenotypes. Using Monte Carlo sampling and supervised learning, our approach identifies correlations between the geometry of the metabolic space and experimental fitness scores from deletion screens. Flux Cone Learning delivers best-in-class accuracy for prediction of metabolic gene essentiality in organisms of varied complexity (Escherichia coli, Saccharomyces cerevisiae, Chinese Hamster Ovary cells), outperforming the gold standard predictions of Flux Balance Analysis. We demonstrate the versatility of our approach by training a predictor of small molecule production using data from a large deletion screen. Flux Cone Learning can be applied to many organisms and phenotypes, without the need to encode cellular objectives as an optimization task. Our work offers a broadly applicable tool for phenotypic prediction and lays the groundwork for building metabolic foundation models across the kingdom of life.

Introduction

Gene deletions impact cellular phenotypes in multiple ways, affecting how cells function, proliferate, and interact with their environment. Understanding the effect of such deletions is fundamental for basic biological discovery and a variety of applications. For example, identifying lethal deletions is key for developing new cancer therapies¹ or antimicrobial treatments that avoid drug resistance². In biotechnology, nonlethal deletions are a powerful strategy to redirect chemical flux toward production of high-value compounds for the food, energy, and pharmaceutical sectors, using genetically engineered cells as an alternative to petrochemicals³. Thanks to progress in high-throughput technologies such as RNAi or CRISPR-Cas9^4,5,6, genome-wide deletion screens have revealed foundational insights in numerous domains, including the genetic basis of disease^7,8, drug target discovery⁹, and genetic engineering¹⁰. Due to the cost and complexity of deletion screens, computational methods hold substantial promise to complement experimental approaches, for example by filling gaps in coverage, extrapolating predictions to new variants or conditions, or aiding experimental design.

In the case of metabolic genes, the gold standard is Flux Balance Analysis (FBA), a computational method that predicts metabolic phenotypes by combining genome-scale metabolic models¹¹ (GEM) with an optimality principle¹². This technique can model many metabolic tasks, such as growth capabilities in various substrates¹³, cell-specific auxotrophies¹⁴, or responses to drug interventions¹⁵. FBA is particularly effective at predicting gene essentiality in microbes, i.e., whether a gene deletion leads to cell death. For various model microbes¹⁶, FBA predicts metabolic gene essentiality with high accuracy, but its predictive power drops when applied to higher-order organisms where the optimality objective is unknown or nonexistent^17,18,19. Other methods build on FBA to extend essentiality predictions to other relevant tasks; for example, gene Minimal Cut Sets²⁰ were developed to identify combinations of deletions that block specific cellular functions, with particular success for predicting synthetic lethal genes in cancer²¹. Alternative strategies for essentiality prediction include network-based methods²² and sequence-based approaches that employ machine learning to extract predictive features from DNA or protein sequences^{23,24,25,26,27}.

Here, we describe Flux Cone Learning (FCL), a versatile machine learning strategy for predicting deletion phenotypes from the shape of the metabolic space. Through Monte Carlo sampling, our method utilizes mechanistic information encoded in a GEM to produce a large corpus of training data for each deletion. These data can be paired with experimental fitness readouts for a phenotype of interest and then employed for training predictive models with supervised learning. FCL can be adapted to multiple prediction tasks, provided that the fitness scores correlate with metabolic activity. This includes prediction of metabolic signals already encoded in a GEM, e.g., growth rate or the activity of specific pathways, as well as nonmetabolic readouts absent from the model but associated with metabolic activity. We show that FCL produces the most accurate predictions of metabolic gene essentiality, surpassing FBA predictions in all tested organisms. Crucially, FCL predictions do not require an optimality assumption and thus can be applied to a broader range of organisms than FBA. We demonstrate the flexibility of FCL for predicting other deletion phenotypes by building a predictor of small-molecule synthesis from deletion screen data.

Results

Learning the shape of the metabolic space

Our approach is based on learning the shape of the metabolic space of an organism through random sampling. FCL has four components (Fig. 1): a GEM, a Monte Carlo sampler to produce features for model training, a supervised learning algorithm trained on fitness data, and a score aggregation step. A GEM is defined by:

$$\mathbf{S}\mathbf{v} = 0, \tag{1}$$

$$V_i^{\min} \le v_i \le V_i^{\max}, \tag{2}$$

where S is an m × n integer matrix describing the metabolic stoichiometry, v is an n-dimensional vector of metabolic fluxes, and $(V_i^{\min}, V_i^{\max})$ are flux bounds that can be used to model gene deletions through a gene-protein-reaction (GPR) map. Upon deletion of gene g_j, the GPR determines which flux bounds need to be zeroed out in the GEM, i.e., by setting $V_i^{\min} = V_i^{\max} = 0$ in Eq. (2); a single gene deletion can affect more than one reaction flux in the GEM. From a geometric standpoint, a GEM defines a convex polytope in a high-dimensional space, which is known as the flux cone of an organism²⁸. The dimensionality of the cone equals that of the null space of S, which for current GEMs can be up to several thousand dimensions depending on model complexity (Supplementary Fig. S1).
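The way a deletion reshapes the feasible set can be illustrated on a toy network. The sketch below is not taken from the paper: the four-reaction stoichiometric matrix, the bounds, and the `gpr` dictionary are hypothetical, and real GPR maps are Boolean rules over several genes rather than the simple gene-to-reactions lookup assumed here.

```python
import numpy as np

# Hypothetical toy network (m = 2 metabolites, n = 4 reactions):
# R0: -> A,  R1: A -> B,  R2: A -> B (alternative route),  R3: B ->
S = np.array([
    [1, -1, -1,  0],   # metabolite A
    [0,  1,  1, -1],   # metabolite B
], dtype=float)
lower = np.zeros(4)          # V_i^min
upper = np.full(4, 10.0)     # V_i^max

# Assumed, simplified GPR map: gene -> reactions that lose all activity on deletion.
gpr = {"g1": [1], "g2": [2], "g3": [0, 3]}

def delete_gene(gene, lower, upper, gpr):
    """Return copies of the bounds with V_min = V_max = 0 for the affected reactions."""
    lo, up = lower.copy(), upper.copy()
    idx = gpr[gene]
    lo[idx] = 0.0
    up[idx] = 0.0
    return lo, up

def in_flux_cone(v, S, lo, up, tol=1e-9):
    """Check the steady-state constraint (Eq. 1) and the flux bounds (Eq. 2)."""
    steady = np.allclose(S @ v, 0.0, atol=tol)
    bounded = np.all(v >= lo - tol) and np.all(v <= up + tol)
    return bool(steady and bounded)

v = np.array([4.0, 3.0, 1.0, 4.0])                 # feasible in the wild type
lo_g1, up_g1 = delete_gene("g1", lower, upper, gpr)
wild_type_ok = in_flux_cone(v, S, lower, upper)
deletion_ok = in_flux_cone(v, S, lo_g1, up_g1)     # v carries flux through R1, blocked by g1
rerouted_ok = in_flux_cone(np.array([4.0, 0.0, 4.0, 4.0]), S, lo_g1, up_g1)
```

In this toy case the deletion of `g1` removes part of the polytope but leaves the route through R2 available, which is the kind of boundary change that sampling is meant to pick up.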

**Fig. 1: Flux cone learning of metabolic deletion phenotypes.**

FCL relies on the observation that gene deletions perturb the shape of the flux cone, because zeroing out the flux bounds in Eq. (2) alters the boundaries of the polytope. The correlations between such geometric changes and a phenotype of interest can then be learned with supervised learning algorithms trained on experimental fitness scores. To test whether the shape differences between flux cones can be captured from random samples, we first sampled five metabolically diverse pathogens (Bordetella pertussis, Pseudomonas aeruginosa, Helicobacter pylori, Mycobacterium tuberculosis, Streptococcus pneumoniae) from programmatically generated GEMs to avoid confounders introduced by variations in model quality²⁹. We trained a variational autoencoder³⁰ based on neural networks to compute low-dimensional representations of each species' cone³¹, using a large set of Monte Carlo samples of metabolic reactions shared across the five species and removing species-specific reactions. The learned representations are well separated across species, despite being trained on reactions shared by diverse species (Supplementary Fig. S2). This suggests that the cone geometry can be learned from Monte Carlo samples, and offers a path toward the construction of metabolic foundation models across many species and genomic perturbations.

To train predictive models of deletion phenotypes, FCL utilizes a Monte Carlo sampler to capture the shape of each deletion cone (Fig. 1). A supervised machine learning model is then trained on the flux samples alongside measured phenotypic fitness labels for each deletion; all samples in a deletion cone get assigned the same label. FCL does not prescribe the choice of machine learning model and can be applied to both regression and classification tasks. The feature matrix for model training has k × q rows and n columns, where k is the number of gene deletions, q is the number of flux samples per deletion cone, and n is the number of reactions in the GEM. This approach leads to large datasets; for example, in the case of the iML1515 model of Escherichia coli¹³, acquiring 100 Monte Carlo samples for the 2712 reactions and 1502 gene deletions leads to a dataset over 3Gb in single-precision floating-point format. In the final step, FCL aggregates sample-wise predictions with a majority voting scheme to produce deletion-wise predictions.
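The data layout described above can be sketched with toy sizes. This is an illustration under assumptions, not the paper's code: the random array stands in for Monte Carlo flux samples, the label encoding (1 = nonessential) is hypothetical, and the tie-breaking rule in the vote is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
k, q, n = 3, 4, 5                        # deletions, samples per cone, reactions (toy sizes)
samples = rng.normal(size=(k, q, n))     # stand-in for flux samples of each deletion cone
labels = np.array([1, 0, 1])             # one fitness label per deletion (assumed encoding)

X = samples.reshape(k * q, n)            # feature matrix with k*q rows and n columns
y = np.repeat(labels, q)                 # every sample inherits its deletion's label
groups = np.repeat(np.arange(k), q)      # deletion index of each row

def majority_vote(sample_preds, groups):
    """Aggregate binary sample-wise predictions into one prediction per deletion."""
    result = {}
    for g in np.unique(groups):
        votes = sample_preds[groups == g]
        result[int(g)] = int(2 * votes.sum() > votes.size)   # ties resolve to class 0
    return result

# Hypothetical sample-wise predictions from a trained classifier.
sample_preds = np.array([1, 1, 0, 1,   0, 0, 1, 0,   1, 0, 1, 0])
deletion_preds = majority_vote(sample_preds, groups)
```

For the iML1515 figures quoted above, the same reshaping would give 1502 × 100 rows and 2712 columns.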

Best predictive accuracy of metabolic gene essentiality

We first tested FCL as a predictor of gene essentiality in E. coli, which has the best curated GEM in the literature. This choice mitigates the impact of poor GEM quality on FCL predictive performance. When tested across different carbon sources, FBA delivers a maximal accuracy of 93.5% correctly predicted genes for E. coli growing aerobically in glucose with biomass synthesis as the optimization objective¹³. We employed FCL using N = 1202 gene deletions (80%) with q = 100 samples/cone for training a binary classifier of gene essentiality; the biomass reaction was removed from training to prevent the model from learning the correlation between biomass and essentiality that supports FBA predictions (Supplementary Fig. S3 and Supplementary Table S3). This led to a training dataset with N = 120,285 samples and n = 2712 features. We opted for a random forest classifier as a suitable compromise between model complexity and interpretability. Test results in a random set of N = 300 held-out genes (20%) outperformed the state-of-the-art FBA predictions in accuracy, precision and recall, achieving an average 95% accuracy for all test genes across training repeats (Fig. 2a and Supplementary Fig. S5); moreover, FCL achieved a 1% and 6% improvement in classification of nonessential and essential genes, respectively, as compared to FBA (Fig. 2b).

**Fig. 2: Prediction of metabolic gene essentiality in *Escherichia coli*.**

Inspection of sample-wise prediction scores shows that a small number of deletions get incorrectly classified, likely due to GEM misspecifications (Fig. 2c). Interpretability analysis revealed that as few as 100 reactions can explain model predictions, with top predictors being enriched for transport and exchange reactions (Fig. 2d). Thanks to its excellent predictive power, FCL can be employed to define a distance metric between deletions and the wild-type strain, with statistically significant differences between nonessential and essential deletions (Supplementary Fig. S4).

To investigate which factors determine FCL performance, we first retrained the model with sparser sampling data and fewer gene deletions (Fig. 2e); predictive accuracy dropped in both cases, but models trained on as few as 10 samples/cone already matched the current state-of-the-art FBA accuracy. We additionally retrained FCL with earlier and less complete GEMs for E. coli and found that only the smallest GEM (iJR904) displayed a statistically significant drop in performance (Fig. 2f). Given the high dimensionality of the feature space, we retrained the random forest model on a reduced feature set computed with Principal Component Analysis, but this resulted in lower accuracy in all tested cases, possibly because correlations between essentiality and small changes in cone shape can only be captured in a high-dimensional feature space. We also explored the use of deep learning models, including feedforward and convolutional neural networks, but these did not improve performance even when trained on larger data with more than q = 5000 samples/cone (not shown). This is likely because such models are deliberately overparameterized to accommodate highly nonlinear correlations among features, but in our case, flux samples are linearly correlated through the stoichiometric constraint in Eq. (1).

We tested FCL for essentiality prediction in Saccharomyces cerevisiae and Chinese Hamster Ovary (CHO) cells, two more complex organisms with well-curated GEMs^32,33 that are widely employed for the synthesis of heterologous proteins and metabolites. These models have 52% and 130% more reactions than E. coli, respectively, leading to a higher dimensionality of the flux cone and more features for training. Following the same data generation protocol as in Fig. 2a, we trained FCL on 80% of deleted genes with similar sample density and using essentiality labels from the literature^33,34. FCL achieved better classification results than FBA in both organisms across multiple performance scores (Fig. 3 and Supplementary Table S5). We found that FCL made prediction errors similar to those of FBA, with a tendency to misclassify some essential genes as nonessential, likely due to the class imbalance in the training data (most genes are nonessential). In the case of CHO cells, the performance gains over FBA are narrow, likely due to incomplete curation of the GEM; FCL performance could likely be improved further by designing GEM-specific machine learning architectures, particularly for large models such as those of CHO cells.

**Fig. 3: Prediction of metabolic gene essentiality for higher-order organisms.**

The performance improvements against FBA predictions on three organisms of varied complexity (E. coli, S. cerevisiae, and CHO cells) suggest that FCL provides the most accurate predictions for metabolic gene essentiality in the literature. This result further demonstrates that optimality assumptions are not required for prediction of metabolic gene essentiality, in agreement with earlier evidence provided by recent studies^19,35.

Prediction of small molecule synthesis

To explore the power of FCL for predicting other phenotypes, we focused on small molecule biosynthesis in microbial strains engineered with heterologous pathways³⁶. Recent studies have showcased the utility of genome-wide deletion screens for improving production titers^37,38. Nonessential deletions can either suppress or boost metabolite production; for example, deletions that disrupt enzymatic cofactor homeostasis are deleterious for product synthesis, while other nonessential deletions can redirect metabolic flux away from nonessential pathways toward increased production³.

We focused on a large deletion screen of S. cerevisiae mutants engineered to synthesize betaxanthin³⁷, a tyrosine-derived pigment widely employed in the food sector. The screen includes a total of 4223 gene deletions, out of which N = 811 genes code for metabolic enzymes present in the latest yeast GEM³³. Fitness scores for each deletion strain were quantified via betaxanthin autofluorescence averaged across four nonclonal cultures (Fig. 4a). We first binned betaxanthin autofluorescence into three classes for low, medium, and high-producing cultures. We employed FCL to build a 3-class classifier that predicts betaxanthin synthesis using Monte Carlo sampling of the deletion GEMs. Due to the imbalanced data size across classes (17.1%, 67.2%, and 15.7%, respectively), we trialed various model architectures in combination with rebalancing strategies (Fig. 4b); the best-performing model delivered promising accuracy (69.8%). We observed a tendency to underpredict the high-producing deletions because these are underrepresented in the training data, though improvements in high-producer accuracy of between 5.5% and 28.3% could be obtained via various class balancing techniques (Fig. 4c and Supplementary Table S6).

**Fig. 4: Prediction of small molecule synthesis in *Saccharomyces cerevisiae*.**

To the best of our knowledge, this is the first demonstration that small molecule synthesis can be predicted from deletion screening data, and adds to the growing number of tools to predict metabolite production using various data modalities and computational approaches^39,40,41. Since FCL relies purely on the wild-type GEM and experimental fitness readouts, it does not require extending the GEM with a heterologous pathway, which is particularly beneficial for production pathways with poorly characterized stoichiometry.

Discussion

With the rapid progress in high-throughput genetic engineering and automated screening technologies, there is a growing opportunity to utilize such data for building predictors of the phenotypic response to gene deletions. FCL offers a general strategy to detect correlations between metabolic genotypes and phenotypic readouts. It combines experimental fitness data with mechanistic knowledge into a machine learning system able to draw phenotypic predictions for a specific gene deletion.

Our model evaluations demonstrate that FCL outperforms the state-of-the-art FBA predictions of metabolic gene essentiality. FBA has the advantage of being a zero-shot predictor, in the sense that it does not need to be trained on fitness data. Instead, FBA draws predictions based on a biological optimality assumption; for microbial systems, maximal growth rate or biomass synthesis rate are well-validated metabolic objectives. But for most organisms beyond the microbial world, such optimality assumptions are not warranted, and there is no consensus on how to define suitable metabolic objectives for higher-order organisms^16,18. Various studies have built strategies to accommodate the multiobjective nature of metabolic optimality^42,43 or to reverse engineer metabolic objectives^44,45 and tradeoffs^18,46. Yet even in cases where an optimal objective of the wild-type can be validated, there is little evidence that such an objective would be preserved upon a gene deletion. Mutants are likely to be subject to different evolutionary pressures that shift their genetic programs away from the physiological objectives of the wild-type. FCL thus allows essentiality predictions in a much wider range of cell types than current methods, including those with unknown optimality principles such as human cell lines⁴⁷ or the gut microbiome⁴⁸, as well as prediction of other deletion phenotypes beyond essentiality, such as single-cell metabolic capabilities⁴⁹, synthetic lethality⁵⁰, or gain-of-function deletions⁵¹. Although FCL is agnostic to the fitness score employed for training, its effectiveness is limited by the strength of correlations between metabolic activity and the phenotype of interest. In the case of gene essentiality, for example, FCL works well because deletions in pathways that supply key metabolites for growth can strongly impact cell viability. 
Other phenotypes with weaker or no associations to metabolic activity may require additional data modalities for accurate prediction.

The integration of learning algorithms with GEMs has shown substantial promise for improved predictivity across various tasks^18,19,41,52,53,54. The novel paradigm behind FCL is to learn the shape of the metabolic space through random sampling of GEMs. High-dimensional sampling remains a key challenge in statistical learning⁵⁵, because in high dimensions, samples tend to be equidistant and concentrate on the boundaries of the space⁵⁶. While intuition would suggest that dense sampling is needed to accurately capture the cone geometry, we consistently found that accurate FCL models could be trained from shallow sampling with as few as 100 samples per deletion. We hypothesize this is a case of the curse of dimensionality working to our benefit: to capture changes to the cone, FCL only requires samples at the boundary, and therefore, a relatively small number of samples is sufficient for accurate prediction.

An exciting application of FCL is the discovery of knockouts or deletion strategies that result in improved production of small molecules. This would help reduce the number of costly experiments required for strain optimization. A key challenge, however, is that desirable traits such as high metabolite titer are rare, which results in substantial class imbalances like those observed in the betaxanthin dataset (Supplementary Table S4); only a few knockouts improve production, and therefore, training data is typically enriched for mid- or low-producers. Data augmentation and synthetic data generation could address some of these challenges, in addition to new model architectures that improve performance. We also note that when designing FCL-based machine learning models, the experimental reproducibility of production readouts should put a ceiling on the expected model accuracy, so as to avoid training models that predict with higher accuracy than the measurement error.

The performance of FCL suggests that predictive representations of metabolic capabilities can be learned from Monte Carlo sampling of GEMs. This advancement lays the groundwork for the development of metabolic foundation models via large sampling across species, growth conditions and deletion genotypes, thus extending the breadth of biological foundation models across additional layers of cellular organization^57,58. We expect that FCL will open new routes for computational prediction of many cellular phenotypes, with applications in basic discovery, biotechnology and future therapies.

Methods

FCL utilizes sampling data generated with a random walk on a deletion-specific GEM. First, a wild-type GEM is modified with a gene deletion by setting the corresponding reaction bounds to zero. The high-dimensional flux cone of the deletion GEM is sampled using a random walk sampler; in our implementations, we opted for OptGPSampler⁵⁹, a fast Monte Carlo method that aims to uniformly sample the flux cone. The sampler first transforms the problem into a convex optimization problem in logarithmic space using geometric programming, then employs a hit-and-run algorithm to sample the interior of the cone. The resulting flux data have a number of rows equal to the number of samples and a number of columns equal to the number of reactions in the GEM. Each deletion produces a collection of flux sampling vectors, all of which are labeled with a fitness score obtained from experimental data. The fitness score can be either discrete or continuous, depending on the fitness readout under study.

The resulting labeled dataset is then employed for training supervised machine learning models. The model predictions are made at the level of flux samples, i.e., one row of the flux sampling data frame from each deletion is passed through the trained machine learning model to produce a predicted fitness score. Therefore, every sample from each deletion GEM is assigned an individual predicted score, and the distribution of these scores is finally averaged to obtain a gene-level prediction. FCL can deliver high predictive accuracy because it is trained to learn correlations between the geometry of the flux cone and the resulting phenotype.

Generation of flux sampling data

Flux sampling is a collection of methods for randomly generating flux distributions from the solution space of a GEM. Flux sampling algorithms are based on random walks optimized for the high-dimensional and nonisotropic geometry of the convex polytope defined by the GEM.

OptGPSampler uses artificial centering hit-and-run to bias the random walk towards the elongated regions of the flux cone. After an initial random location in the flux space is selected and a warm-up phase is completed, the sampler retains every kth point of the random walk until N points have been collected. Together, these two parameters (k, N) determine the length of the random walk and the number of flux samples returned. Flux sampling is computationally costly because it requires running a random walk on a high-dimensional flux space that needs to reach mixing time to achieve uniform coverage.
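The role of the thinning step k and the sample count N can be seen in a bare-bones random walk. The sketch below is a plain hit-and-run sampler on a hypothetical toy network; it is not OptGPSampler, it omits artificial centering and the warm-up phase, and the function name and arguments are illustrative only.

```python
import numpy as np

def hit_and_run(S, lower, upper, x0, n_samples, thinning, rng):
    """Plain hit-and-run on {v : S v = 0, lower <= v <= upper}, keeping every `thinning`-th point."""
    # Null-space basis of S from the SVD: moving along it preserves S v = 0.
    _, sing, vt = np.linalg.svd(S)
    rank = int(np.sum(sing > 1e-10))
    basis = vt[rank:].T
    x = np.asarray(x0, dtype=float).copy()
    kept = []
    for step in range(1, n_samples * thinning + 1):
        d = basis @ rng.normal(size=basis.shape[1])
        d /= np.linalg.norm(d)
        moving = np.abs(d) > 1e-12
        # Interval [t_min, t_max] of step lengths that keep x + t*d inside the bounds.
        t_a = (lower[moving] - x[moving]) / d[moving]
        t_b = (upper[moving] - x[moving]) / d[moving]
        t_min = np.minimum(t_a, t_b).max()
        t_max = np.maximum(t_a, t_b).min()
        x = x + rng.uniform(t_min, t_max) * d
        if step % thinning == 0:          # thinning: retain every k-th point only
            kept.append(x.copy())
    return np.array(kept)

# Hypothetical toy network: R0: -> A, R1: A -> B, R2: A -> B, R3: B ->
S = np.array([[1, -1, -1, 0], [0, 1, 1, -1]], dtype=float)
lower, upper = np.zeros(4), np.full(4, 10.0)
x0 = np.array([4.0, 2.0, 2.0, 4.0])       # feasible starting point
samples = hit_and_run(S, lower, upper, x0, n_samples=50, thinning=20,
                      rng=np.random.default_rng(0))
```

Here 50 retained samples cost 1000 random-walk steps, which is why large thinning values dominate the sampling time. For a deletion cone, reactions with both bounds set to zero would need to be removed from S before the walk, since this simple version cannot move along them.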

We ran OptGPSampler on all single-gene deletions in four E. coli models, the Yeast9 model for S. cerevisiae, and the iCHO2291 model for CHO cells. For training supervised machine learning models, sampling data were normalized to zero mean and unit variance. There were a small number of deletions in each GEM where the sampling failed to converge; these were not included in training or testing. A summary of GEM sizes and sampling data can be found in Supplementary Tables S1, S2. For example, in the Yeast9 model, we sampled 1159 single-gene deletions with a step size of k = 5000 for a sampling density of N = 124 samples/cone, leading to a total of 143,716 samples with D = 4130 fluxes each (total data size 4.43 Gb).
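The dataset size and the normalization step can be checked with a few lines. This is a sketch under assumptions: the text does not state whether the mean and variance were estimated on the training split only, which is what the hypothetical `standardize` helper below does, and the handling of zero-variance columns is likewise an illustrative choice.

```python
import numpy as np

# Yeast9 figures quoted above.
n_deletions, samples_per_cone, n_fluxes = 1159, 124, 4130
total_rows = n_deletions * samples_per_cone       # rows of the feature matrix

def standardize(X_train, X_test, eps=1e-12):
    """Scale each flux to zero mean and unit variance using training statistics."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std < eps, 1.0, std)           # constant (e.g. blocked) fluxes stay at zero
    return (X_train - mean) / std, (X_test - mean) / std

# Toy example: the second flux is constant, as for a reaction blocked in every sample.
X_train = np.array([[1.0, 0.0], [3.0, 0.0], [5.0, 0.0]])
X_test = np.array([[3.0, 0.0]])
Z_train, Z_test = standardize(X_train, X_test)
```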

For all models except E. coli, we sampled with a high step size of k = 5000. To ensure robust performance evaluations in the E. coli iML1515 model (Fig. 2a–d), we retrained models many times using different training sets. For computational efficiency and due to large data sizes, we first computed a large initial set of samples using a fine step size of k = 100, and then subsampled the data 10 times so that every deletion had the same number of samples (N = 100 samples/cone). Three smaller E. coli models (iAF1260, iJO1366, iJR904) were employed for the comparison in Fig. 2f. In these models, deletion GEMs were sampled with N = 100 samples/cone and k = 5000. To equalize the number of training features between models, only the reactions present in all models (D = 864 reactions) were included in the training and test sets for the models in Fig. 2f. The biomass reactions were removed to ensure the models were learning from the true reaction fluxes, not the biomass reaction used to compute FBA predictions (see Supplementary Table S3).

Experimental fitness labels

The gene essentiality labels for E. coli, S. cerevisiae and CHO cells were obtained from the literature^13,33,34. The yeast labels included nonmetabolic genes and were indexed by both gene names and ORF identifiers. Gene names were standardized to their systematic names from the Saccharomyces Genome Database, resulting in N = 1121 metabolic gene deletions that were labeled with essentiality data, sampled, and included in the final dataset for model training. ORFs and gene names were linked using a tool from Yeastract+ (http://www.yeastract.com/formorftogene.php).

For the results in Fig. 4, betaxanthin autofluorescence readouts for N = 811 yeast deletions were taken from Cachera et al.³⁷ and averaged across four cultures. While one gene (YBR011C) was also identified as essential in other studies, we included it in our analysis as we hypothesized this could be a conditionally essential deletion which can grow in alternative strain and media conditions. The average autofluorescence was normalized to the (0,1) range. We first framed the problem as a regression task, but this proved challenging with the limited number of knockouts at the high and low ends of the autofluorescence distribution. Recognizing that predicting high or low producers is a core task in several applications, we chose to train a three-class classifier by binning the data into three classes of high, medium, and low producers (Fig. 4a). We set the thresholds qualitatively to label 67% of samples as medium producers (within ~1 standard deviation from the mean). In all our case studies, labels were highly class-imbalanced, as shown in Supplementary Table S4.
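The binning into low, medium, and high producers can be sketched as follows. The exact thresholds used in the paper are not given beyond the description above, so the min-max normalization, the `width` parameter, and the integer class codes are assumptions made for illustration.

```python
import numpy as np

def bin_producers(fluorescence, width=1.0):
    """Normalize readouts to [0, 1] and bin them at mean +/- width * standard deviation."""
    x = np.asarray(fluorescence, dtype=float)
    scaled = (x - x.min()) / (x.max() - x.min())   # assumed min-max normalization
    mu, sigma = scaled.mean(), scaled.std()
    classes = np.full(scaled.shape, 1)             # 1 = medium producer
    classes[scaled < mu - width * sigma] = 0       # 0 = low producer
    classes[scaled > mu + width * sigma] = 2       # 2 = high producer
    return scaled, classes

# Hypothetical averaged autofluorescence readouts for seven deletion strains.
readouts = [1.0, 5.0, 5.0, 5.0, 5.0, 5.0, 9.0]
scaled, classes = bin_producers(readouts)
```

With approximately normal readouts, a band of one standard deviation around the mean captures roughly two thirds of the strains, which is consistent with the 67% medium class described above.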

Training of supervised learning models

All models were trained using the scikit-learn package in Python.

Escherichia coli

A random forest classifier was trained on an 80% training set stratified to preserve the class proportions. Model hyperparameters were fixed as max_depth=None and min_samples_split=2. The random forest was retrained N = 5 times with different test sets to confirm that performance was not significantly affected by the composition of the training set. The FBA baseline was obtained using the single_gene_deletion function in the COBRApy package⁶⁰ applied to all genes in the iML1515 model with default biomass objective function, aerobic conditions, and glucose as the carbon source. For the results in Fig. 2a, b, we chose 0.41/h as the cutoff for FBA predictions. This was chosen to match the experimental growth rate cutoff employed by the original iML1515 source¹³, which is 50% of the wild-type growth rate (predicted to be ~0.81/h by FBA). A ROC curve computed across five repeats is included as Supplementary Fig. S5, which demonstrates that FCL outperforms FBA in E. coli essentiality prediction regardless of the chosen cutoff. The naive baseline was computed by predicting all genes as nonessential (majority class). After training at the sample level, the prediction scores of all samples from a single deletion were averaged; if this average was less than 0.5, the deletion was classified as essential. A representative model was used to create the prediction score distributions in Fig. 2c. In Fig. 2d, we trained N = 50 random forest classifiers on one subsample with random held-out test sets. Feature importance scores of all reactions were extracted from the random forest models.
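A minimal sketch of this training and aggregation loop is given below, using synthetic data in place of flux samples. The label encoding (0 = essential, 1 = nonessential), the toy signal in reaction 0, the unstratified gene split, and all sizes are assumptions made for the example; only the averaging rule with the 0.5 threshold follows the description above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
k, q, n = 40, 20, 6                        # deletions, samples per cone, reactions (toy sizes)
labels = np.array([0, 1] * (k // 2))       # assumed encoding: 0 = essential, 1 = nonessential
samples = rng.uniform(0, 1, size=(k, q, n))
samples[labels == 0, :, 0] = 0.0           # toy signal: essential deletions block reaction 0

# Hold out whole deletions, so no gene contributes samples to both sets.
perm = rng.permutation(k)
train_genes, test_genes = perm[:32], perm[32:]
X_train = samples[train_genes].reshape(-1, n)
y_train = np.repeat(labels[train_genes], q)

clf = RandomForestClassifier(n_estimators=50, max_depth=None,
                             min_samples_split=2, random_state=0)
clf.fit(X_train, y_train)

# Sample-level scores (probability of class 1), averaged per deletion.
X_test = samples[test_genes].reshape(-1, n)
sample_scores = clf.predict_proba(X_test)[:, 1].reshape(len(test_genes), q)
gene_scores = sample_scores.mean(axis=1)
gene_pred = (gene_scores >= 0.5).astype(int)   # average below 0.5 -> essential (class 0)
accuracy = float((gene_pred == labels[test_genes]).mean())
```

Splitting by deletion rather than by row matters here: samples from the same cone are highly similar, so a row-wise split would leak information about test genes into training.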

Saccharomyces cerevisiae

For essentiality prediction, a class-stratified 20% of deletions (192 nonessential; 31 essential) was held out as a test set. The remaining 80% of deletions (772 nonessential; 126 essential) were split into 5-fold cross-validation sets and a random forest model was trained on each fold. The max_depth, n_estimators, and min_samples_split hyperparameters were tuned using a grid search, and the model with the highest average cross-validation accuracy was selected; its confusion matrix and ROC curve were computed for Fig. 3a. The best max_depth value was 30, the best n_estimators value (the number of trees) was 300, and the best min_samples_split value (the minimum number of samples required to split an internal node) was 2. The minimum deletion-level accuracy was 87.5% and the maximum was 90.3%. The test set results were computed by running the held-out test set through all 5-fold models and averaging the deletion-level scores across all models. The FBA baseline was computed using the single_gene_deletion function in COBRApy for all genes with glucose as the carbon source and the standard biomass reaction.
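One way to set up such a search with scikit-learn is sketched below on synthetic data. The use of `GridSearchCV` with `GroupKFold` is an assumption (the text does not name the utilities used), the grid values are deliberately tiny, and the score optimized here is sample-level accuracy rather than the deletion-level accuracy reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, GroupKFold

rng = np.random.default_rng(1)
k, q, n = 30, 10, 5                          # toy sizes
labels = np.array([0, 1, 1] * (k // 3))      # assumed encoding: 0 = essential
samples = rng.uniform(0, 1, size=(k, q, n))
samples[labels == 0, :, 0] = 0.0             # toy signal in reaction 0

X = samples.reshape(-1, n)
y = np.repeat(labels, q)
groups = np.repeat(np.arange(k), q)          # keeps each deletion inside a single fold

param_grid = {
    "max_depth": [5, None],
    "n_estimators": [20, 40],
    "min_samples_split": [2, 4],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=GroupKFold(n_splits=5),
    scoring="accuracy",
)
search.fit(X, y, groups=groups)
best_params = search.best_params_
best_cv_accuracy = float(search.best_score_)
```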

For prediction of betaxanthin synthesis (Fig. 4), multiple models were trained on a class-stratified 80% training set split with a consistent held-out test set. The following model types were trained: HistGradientBoostingClassifier, Linear Support Vector Classifier, Logistic Regression Classifier, and Random Forest Classifier. We implemented two class balancing techniques to improve the minority class performance: balancing, which weights the class labels to account for the class imbalance, and resampling, which subsamples the majority class to be the same size as the minority classes.
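The two balancing strategies can be sketched with NumPy alone. Both helpers are hypothetical: the weighting formula is the one scikit-learn applies for `class_weight="balanced"`, while the undersampling rule (every class reduced to the size of the smallest one) is an assumption, since the two minority classes differ slightly in size.

```python
import numpy as np

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = y.size / (classes.size * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

def undersample(X, y, rng):
    """Randomly subsample every class down to the size of the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=target, replace=False)
        for c in classes
    ])
    keep.sort()
    return X[keep], y[keep]

# Toy labels with roughly the class proportions reported for the betaxanthin screen.
y = np.array([0] * 17 + [1] * 67 + [2] * 16)
X = np.arange(y.size).reshape(-1, 1)
weights = balanced_class_weights(y)
X_bal, y_bal = undersample(X, y, np.random.default_rng(0))
```

Weighting keeps all samples but changes their contribution to the loss, whereas undersampling discards majority-class data; which of the two helps more is an empirical question for each model type.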

Chinese hamster ovary cells

A HistGradientBoostingClassifier was trained on a 5-fold cross-validation of the training set. Twenty percent of the original training dataset was held out as a test set and not included in the cross-validation. The large training set data size required training models across 4 CPU nodes to load all training data into memory. The learning_rate, max_iter, and max_depth hyperparameters were tuned via grid search, and the model with the highest average cross-validation accuracy was selected; its confusion matrix and ROC curve were computed for Fig. 3b. The learning_rate was varied between 0.01 and 0.2, the max_iter between 100 and 500, and the max_depth set to 5, 10, or None. The best model had a learning_rate of 0.05, a max_iter of 100, and a max_depth of None. The test set results were computed by running the held-out test set through all 5-fold models and averaging the deletion-level scores across all models. The confusion matrix was computed for a class threshold value of 0.5 for each fold, and counts were averaged across all 5 folds. The FBA results were computed using the single_gene_deletion function in COBRApy for all deletions and the default carbon source and objective function in the iCHO2291 model.
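The final aggregation across fold models can be sketched as follows. The score array is hypothetical, with shape (fold models, deletions, samples per cone), and the label encoding (1 = nonessential) is an assumption; the averaging order and the 0.5 threshold follow the description above.

```python
import numpy as np

def ensemble_deletion_scores(fold_scores):
    """Average sample scores per deletion within each fold model, then across fold models."""
    per_fold = fold_scores.mean(axis=2)      # deletion-level score for each fold model
    return per_fold.mean(axis=0)

def confusion_counts(y_true, y_pred):
    """2x2 confusion matrix with rows = true class and columns = predicted class."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy scores: 2 fold models, 3 deletions, 2 samples per deletion.
fold_scores = np.array([
    [[0.9, 0.7], [0.2, 0.4], [0.6, 0.2]],
    [[0.8, 0.8], [0.1, 0.3], [0.7, 0.9]],
])
y_true = np.array([1, 0, 1])

scores = ensemble_deletion_scores(fold_scores)
y_pred = (scores >= 0.5).astype(int)

# Per-fold confusion matrices at threshold 0.5, with counts averaged across folds.
fold_cms = [confusion_counts(y_true, (f.mean(axis=1) >= 0.5).astype(int))
            for f in fold_scores]
mean_cm = np.mean(fold_cms, axis=0)
```

In this toy case the first fold model misclassifies the third deletion on its own, while the ensemble average recovers the correct label.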

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Flux sampling data and experimental fitness labels employed in the paper have been deposited in Zenodo⁶¹ at https://doi.org/10.5281/zenodo.15518666. Due to data size, we have provided one set of flux samples for each of the GEMs employed in the paper. These can be used to retrain the models presented in the paper with the code provided. Source data are provided with this paper.

Code availability

The code used to train the models, perform the analyses and generate results in this study is publicly available and has been deposited in Zenodo⁶¹ under license CC-BY 4.0. The specific version of the code associated with this publication can be found at https://doi.org/10.5281/zenodo.15518666.

References

Chang, L., Ruiz, P., Ito, T. & Sellers, W. R. Targeting pan-essential genes in cancer: challenges and opportunities. Cancer Cell 39, 466–479 (2021).
Rosconi, F. et al. A bacterial pan-genome makes gene essentiality strain-dependent and evolvable. Nat. Microbiol. 7, 1580–1592 (2022).
Rancati, G., Moffat, J., Typas, A. & Pavelka, N. Emerging and evolving concepts in gene essentiality. Nat. Rev. Genet. 19, 34–49 (2018).
Wang, T. et al. Identification and characterization of essential genes in the human genome. Science 350, 1096–1101 (2015).
Cacheiro, P. et al. Human and mouse essentiality screens as a resource for disease gene discovery. Nat. Commun. 11, 1–16 (2020).
Bock, C. et al. High-content CRISPR screening. Nat. Rev. Methods Prim. 2, 1–23 (2022).
Shohat, S. & Shifman, S. Genes essential for embryonic stem cells are associated with neurodevelopmental disorders. Genome Res. 29, 1910–1918 (2019).
Tsherniak, A. et al. Defining a cancer dependency map. Cell 170, 564–576.e16 (2017).
Minikel, E. V. et al. Evaluating drug targets through human loss-of-function genetic variation. Nature 581, 459–464 (2020).
Meyer, A. J., Segall-Shapiro, T. H., Glassey, E., Zhang, J. & Voigt, C. A. Escherichia coli “Marionette” strains with 12 highly optimized small-molecule sensors. Nat. Chem. Biol. 15, 196–204 (2019).
King, Z. A. et al. BiGG Models: a platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Res. 44, D515–D522 (2016).
Orth, J. D., Thiele, I. & Palsson, B. Ø. What is flux balance analysis? Nat. Biotechnol. 28, 245–248 (2010).
Monk, J. M. et al. iML1515, a knowledgebase that computes Escherichia coli traits. Nat. Biotechnol. 35, 904–908 (2017).
Wang, H. et al. RAVEN 2.0: a versatile toolbox for metabolic network reconstruction and a case study on Streptomyces coelicolor. PLoS Comput. Biol. 14, e1006541 (2018).
Kim, H. U. et al. Integrative genome-scale metabolic analysis of Vibrio vulnificus for drug targeting and discovery. Mol. Syst. Biol. 7, 460 (2011).
Bordbar, A., Monk, J. M., King, Z. A. & Palsson, B. O. Constraint-based models predict metabolic and associated cellular functions. Nat. Rev. Genet. 15, 107–120 (2014).
Segrè, D., Vitkup, D. & Church, G. M. Analysis of optimality in natural and perturbed metabolic networks. Proc. Natl. Acad. Sci. USA 99, 15112–15117 (2002).
Lin, D.-W., Zhang, L., Zhang, J. & Chandrasekaran, S. Inferring metabolic objectives and trade-offs in single cells during embryogenesis. Cell Syst. 16, 101164 (2025).
Hasibi, R., Michoel, T. & Oyarzún, D. A. Integration of graph neural networks and genome-scale metabolic models for predicting gene essentiality. NPJ Syst. Biol. Appl. 10, 1–10 (2024).
Apaolaza, I. et al. An in-silico approach to predict and exploit synthetic lethality in cancer metabolism. Nat. Commun. 8, 459 (2017).
Olaverri-Mendizabal, D., Valcárcel, L. V., Barrena, N., Rodríguez, C. J. & Planes, F. J. Review and meta-analysis of the genetic minimal cut set approach for gene essentiality prediction in cancer metabolism. Brief. Bioinform. 25, bbae115 (2024).
Hahn, M. W. & Kern, A. D. Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. Mol. Biol. Evol. 22, 803–806 (2005).
Campos, T. L., Korhonen, P. K., Gasser, R. B. & Young, N. D. An evaluation of machine learning approaches for the prediction of essential genes in eukaryotes using protein sequence-derived features. Comput. Struct. Biotechnol. J. 17, 785–796 (2019).
Aromolaran, O., Aromolaran, D., Isewon, I. & Oyelade, J. Machine learning approach to gene essentiality prediction: a review. Brief. Bioinform. 22, bbab128 (2021).
Hasan, M. A. & Lonardi, S. DeeplyEssential: a deep neural network for predicting essential genes in microbes. BMC Bioinform. 21, 1–19 (2020).
Zhang, X., Xiao, W. & Xiao, W. DeepHE: accurately predicting human essential genes based on deep learning. PLoS Comput. Biol. 16, e1008229 (2020).
Kang, B., Fan, R., Cui, C. & Cui, Q. Comprehensive prediction and analysis of human protein essentiality based on a pretrained large language model. Nat. Comput. Sci. 5, 196–206 (2025).
Palsson, B. Ø. Systems Biology: Simulation of Dynamic Network States (Cambridge University Press, 2011).
Devoid, S. et al. Automated genome annotation and metabolic model reconstruction in the SEED and Model SEED. Methods Mol. Biol. 985, 17–45 (2013).
Kingma, D. P. & Welling, M. An introduction to variational autoencoders. Found. Trends Mach. Learn. 12, 307–392 (2019).
Cain, S., Merzbacher, C. & Oyarzún, D. A. Low-dimensional representations of genome-scale metabolism. In Proc. Foundations of Systems Biology in Engineering Conference 2024–05 (IFAC, 2024).
Yeo, H. C., Hong, J., Lakshmanan, M. & Lee, D.-Y. Enzyme capacity-based genome scale modelling of CHO cells. Metab. Eng. 60, 138–147 (2020).
Zhang, C. et al. Yeast9: a consensus genome-scale metabolic model for S. cerevisiae curated by the community. Mol. Syst. Biol. 20, 1134–1150 (2024).
Xiong, K. et al. An optimized genome-wide, virus-free CRISPR screen for mammalian cells. Cell Rep. Methods 1, 100062 (2021).
Sharma, K., Marucci, L. & Abdallah, Z. S. FluxGAT: integrating flux sampling with graph neural networks for unbiased gene essentiality classification. arXiv. 2403.18666 (2024).
Han, T., Nazarbekov, A., Zou, X. & Lee, S. Y. Recent advances in systems metabolic engineering. Curr. Opin. Biotechnol. 84, 103004 (2023).
Cachera, P. et al. CRI-SPA: a high-throughput method for systematic genetic editing of yeast libraries. Nucleic Acids Res. 51, e91 (2023).
Fang, L. et al. Genome-scale CRISPRi screen identifies pcnB repression conferring improved physiology for overproduction of free fatty acids in Escherichia coli. Nat. Commun. 16, 3060 (2025).
Djoumbou-Feunang, Y. et al. BioTransformer: a comprehensive computational tool for small molecule metabolism prediction and metabolite identification. J. Cheminform. 11, 1–25 (2019).
Schneider, P., von Kamp, A. & Klamt, S. An extended and generalized framework for the calculation of metabolic intervention strategies based on minimal cut sets. PLoS Comput. Biol. 16, e1008110 (2020).
Merzbacher, C., Mac Aodha, O. & Oyarzún, D. A. Modelling host-pathway dynamics at the genome scale with machine learning. Metab. Eng. 91, 480–491 (2025).
Schuetz, R., Zamboni, N., Zampieri, M., Heinemann, M. & Sauer, U. Multidimensional optimality of microbial metabolism. Science 336, 601–604 (2012).
Shoval, O. et al. Evolutionary trade-offs, Pareto optimality, and the geometry of phenotype space. Science 336, 1157–1160 (2012).
Zhao, Q., Stettner, A. I., Reznik, E., Paschalidis, I. C. & Segrè, D. Mapping the landscape of metabolic goals of a cell. Genome Biol. 17, 109 (2016).
Richelle, A. et al. Model-based assessment of mammalian cell metabolic functionalities using omics data. Cell Rep. Methods 1, 100040 (2021).
Hausser, J. & Alon, U. Tumour heterogeneity and the evolutionary trade-offs of cancer. Nat. Rev. Cancer 20, 247–257 (2020).
Brunk, E. et al. Recon3D enables a three-dimensional view of gene variation in human metabolism. Nat. Biotechnol. 36, 272–281 (2018).
Heinken, A. et al. Genome-scale metabolic reconstruction of 7302 human microorganisms for personalized medicine. Nat. Biotechnol. 41, 1320–1331 (2023).
Gustafsson, J. et al. Generation and analysis of context-specific genome-scale metabolic models derived from single-cell RNA-Seq data. Proc. Natl. Acad. Sci. USA 120, e2217868120 (2023).
Srivatsa, S. et al. Discovery of synthetic lethal interactions from large-scale pan-cancer perturbation screens. Nat. Commun. 13, 7748 (2022).
Ye, L. et al. A genome-scale gain-of-function CRISPR screen in CD8 T cells identifies proline metabolism as a means to enhance CAR-T therapy. Cell Metab. 34, 595–614.e14 (2022).
Yang, J. H. et al. A white-box machine learning approach for revealing antibiotic mechanisms of action. Cell 177, 1649–1661.e9 (2019).
Faure, L., Mollet, B., Liebermeister, W. & Faulon, J.-L. A neural-mechanistic hybrid approach improving the predictive power of genome-scale metabolic models. Nat. Commun. 14, 4669 (2023).
Gopalakrishnan, S. et al. COSMIC-dFBA: a novel multi-scale hybrid framework for bioprocess modeling. Metab. Eng. 82, 183–192 (2024).
Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning (Springer, 2009).
Wainwright, M. J. High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge Series in Statistical and Probabilistic Mathematics (Cambridge University Press, 2019).
Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 21, 1470–1480 (2024).
Nguyen, E. et al. Sequence modeling and design from molecular to genome scale with Evo. Science 386, eado9336 (2024).
Megchelenbrink, W., Huynen, M. & Marchiori, E. optGpSampler: an improved tool for uniformly sampling the solution-space of genome-scale metabolic networks. PLOS ONE 9, e86587 (2014).
Ebrahim, A., Lerman, J. A., Palsson, B. O. & Hyduke, D. R. Cobrapy: constraints-based reconstruction and analysis for Python. BMC Syst. Biol. 7, 1–6 (2013).
Merzbacher, C., Mac Aodha, O. & Oyarzún, D. A. Code and data for “Accurate prediction of gene deletion phenotypes with Flux Cone Learning”. Zenodo. https://doi.org/10.5281/zenodo.15518666 (2025).


Acknowledgements

C.M. and D.A.O. were supported by the United Kingdom Research and Innovation (grant EP/S02431X/1, UKRI Centre for Doctoral Training in Biomedical AI).

Author information

Authors and Affiliations

School of Informatics, University of Edinburgh, Edinburgh, UK
Charlotte Merzbacher, Oisin Mac Aodha & Diego A. Oyarzún
School of Biological Sciences, University of Edinburgh, Edinburgh, UK
Diego A. Oyarzún

Authors

Charlotte Merzbacher
Oisin Mac Aodha
Diego A. Oyarzún

Contributions

C.M. performed data analysis, model training, and benchmarking. O.M.A. advised on machine learning aspects. D.A.O. designed the research and provided overall supervision.

Corresponding author

Correspondence to Diego A. Oyarzún.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Harrison Steel and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Transparent Peer Review file

Source data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Merzbacher, C., Mac Aodha, O. & Oyarzún, D.A. Accurate prediction of gene deletion phenotypes with Flux Cone Learning. Nat Commun 16, 8492 (2025). https://doi.org/10.1038/s41467-025-63436-9


Received: 08 March 2025
Accepted: 18 August 2025
Published: 26 September 2025
DOI: https://doi.org/10.1038/s41467-025-63436-9
