Fig. 4: EvoWeaver rivals STRING without reliance on external data.
From: EvoWeaver: large-scale prediction of gene functional associations from coevolutionary signals

a Predictive accuracy was compared on 1514 pairs of gene groups that overlapped between STRING and the Multiclass benchmark. Area under the ROC curve (AUROC) is shown for discerning between pairs sharing the same pathway in KEGG (i.e., positives) versus pairs in different pathways (i.e., negatives). STRING’s predictions are a composite of seven evidence streams. Sequentially incorporating evidence streams from least to most beneficial demonstrates their marginal impact on STRING’s reported Total Score. Text Mining and Databases were the most impactful STRING evidence streams. Despite STRING’s predictions incorporating KEGG itself into its Databases evidence stream, EvoWeaver’s Random Forest predictions roughly match STRING’s predictions without Text Mining while only using sequence information. EvoWeaver greatly outperforms STRING when both are limited to only de novo predictors (i.e., Gene Fusion, Cooccurrence, and Gene Neighborhood for STRING), even when trained on CORUM (EvoWeaver Transfer).

b As expected, some of EvoWeaver’s component predictors were modestly correlated with STRING’s evidence streams. For example, STRING’s Cooccurrence score is correlated with EvoWeaver’s Phylogenetic Profiling algorithms (red box), and STRING’s Gene Neighborhood score is correlated with EvoWeaver’s Gene Organization algorithms (green box). Spearman’s correlation is calculated in the same manner as in Fig. 2.

Source data are provided as a Source Data file.
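For readers unfamiliar with the two statistics named in this caption, the following is a minimal, generic sketch of how AUROC (panel a) and Spearman's correlation (panel b) can be computed from scores and labels. It is not EvoWeaver's or STRING's code, and it does not reproduce the specific procedure referenced for Fig. 2. The function names (`average_ranks`, `auroc`, `spearman`) and the toy scores are hypothetical and chosen only for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic.

    labels: 1 for positives (e.g., same pathway), 0 for negatives.
    The result is the probability that a random positive scores higher
    than a random negative, counting ties as one half.
    """
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy data (made up): three positive pairs and three negative pairs.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auroc = auroc(toy_scores, toy_labels)  # 8 of 9 positive/negative pairs ordered correctly
```

On the toy data, eight of the nine positive/negative score pairs are ordered correctly, so the AUROC is 8/9. An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of same-pathway from different-pathway pairs.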