Fig. 2: Unsupervised Reconstruction of Biological Networks Associated with Army Combat Fitness Test Performance.

A Overview of the method used to generate the PhenoMol models, compared with a simpler modeling approach (i.e., Control Models). In both cases, stratified five-fold subsampling of the cohort was used to control overfitting. The control models are trained using Sparse Partial Least Squares Regression (sPLSR) on 80% of the data and tested on the remaining 20%. The PhenoMol models add a network-based data reduction step that uses the Prize Collecting Steiner Forest (PCSF) algorithm and the Omics Integrator (OI) algorithm. The inputs to these algorithms are prizes assigned to measured molecular features and an interactome of prior interaction data. The output of this analysis is the Principal Network (PN), which contains prize nodes, which were measured, and Steiner nodes, which were not measured but were robustly selected by the PCSF algorithm. A PN model is then trained with sPLSR using only the molecular features found within the PN. B Scatter plot of predicted vs. measured Army Combat Fitness Test (ACFT) Total Points for the 65 male cadets, showing the sPLSR-generated PN models (blue circles) and control models (red x's) across six analysis runs of five-fold cross-validation (N = 30). C Box plot of the root mean square error (RMSE) for predicting the ACFT Total Points of the 65 male cadets, shown for thirty sPLSR models of each type (i.e., six analysis runs of five-fold cross-validation). The RMSE of the PN models was significantly lower than that of the control models (two-sample paired Wilcoxon signed-rank test, p = 0.00011).
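The comparison in panel C pairs the 30 folds: each fold yields one RMSE for the PN model and one for the control model, and the paired differences are tested with a Wilcoxon signed-rank test. The sketch below illustrates that computation in pure Python under stated assumptions. The function names `rmse` and `wilcoxon_signed_rank_exact` are hypothetical. It computes an exact two-sided p-value and assumes no tied absolute differences. The authors may instead have used a library routine such as `scipy.stats.wilcoxon` or a normal approximation, so this is an illustration rather than the analysis actually performed. The per-fold RMSE values in the usage lines are made-up toy numbers, not data from this study.

```python
import math
from typing import Sequence


def rmse(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between measured and predicted scores."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def wilcoxon_signed_rank_exact(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Exact two-sided paired Wilcoxon signed-rank test (hypothetical helper).

    Returns (W_plus, p_value). Zero differences are dropped; tied
    |differences| are rejected because the exact null distribution
    used here assumes distinct integer ranks 1..n.
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    mags = [abs(d) for d in diffs]
    if len(set(mags)) != n:
        raise ValueError("tied |differences|: use a tie-corrected approximation")

    # Rank |differences| from 1..n and sum the ranks of positive differences.
    order = sorted(range(n), key=lambda i: mags[i])
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)

    # Under H0 each rank is positive with probability 1/2, so count how many
    # subsets of {1..n} reach each rank sum (subset-sum dynamic programming).
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]

    # The null distribution is symmetric, so double the smaller tail.
    w_min = min(w_plus, max_sum - w_plus)
    tail = sum(counts[: w_min + 1])
    p_value = min(1.0, 2 * tail / 2 ** n)
    return float(w_plus), p_value


if __name__ == "__main__":
    # Toy per-fold RMSEs for 30 folds (6 runs x 5 folds); NOT study data.
    pn_rmse = [20.0 + 0.1 * i for i in range(30)]
    control_rmse = [25.0 + 0.2 * i for i in range(30)]
    w, p = wilcoxon_signed_rank_exact(pn_rmse, control_rmse)
    print(f"W+ = {w}, two-sided p = {p:.3g}")
```

Because the test is paired, each fold's PN and control RMSEs must come from the same train/test split. That is why both model types share the same stratified subsampling in panel A.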