Extended Data Fig. 8: 50 mNODE training repeats with different initializations yield highly consistent predictive performance on real microbial community datasets.

For the PRISM+NLIBD dataset, three performance metrics are used to compare model performance: the mean SCC \(\bar{\rho }\), the top-50 mean SCC \({\bar{\rho }}_{50}\), and the numbers of metabolites with SCCs larger than 0.5, 0.4, and 0.3 (denoted as \(N_{\rho >0.5}\), \(N_{\rho >0.4}\), and \(N_{\rho >0.3}\), respectively). All datasets are randomly divided into training and test sets at an 80/20 ratio, except for the PRISM+NLIBD dataset. In all box plots, the middle orange line is the median, the box extends from the first quartile (Q1) to the third quartile (Q3) of the data, the black whiskers extend from the box by 1.5 × IQR (where IQR is the interquartile range), and outliers, shown as unfilled black circles, are the points beyond the range defined by the two whiskers. The sample size is n = 50 for all box plots.
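
As an illustration of how the three summary metrics could be computed from per-metabolite correlations, the following is a minimal sketch, not the authors' implementation. It assumes that SCC denotes the Spearman correlation coefficient between predicted and measured levels of one metabolite across test samples, and that the top-50 mean SCC is the mean of the 50 largest per-metabolite SCCs. The function names `average_ranks`, `spearman`, and `summarize_sccs` are hypothetical and introduced only for this example.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (var_x * var_y) ** 0.5


def summarize_sccs(sccs, top_k=50, thresholds=(0.5, 0.4, 0.3)):
    """Summarize per-metabolite SCCs (assumed definitions, see lead-in)."""
    # Drop undefined (NaN) correlations, then sort from largest to smallest.
    valid = sorted((s for s in sccs if s == s), reverse=True)
    return {
        "mean_scc": mean(valid),
        "top_k_mean_scc": mean(valid[:top_k]),
        "counts": {t: sum(s > t for s in valid) for t in thresholds},
    }


# Toy usage: four metabolites, with top_k reduced to 2 for illustration.
summary = summarize_sccs([0.6, 0.45, 0.35, 0.1], top_k=2)
print(summary)
```

Applying `summarize_sccs` to each of the 50 training repeats would give one value per metric per repeat, which is the kind of sample (n = 50) a box plot like those described above summarizes.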