Extended Data Fig. 4: The training sample size needed to reach high predictive performance scales linearly with the number of species in synthetic data.

Synthetic data in this figure are generated by the microbial consumer-resource model with nutrient sampling probability \(p_n = 1.0\). For the case with 100 species and varying numbers of metabolites (100, 200, or 300), three metrics are used to compare model performance: a1 the mean SCC \(\bar{\rho}\), a2 the top-50 mean SCC \(\bar{\rho}_{50}\), and a3 the fraction of metabolites with SCCs larger than 0.8, \(N_{\rho>0.8}/N_m\), i.e., the number of such metabolites divided by the total number of metabolites. b1-b3 The same performance metrics for the case with 200 species and varying numbers of metabolites (100, 200, or 300).
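The three metrics can be sketched as follows. This is a minimal illustration, not the authors' code, and it assumes that SCC denotes the Spearman correlation coefficient computed for each metabolite between its true and predicted abundances across test samples. The function names (`average_ranks`, `spearman`, `performance_metrics`) are hypothetical, and constant profiles (zero rank variance) are not handled.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank for this tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def performance_metrics(true_profiles, pred_profiles, top_k=50, threshold=0.8):
    """Each profile list holds one sequence per metabolite (values across samples).

    Returns the mean SCC, the mean of the top_k SCCs, and the fraction of
    metabolites whose SCC exceeds threshold (N_{rho>0.8} / N_m by default).
    """
    sccs = [spearman(t, p) for t, p in zip(true_profiles, pred_profiles)]
    top = sorted(sccs, reverse=True)[:top_k]
    return {
        "mean_scc": mean(sccs),
        "top_k_mean_scc": mean(top),
        "frac_above_threshold": sum(s > threshold for s in sccs) / len(sccs),
    }


# Toy example: 3 metabolites measured in 4 samples (illustrative numbers only).
true = [[1, 2, 3, 4], [1, 2, 3, 4], [4, 3, 2, 1]]
pred = [[1, 2, 3, 4], [2, 1, 4, 3], [1, 2, 3, 4]]
print(performance_metrics(true, pred, top_k=2))
```

In this toy case the per-metabolite SCCs are 1.0, 0.6, and -1.0, so the mean SCC is about 0.2, the top-2 mean SCC is 0.8, and one of three metabolites exceeds the 0.8 threshold. With 100 or more metabolites, `top_k=50` reproduces the top-50 mean SCC of panel a2.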