Fig. 2: sPLS models for any NDD.

a–f AUC (area under the curve) from repeated 10-fold CV (cross-validation) of an sPLS (sparse partial least squares) model for risk of any NDD (neurodevelopmental disorder) at different time points. The box plots represent the diagnosis predictions for the median cross-validated model. The center line of each box represents the median, the bounds of the box represent the 25th and 75th percentiles, and the lower and upper ends of the whiskers represent the smallest and largest values, respectively, no further than 1.5 × IQR (interquartile range) from the respective end of the box. Sample sizes were N = 581 at gestational week 24, N = 579 at one week postpartum, and N = 520, 535, 477, and 520 at 6 months, 18 months, 6 years, and 10 years, respectively. g ROC (receiver operating characteristic) curves of sPLS models of any NDD. ROC curves were constructed for the sPLS model at each time point, with the best-performing resample selected based on AUC. h Top 50 metabolites related to risk of any NDD, contributing to the optimal set of loadings for the 10-times repeated 10-fold cross-validated sPLS model (n = 100 separate model runs) fitted on the full gestational week 24 sample (N = 581). Bars depict the median across repeats ± SD. Source data are provided as a Source data file.
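
For readers who want to reproduce the box-plot convention described in the caption, the following is a minimal sketch of the whisker rule (median, 25th and 75th percentiles, whiskers at the most extreme values within 1.5 × IQR of the box). The function names `percentile` and `box_stats`, the linear-interpolation percentile method, and the example numbers are assumptions for illustration only; they are not taken from the study's code or data, and the plotting software used for the figure may compute quartiles slightly differently.

```python
from statistics import median


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 1]) of a pre-sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def box_stats(values):
    """Return the box-plot summary under the 1.5 x IQR whisker rule."""
    xs = sorted(values)
    q1 = percentile(xs, 0.25)
    q3 = percentile(xs, 0.75)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme observed values inside the fences.
    whisker_low = min(x for x in xs if x >= lower_fence)
    whisker_high = max(x for x in xs if x <= upper_fence)
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    return {
        "median": median(xs),
        "q1": q1,
        "q3": q3,
        "whisker_low": whisker_low,
        "whisker_high": whisker_high,
        "outliers": outliers,
    }


# Hypothetical values, not study data.
example = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
stats = box_stats(example)
print(stats)
```

In this made-up example the value 50 lies beyond the upper fence, so the upper whisker stops at 9 and 50 would be drawn as an individual point.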