Fig. 1: Study design, analysis framework, and accuracy assessment of brain age prediction models.

a A brain age prediction model was trained using 20-fold cross-validation on healthy participants with a single pre-pandemic scan (training set). The model was then applied to an unseen set comprising the Pandemic group (G1) and the No Pandemic group (G2). G1 was further subdivided into Pandemic–COVID-19 (G3) and Pandemic–No COVID-19 (G4). b Imaging-derived phenotypes (IDPs) were extracted from grey matter (GM) and white matter (WM) across scan times. Separate prediction models were trained by tissue type and sex using pre-pandemic data, and then applied independently to scans from different time points to estimate brain age gap (BAG). Statistical analyses assessed pandemic- and infection-related effects using longitudinal data. c Scatter plots show predicted vs. chronological age for GM and WM models in females (males shown in Supplementary Fig. 2). The diagonal line indicates perfect prediction. ‘N’ is the number of subjects used for training. Model performance was evaluated using Pearson’s correlation (r) and mean absolute error (MAE), averaged across 100 repetitions. d Relationship between BAG and chronological age for GM and WM models, aggregated across sexes. The black regression line shows that BAG does not vary systematically with chronological age, i.e., there is no age-related bias. e Predicted brain ages at two time points show high reproducibility in both groups (Pearson’s r > 0.96). Intraclass correlation coefficients were 0.981 (95% CI: 0.977–0.985) for the Pandemic group and 0.983 (95% CI: 0.980–0.985) for the No Pandemic group, confirming temporal stability. Partial correlation analyses, controlling for chronological age, yielded r = 0.86 (95% CI: 0.83–0.88) for the Pandemic group and r = 0.88 (95% CI: 0.87–0.90) for the No Pandemic group. f Boxplots compare BAG distributions between the training set (N = 15,334) and the unseen set (first scan; N = 996), and between the Pandemic (N = 432) and No Pandemic (N = 564) groups, for GM and WM models. No significant differences were observed (GM: p(FDR) = 0.44 and 0.23; WM: p(FDR) = 0.99 and 0.28, for the two comparisons, respectively). Each scatter point represents a participant. Asterisks (****) indicate FDR-corrected p ≤ 0.0001; ‘ns’ denotes non-significant differences.
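
The quantities named in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes BAG is predicted brain age minus chronological age, and it uses the Benjamini-Hochberg procedure for the FDR correction, which the caption does not specify. All function names and the toy values are hypothetical.

```python
import math


def brain_age_gap(predicted, chronological):
    """BAG per participant, assumed here to be predicted minus chronological age."""
    return [p - c for p, c in zip(predicted, chronological)]


def mean_absolute_error(predicted, chronological):
    """Mean absolute difference between predicted and chronological age (MAE)."""
    gaps = brain_age_gap(predicted, chronological)
    return sum(abs(g) for g in gaps) / len(gaps)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def benjamini_hochberg(p_values):
    """FDR-adjusted p-values (Benjamini-Hochberg; an assumed choice of method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data (hypothetical ages in years, not taken from the study).
chronological = [50, 60, 70, 80]
predicted = [52, 59, 73, 78]

gaps = brain_age_gap(predicted, chronological)
mae = mean_absolute_error(predicted, chronological)
r = pearson_r(predicted, chronological)
p_fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

In this sketch a positive BAG means the brain is predicted to be older than the participant's chronological age. Panel d then corresponds to checking that the BAG values show no trend when plotted against chronological age.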