Fig. 1: In-sample versus out-of-sample effect estimates in multivariate BWAS. | Nature

From: Reply to: Multivariate BWAS can be replicable with moderate sample sizes

a–e, Methods comparison between our previous study¹ (split-half) and Spisak et al.⁸ (cross-validation followed by split-half). ‘Marek, Tervo-Clemmens’ and ‘Spisak’ refer to the methodologies described in ref. 1 and ref. 8, respectively. For a–e, HCP 1200 Release (full correlation) data were used to predict age-adjusted total cognitive ability. Analysis code and visualizations (x,y scaling; colours) are the same as in Spisak et al.⁸. The x axes in a–e always display the split-half out-of-sample effect estimates from the second (replication) half of the data (correlation between true scores and predicted scores; as in Spisak et al.⁸ and in our previous study¹; Supplementary Methods). a, In-sample (training correlation; y axis) as a function of out-of-sample associations (plot convention in our previous study¹). b, Matched comparison of the true in-sample association (training correlations, mean across folds; y axis) in the method proposed by Spisak et al.⁸. c, The proposed correction by Spisak et al.⁸ that inserts an additional cross-validation step to evaluate the first half of data, which by definition makes this an out-of-sample association (y axis). d, Replacing the cross-validation step from Spisak et al.⁸ with a split-half validation provides a different (compared with c) out-of-sample association of the first half of the total data (that is, each of the first stage split halves is one-quarter of the total data; y axis). The appropriate and direct comparison of in-sample associations between Spisak et al.⁸ and our previous study¹ is comparing b to a, rather than c to a. The Spisak et al. method⁸ (cross-validation followed by split-half validation) does not reduce in-sample overfitting (b) but, instead, adds an additional out-of-sample evaluation (c), which is nearly identical to split-half validation twice in a row (d), and makes it clear why the out-of-sample performance of these two methods is likewise nearly identical.
e, Correspondence between out-of-sample associations (to the left-out half) from the additional cross-validation step proposed by Spisak et al.⁸ (mean across folds; y axis) and the original split-half validation from our previous study¹ (x axis). The identity line is shown in black. f, In-sample (r; light blue) and out-of-sample (r; dark blue) associations as a function of sample size. Data are from figure 4a–d of ref. 1. g, Published literature review of multivariate r (y axis) as a function of sample size (data from ref. 15) displayed with permission. For f and g, best fit lines are displayed in log10 space. h, Overlap of f and g.
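
The quantities compared in panels a–d can be illustrated with a minimal sketch. It is not the analysis code used for the figure: the data are synthetic, the predictive model (closed-form ridge regression) is an assumption because the caption does not name one, and every function and variable name is hypothetical. The sketch splits the data into a discovery and a replication half, then computes the in-sample training correlation (as on the y axis of a), the mean training and held-out correlations across cross-validation folds within the discovery half (as in b and c), a split-half evaluation within the discovery half (as in d), and the out-of-sample correlation in the replication half (as on the x axes).

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    # Closed-form ridge regression on centred data (assumed model, not from the source).
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

def predict(model, X):
    w, b = model
    return X @ w + b

def corr(a, b):
    # Pearson correlation between true and predicted scores.
    return float(np.corrcoef(a, b)[0, 1])

def evaluate(X, y, k=10, alpha=1.0, seed=0):
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    half = len(y) // 2
    disc, rep = perm[:half], perm[half:]          # discovery and replication halves
    Xd, yd, Xr, yr = X[disc], y[disc], X[rep], y[rep]

    # Fit on the whole discovery half; score it in-sample and on the replication half.
    model = ridge_fit(Xd, yd, alpha)
    in_sample = corr(yd, predict(model, Xd))
    out_rep = corr(yr, predict(model, Xr))

    # k-fold cross-validation inside the discovery half.
    cv_train, cv_test = [], []
    for fold in np.array_split(rng.permutation(half), k):
        mask = np.ones(half, dtype=bool)
        mask[fold] = False
        m = ridge_fit(Xd[mask], yd[mask], alpha)
        cv_train.append(corr(yd[mask], predict(m, Xd[mask])))   # still in-sample
        cv_test.append(corr(yd[fold], predict(m, Xd[fold])))    # out-of-sample

    # Split-half inside the discovery half: each part is one-quarter of the data.
    q = half // 2
    m = ridge_fit(Xd[:q], yd[:q], alpha)
    split_half_disc = corr(yd[q:], predict(m, Xd[q:]))

    return {
        "in_sample": in_sample,
        "cv_train_mean": float(np.mean(cv_train)),
        "cv_test_mean": float(np.mean(cv_test)),
        "split_half_discovery": split_half_disc,
        "out_of_sample_replication": out_rep,
    }

# Synthetic example: 400 subjects, 200 features, weak distributed signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 200))
y = X @ (0.05 * rng.normal(size=200)) + rng.normal(size=400)
results = evaluate(X, y)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

In a setting like this one, where the number of features is comparable to the training sample size, both training correlations stay high whether or not a cross-validation loop is present, which is the sense in which b, rather than c, is the matched comparison to a.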
