Extended Data Fig. 1: Model performance and misclassification frequency are robust to analysis approach.
From: Brain–phenotype models fail for individuals who defy sample stereotypes

(a) Classification accuracy for each phenotypic measure using FC calculated from all in-scanner conditions in the Yale dataset, across five alternative analysis pipelines: an alternative, 368-node parcellation for FC matrix generation; two alternative classification algorithms (ensemble of weak learners and neural network); an alternative phenotypic binarization threshold (mean split); and an alternative (10-fold) cross-validation approach (see Methods for additional description of each analysis). Box plot line and hinges represent median and quartiles, respectively; whiskers extend to the most extreme non-outliers; outliers are plotted individually (+). The number of classified individuals and the size of the training sample are the same as in the main analyses (see Supplementary Table 4) for all analyses except mean split (4 measures [see below], number classified = 109–127, training sample size = 72–82) and 10-fold (number classified same as in main analyses, training sample size = 34–110). r1, rest 1; r2, rest 2; grad, gradual-onset continuous performance task; sst, stop signal task; gfc, general FC. (b) Misclassification frequency (MF), averaged across in-scanner conditions and phenotypic measures to derive a single value per participant, compared between each alternative analysis and the main-text analyses. rs, two-tailed rank correlation, n = 128–129, P values FDR adjusted. Note that phenotype mean split is equivalent to mean ± 1/3 × s.d. for scaled scores; mean split-based model accuracy is not reported for these measures, nor are they included in the calculation of misclassification frequency. Given the limited mean split-based results, we repeated this analysis in the HCP data, with comparable results (mean misclassification frequency rs = 0.86, P < 0.0001). 10-fold results reflect 1,000 analysis iterations per phenotypic measure and in-scanner condition (50 per cross-validation partition); all other analyses reflect 100 iterations.
In this and all subsequent figures: BNT, Boston Naming Test; WRAT, Wide Range Achievement Test; VL, verbal learning; FW, finger windows; LN, letter–number sequencing; Trails, trail making; VF, verbal fluency; CW, colour–word interference; 20Q, 20 questions; Vocab, vocabulary; MR, matrix reasoning.
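The comparison in panel (b) can be illustrated with a minimal sketch: average MF per participant, rank-correlate the values from an alternative pipeline with those from the main analysis, and adjust the resulting P values for multiple comparisons. This is not the authors' code. The function names and the input layout are hypothetical, the rank correlation is assumed to be Spearman's, and the FDR adjustment is assumed to be Benjamini-Hochberg, which the caption does not specify.

```python
from statistics import mean


def participant_mf(mf_table):
    """Average MF over (condition, measure) cells to get one value per participant.

    mf_table is a hypothetical layout: {participant: {(condition, measure): mf}}.
    """
    return {p: mean(cells.values()) for p, cells in mf_table.items()}


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values (assumed FDR procedure)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # rank of this P value in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, `participant_mf` would be applied once to the main-analysis MF values and once per alternative pipeline, `spearman_rs` would compare the two per-participant vectors for participants present in both, and `bh_adjust` would be applied to the set of P values from all pipeline comparisons. The P value for each correlation is left to a statistics library and is not computed here.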