Extended Data Fig. 7: Forward inference classification with logistic regression. | Nature Neuroscience

From: A data-driven framework for mapping domains of human neurobiology

a, Logistic regression classifiers were trained to predict whether coordinates were reported within brain structures based on the occurrences of mental function terms in full texts. Classifier features included term occurrences thresholded by mean frequency across the corpus, then the mean frequency of terms in each domain. Activation coordinate data were mapped to 118 structures in a whole-brain atlas. Training was performed in 70% of articles (n = 12,708), hyperparameters were tuned on a validation set containing 20% of articles (n = 3,631), then classifiers were evaluated in a test set containing 10% of articles (n = 1,816). Plots are colored by the domain to which structures were assigned in the data-driven framework, and by the domain with highest PPMI for the RDoC and DSM frameworks. Test set ROC curves are shown for b, the data-driven framework, c, RDoC, and d, the DSM. e-g, For each brain structure, the significance of the test set ROC-AUC was determined by a one-sided permutation test comparing the observed value to a null distribution, and the p value was FDR-corrected for multiple comparisons (* FDR < 0.001). Observed test set values are shown with solid lines. Null distributions (gray) were computed by shuffling true labels over 1,000 iterations. Bootstrap distributions (colored) were computed by resampling articles in the test set with replacement over 1,000 iterations. h, The difference in mean ROC-AUC was assessed for each framework pair by a two-sided bootstrap test. The data-driven framework had higher ROC-AUC than both RDoC (99.9% CI of the difference = [0.020, 0.049]) and the DSM (99.9% CI of the difference = [0.055, 0.091]). RDoC had higher ROC-AUC than the DSM (99.9% CI of the difference = [0.024, 0.058]). Solid lines denote means of the bootstrap distributions obtained by macro-averaging across brain structure classifiers. i-j, Difference in ROC-AUC between the data-driven and expert-determined frameworks. Maps were thresholded to show differences with FDR < 0.001 based on permutation testing.
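
The evaluation steps in the legend (a label-shuffling null distribution, FDR correction across structures, and bootstrap resampling of test articles) can be illustrated with a minimal sketch in plain Python. This is not the authors' code. The function names are hypothetical, the Benjamini-Hochberg procedure is assumed because the legend does not name the FDR method, and the add-one correction in the permutation p value is likewise an assumption.

```python
import random

def roc_auc(labels, scores):
    """ROC-AUC via the Mann-Whitney statistic; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_p_value(labels, scores, n_iter=1000, seed=0):
    """One-sided test: how often does a label shuffle reach the observed AUC?"""
    rng = random.Random(seed)
    observed = roc_auc(labels, scores)
    shuffled = list(labels)
    count = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)  # break the link between labels and scores
        if roc_auc(shuffled, scores) >= observed:
            count += 1
    # Add-one correction (an assumption here) keeps the estimate above zero.
    return observed, (count + 1) / (n_iter + 1)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def bootstrap_auc(labels, scores, n_iter=1000, seed=0):
    """Resample (label, score) pairs with replacement; one AUC per resample."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with a single class has no defined AUC
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    return aucs

# Toy data standing in for one brain structure's test-set predictions.
toy_labels = [0] * 10 + [1] * 10
toy_scores = [i / 20 for i in range(20)]
observed_auc, p_value = permutation_p_value(toy_labels, toy_scores)
boot = bootstrap_auc(toy_labels, toy_scores)
```

In this sketch, a confidence interval for the difference between two frameworks would come from applying the same resampled article indices to both sets of predictions and taking percentiles of the per-resample differences; the legend reports 99.9% intervals obtained after macro-averaging across the structure classifiers.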
