Extended Data Fig. 5: Framework generated from mental function terms and subsequently mapped to brain circuits.
From: A data-driven framework for mapping domains of human neurobiology

To control for the contribution of brain coordinate data, the framework was rederived solely from mental function terms. Mental function terms were clustered by k-means according to their co-occurrences, weighted by pointwise mutual information (PMI), in the training set of 12,708 articles. The top 25 function terms were assigned to each domain by the point-biserial correlation (rpb) of their binarized occurrences with the centroid of occurrences across ‘seed’ terms from clustering. The number of terms per domain and the name of each domain were determined as before. Circuits were mapped from the positive PMI (PPMI) of brain structures with the centroid of domain terms (FDR < 0.01). a, Domains are visualized for k = 6, the same dimensionality as the data-driven framework in the main text. Term size is scaled to rpb with the centroid of seed terms. Term-based domains are linked to the RDoC and DSM domains illustrated in Fig. 4. Links between domains were computed across the corpus of 18,155 articles by Dice similarity of mental function terms and brain structures (FDR < 0.05 based on permutation testing over 10,000 iterations). b,c, Dice similarity of links with the RDoC and DSM frameworks across the corpus, shown for the data-driven framework based on brain coordinates and mental function terms (b, as in Fig. 4) and for the framework based only on terms (c). d,e, Dice similarity with RDoC (d) and the DSM (e) was macro-averaged across domains, and a one-sided bootstrap test assessed the difference in means between the two data-driven frameworks. Compared with the framework also based on coordinates, the term-based framework was more similar to both RDoC (99.9% CI = [0.022, 0.098]) and the DSM (95% CI = [0.001, 0.034]). Bootstrap distributions were computed by resampling function terms and brain structures over 10,000 iterations.
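The comparison in d and e can be illustrated with a minimal sketch, which is not the authors' code. It assumes each domain is a binary vector over a shared list of function terms and brain structures. The function names (`dice`, `macro_dice`, `bootstrap_diff`, `percentile_interval`) are hypothetical. Macro-averaging is taken here as the mean over all domain pairs, and the interval is a simple percentile interval; the legend specifies neither choice, so both are assumptions.

```python
import random


def dice(a, b):
    """Dice similarity of two equal-length 0/1 vectors (0.0 if both are empty)."""
    overlap = sum(x * y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2.0 * overlap / total if total else 0.0


def macro_dice(framework, reference):
    """Mean Dice over all (framework domain, reference domain) pairs.

    Assumption: the legend does not define the macro-average in detail.
    """
    scores = [dice(f, r) for f in framework for r in reference]
    return sum(scores) / len(scores)


def bootstrap_diff(terms_only, terms_coords, reference, n_iter=10_000, seed=0):
    """Bootstrap the difference in macro-averaged Dice between two frameworks.

    Features (terms and structures) are resampled with replacement, and the
    same resample is applied to all three sets of domains on each iteration.
    """
    rng = random.Random(seed)
    n_features = len(reference[0])
    diffs = []
    for _ in range(n_iter):
        idx = [rng.randrange(n_features) for _ in range(n_features)]
        ref = [[row[j] for j in idx] for row in reference]
        only = [[row[j] for j in idx] for row in terms_only]
        both = [[row[j] for j in idx] for row in terms_coords]
        diffs.append(macro_dice(only, ref) - macro_dice(both, ref))
    return diffs


def percentile_interval(samples, level=0.95):
    """Two-sided percentile interval of a bootstrap distribution (assumed form)."""
    ordered = sorted(samples)
    tail = (1.0 - level) / 2.0
    lo = ordered[int(tail * (len(ordered) - 1))]
    hi = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return lo, hi
```

Under these assumptions, an interval for the difference that lies entirely above zero would indicate that the first framework is more similar to the reference than the second, which is how the intervals reported above are read.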