Extended Data Fig. 10: Additional visualizations of article partitioning by domains.
From: A data-driven framework for mapping domains of human neurobiology

Dice distance between articles is shown for binarized vectors of the mental function terms that occurred in the full text and the brain structures to which reported coordinate data were mapped. Articles were split into sets for discovery (n = 12,708) and replication (n = 5,447), then matched to domains based on the Dice similarity of their term-structure vectors. Domain assignments are represented by the color coding scheme established in Fig. 4 for a, the data-driven framework, b, RDoC, and c, the DSM. Shaded areas represent the lower triangle of distances between articles within each domain partition. d-f, Dice distance between articles visualized with t-SNE. Distances were computed between the terms and structures of articles in the full corpus (n = 18,155), and dimensionality of the 18,155 × 18,155 matrix was reduced by principal component analysis. The first 10 principal components (18,155 × 10) were taken as inputs to t-SNE (perplexity = 25, early exaggeration = 15, learning rate = 500, and maximum iterations = 1,000). Articles are visualized separately for the discovery and replication sets, with colors and shapes corresponding to domain assignments in each framework.
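
As an illustration of the distance measure named in this legend, the following is a minimal sketch of the Dice distance between two binarized term-structure vectors and of matching an article to the most similar domain. It is not the authors' code: the function names `dice_distance` and `assign_domain`, the toy vectors, the domain labels, and the convention of returning 0 for two all-zero vectors are assumptions made for the example.

```python
def dice_distance(u, v):
    """Dice distance between two equal-length binary vectors.

    Defined as 1 - 2|u AND v| / (|u| + |v|), where |x| counts nonzero entries.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    both = sum(1 for a, b in zip(u, v) if a and b)
    total = sum(1 for a in u if a) + sum(1 for b in v if b)
    if total == 0:
        # Assumption for this sketch: two empty vectors are treated as identical.
        return 0.0
    return 1.0 - 2.0 * both / total


def assign_domain(article, domains):
    """Return the name of the domain with the smallest Dice distance
    (equivalently, the highest Dice similarity) to the article vector."""
    return min(domains, key=lambda name: dice_distance(article, domains[name]))


# Hypothetical toy data: each position marks the presence (1) or absence (0)
# of one mental function term or brain structure.
article = [1, 1, 0, 1, 0, 0]
domains = {
    "memory": [1, 1, 0, 0, 0, 0],
    "vision": [0, 0, 1, 0, 1, 1],
}

best = assign_domain(article, domains)
print(best, dice_distance(article, domains[best]))
```

In this toy case the article shares two of its three features with the hypothetical "memory" domain and none with "vision", so it is matched to "memory". Computing the same distance between every pair of articles would give a square matrix like the one described for the full corpus.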