Extended Data Fig. 4: Neural network approach to data-driven ontology generation. | Nature Neuroscience


From: A data-driven framework for mapping domains of human neurobiology


The procedure in Fig. 1a was repeated using neural network classifiers in place of logistic regression. Each neural network classifier consisted of 8 fully connected layers and was fit with learning rate = 0.001, weight decay = 0.001, neurons per layer = 100, dropout probability = 0.1 (first 7 layers), and batch size = 1,024. In Step 3, neural networks were trained over 100 epochs to predict term and structure occurrences within domains. In Step 4, neural networks were trained over 500 epochs to predict domain term list and circuit occurrences. a, Validation set ROC-AUC plotted for forward inference, reverse inference, and their average. b, Data-driven solution for k = 6. Term size is scaled to frequency in the corpus of 18,155 articles with activation coordinate data. The number of terms per domain was selected in Step 3 to maximize neural network performance in the validation set. Brain maps show structures included in each circuit as a result of clustering by PMI-weighted co-occurrences with function terms. c, Article partitioning based on maximal similarity to terms and structures in domain prototypes, visualized by multidimensional scaling. d, Modularity was assessed by comparing the mean Dice distance of function and structure occurrences of articles between domains versus within domains. Observed values are colored by domain; null distributions in gray were computed by shuffling distance values across article partitions over 1,000 iterations. e, Generalizability was assessed by Dice similarity of each domain’s ‘prototype’ vector of function terms and brain structures with the occurrences of terms and structures in each article of the domain’s partition. Observed values are colored by domain; null distributions in gray were computed by shuffling terms and structures in each prototype over 1,000 iterations.
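The modularity comparison in panel d rests on the Dice distance between binary occurrence vectors, contrasted between versus within domains against a shuffle-based null. A minimal sketch of that idea (not the authors' code; for simplicity this null permutes domain labels, whereas the caption describes shuffling distance values across article partitions; all function names are illustrative):

```python
import random

def dice_distance(a, b):
    """Dice distance between two binary occurrence vectors (sequences of 0/1)."""
    inter = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return 1.0 - 2.0 * inter / total

def modularity_gap(articles, labels):
    """Mean between-domain minus mean within-domain Dice distance.

    articles: list of binary term/structure occurrence vectors, one per article
    labels:   domain assignment for each article
    """
    within, between = [], []
    for i in range(len(articles)):
        for j in range(i + 1, len(articles)):
            d = dice_distance(articles[i], articles[j])
            (within if labels[i] == labels[j] else between).append(d)
    return sum(between) / len(between) - sum(within) / len(within)

def null_distribution(articles, labels, n_iter=1000, seed=0):
    """Null distribution of the modularity gap under shuffled domain labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        null.append(modularity_gap(articles, shuffled))
    return null
```

A modular parcellation yields an observed gap in the right tail of the null distribution: articles are more similar to articles in their own domain than to articles in other domains.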
