Fig. 3: A machine learning pipeline for drug prediction and performance of one-versus-rest classification.

a The pipeline consisted of three steps. First, c-Fos+ cell counts for each brain region underwent normalization, Yeo-Johnson transformation, and robust scaling to yield c-Fos scores. Second, the Boruta procedure was used to select the set of informative brain regions. Third, c-Fos scores from this set of brain regions were used to fit a ridge logistic regression model. For each iteration, 75% of the data in each drug condition were used for region selection and training through the three steps; the remaining 25% were withheld, then processed and tested with the fitted ridge logistic regression model. The entire process was repeated 100 times using different splits of the data. b Linear discriminant analysis of the c-Fos scores, used to visualize the data in a low-dimensional space. c Confusion matrix showing the mean proportion of predicted labels for each true label across all splits. d Composite precision-recall curves for each drug condition across all splits, together with the grand average across all drugs. Values in parentheses are the area under the precision-recall curve for each condition.
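
The split-and-evaluate loop described in panels a and c can be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' code. The data are synthetic, and the number of conditions, animals, and regions, the regularization strength `C`, and the random seeds are all hypothetical. The initial count normalization is not specified in the caption and is skipped, and the Boruta selection step is omitted because it is not part of scikit-learn. The sketch also assumes that the transformation and scaling parameters are estimated on the training portion only and then applied to the withheld portion.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer, RobustScaler

rng = np.random.default_rng(0)

# Hypothetical synthetic data: 4 conditions x 16 animals, 20 brain regions.
n_conditions, n_per_condition, n_regions = 4, 16, 20
y = np.repeat(np.arange(n_conditions), n_per_condition)
rates = rng.uniform(20, 200, size=(n_conditions, n_regions))
X = rng.poisson(rates[y]).astype(float)  # stand-in for c-Fos+ cell counts

# 100 stratified splits: 75% of each condition for training, 25% withheld.
splitter = StratifiedShuffleSplit(n_splits=100, test_size=0.25, random_state=0)

matrices = []
for train_idx, test_idx in splitter.split(X, y):
    model = make_pipeline(
        # Step 1: Yeo-Johnson transformation followed by robust scaling.
        PowerTransformer(method="yeo-johnson", standardize=False),
        RobustScaler(),
        # Step 2 (Boruta region selection) is omitted in this sketch.
        # Step 3: logistic regression; the default penalty is L2 (ridge).
        LogisticRegression(C=1.0, max_iter=1000),
    )
    model.fit(X[train_idx], y[train_idx])
    predicted = model.predict(X[test_idx])
    # Row-normalized matrix: proportion of predicted labels per true label.
    matrices.append(
        confusion_matrix(
            y[test_idx],
            predicted,
            labels=np.arange(n_conditions),
            normalize="true",
        )
    )

# Mean proportion across all splits, analogous to panel c.
mean_confusion = np.mean(matrices, axis=0)
print(np.round(mean_confusion, 2))
```

Wrapping the preprocessing and the classifier in a single pipeline keeps the withheld 25% out of every fitted step, which is one way to realize the "withheld initially, but then processed and tested" design.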