Fig. 3: Classification accuracies are maintained when only a subset of species is used.

For each of the 7 training sets used for LOSO validation within the SMG datasets, we performed feature selection based on a differential abundance analysis of that training set alone. We then selected the 20 features with the largest absolute effect size and used them to build new Ridge regression models, which were tested on the held-out data. a LOSO model accuracies (avg., average). b Heatmap of the joint set of features selected in at least one dataset, colour-coded by average relative weight. Features are grouped by their enrichment in PD or controls. In both panels, models are labelled by the study used as the test set (x-axis). White tiles in the heatmap indicate species that were not included in the respective model.
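
The procedure described in this caption can be illustrated with a minimal sketch, which is not the authors' pipeline. It makes several assumptions that the caption does not confirm: the effect size is taken to be a standardized mean difference between cases and controls, the Ridge model is a closed-form L2-penalised least-squares fit on ±1 labels, and the function names (`top_k_by_effect_size`, `fit_ridge`, `loso_accuracies`) and the `alpha` parameter are hypothetical. The point it shows is that features are chosen from the training studies only, so the held-out study never influences selection.

```python
import numpy as np


def top_k_by_effect_size(X, y, k=20):
    """Indices of the k features with the largest absolute effect size.

    Assumption: effect size = standardized mean difference (cases vs. controls).
    """
    cases, controls = X[y == 1], X[y == 0]
    pooled_sd = np.sqrt((cases.var(axis=0) + controls.var(axis=0)) / 2.0) + 1e-12
    effect = (cases.mean(axis=0) - controls.mean(axis=0)) / pooled_sd
    return np.argsort(-np.abs(effect))[:k]


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on +/-1 labels; the intercept is not penalised."""
    mu = X.mean(axis=0)
    Xc = X - mu                      # centre features so the intercept separates out
    t = 2.0 * y - 1.0                # map labels {0, 1} -> {-1, +1}
    t_mean = t.mean()
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(gram, Xc.T @ (t - t_mean))
    return w, t_mean - mu @ w


def loso_accuracies(studies, k=20, alpha=1.0):
    """studies: dict mapping study name -> (X, y). Returns held-out accuracy per study."""
    accuracies = {}
    for held_out in studies:
        train = [data for name, data in studies.items() if name != held_out]
        X_train = np.vstack([X for X, _ in train])
        y_train = np.concatenate([y for _, y in train])

        # Feature selection uses the training studies only.
        selected = top_k_by_effect_size(X_train, y_train, k)
        w, b = fit_ridge(X_train[:, selected], y_train, alpha)

        X_test, y_test = studies[held_out]
        predicted = (X_test[:, selected] @ w + b > 0).astype(int)
        accuracies[held_out] = float((predicted == y_test).mean())
    return accuracies
```

Calling `loso_accuracies` with a dictionary of seven `(abundance_matrix, labels)` pairs returns one accuracy per held-out study, which corresponds to the per-study values plotted in panel a; the `selected` indices from each iteration would form the columns of the heatmap in panel b.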