Fig. 4: Disease specificity of classification models is significantly improved by pooling data from multiple studies.

Dots indicate the proportion of samples from datasets of other neurodegenerative diseases that were spuriously predicted as PD; this assesses the disease specificity of the PD classification models. All classification models (ridge regression) were evaluated for their false-positive rates (FPRs) on datasets from other diseases (AD, Alzheimer’s disease; MS, multiple sclerosis; all data generated by 16S sequencing). This includes both the models trained on individual PD studies (16S data only) and the models trained on pooled data (leave-one-study-out, LOSO). The decision thresholds of all models were originally set to yield a 10% FPR on the controls of the PD datasets. Hence, an FPR above this level among patients with other diseases indicates a lack of disease specificity, as previously established (ref. 13). The mean and standard deviation of the FPR values (17 in total) are shown for models trained on a single study (orange dots and vertical lines) and for LOSO models (purple). On average, LOSO models were considerably more disease-specific (mean FPR = 18.7%, horizontal purple line) than models trained on a single dataset (mean FPR = 35.1%, horizontal orange line).
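The evaluation logic described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline. It assumes that each model outputs a continuous score in which higher values mean "more PD-like", and that the 10% FPR cutoff is taken as the 90th percentile of the control scores (linear interpolation, NumPy's default). The function names `calibrate_threshold` and `false_positive_rate` are hypothetical.

```python
import numpy as np


def calibrate_threshold(control_scores, target_fpr=0.10):
    """Return a score cutoff at which roughly `target_fpr` of controls are called PD.

    Assumption: higher scores mean 'more PD-like'. The cutoff is the
    (1 - target_fpr) quantile of the control scores, so about 10% of
    controls lie strictly above it when target_fpr = 0.10.
    """
    control_scores = np.asarray(control_scores, dtype=float)
    return float(np.quantile(control_scores, 1.0 - target_fpr))


def false_positive_rate(scores, threshold):
    """Fraction of non-PD samples (e.g. AD or MS patients) scored above the cutoff,
    i.e. spuriously predicted as PD."""
    scores = np.asarray(scores, dtype=float)
    return float(np.mean(scores > threshold))
```

Under this sketch, a model is first calibrated on the PD-study controls. `false_positive_rate` is then applied to AD or MS samples, and values well above 0.10 would signal poor disease specificity. Averaging such FPRs across models gives summary values comparable to the horizontal lines in the figure.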