Extended Data Fig. 1: Overview of information used in various microbiome batch-correction prediction benchmarks.

Top legend: Description of information that is typically incorporated in microbiome batch correction, which is 1) the samples themselves; 2) the labels to be predicted in downstream modeling; 3) other covariates; and 4) target labels. a, The primary batch-correction evaluation strategy used in this work for DEBIAS-M, in which the samples from all studies are used during batch correction, but only the labels from the training set are used during batch correction or model training. b, The primary batch-correction evaluation strategy used for methods that require an outcome label for all studies, in which a covariate is provided during batch correction instead of the target label. c, The batch-correction strategy used in our ‘online’ benchmark in Extended Data Fig. 2a–c, in which no information from the test set is used during batch correction or model training. Once all bias-correction factors and predictive model weights are learned and fixed for the training set, bias correction is performed separately for the test set by adjusting its bias-correction factors to optimize cross-batch similarity. d, An approach that has been used in some previous benchmarks, in which the labels of the test set are used during batch correction itself. This risks ‘information leakage’ and is used in this work only in Extended Data Fig. 3a, b, and, for ConQuR and percnorm, also in Extended Data Fig. 6. e, A description of how DEBIAS-M is used as a preprocessing step (Fig. 6 and Extended Data Fig. 9a). DEBIAS-M is applied to samples from all studies, but receives the target label only for studies from the training set. A prediction model is then trained on the training set that was processed by DEBIAS-M and evaluated on the test set.