Extended Data Fig. 2: DEBIAS-M maintains its performance in log space and without observing the test set during training.

a, The progression of the cross-batch similarity loss as a fitted online DEBIAS-M model adapts to samples from a previously unobserved study solely by minimizing that loss. b, The predictive performance of the fitted online DEBIAS-M model throughout the adaptation iterations. Although the auROC is not directly used during adaptation, the auROC of the model’s predictions on the held-out test set increases as the cross-batch similarity increases. c, Box and swarm plots (box, IQR; line, median; whiskers, nearest point to 1.5 × IQR) comparing the auROC of DEBIAS-M (fitted and evaluated using the strategy in Extended Data Fig. 1a) with ‘Online DEBIAS-M’ (fitted and evaluated using the strategy in Extended Data Fig. 1c) on each held-out study, for the benchmarks used in Fig. 2a–c. Online DEBIAS-M demonstrated equivalent predictive performance on held-out studies. p, two-sided Wilcoxon signed-rank test. See Supplementary Tables 1, 3, 4 for information on studies and sample sizes. d–g, Same as Fig. 2, but comparing log-additive DEBIAS-M to batch-correction methods on clr-transformed data. Most methods operate only in relative abundance or count space, and we therefore applied the centered log-ratio (clr) transformation to their output. For a fair comparison, we also clr-transformed the output of regular DEBIAS-M (denoted ‘CLR(DEBIAS-M)’). PLSDA-batch was run on clr-transformed abundances. Voom-SNM is not included, as it cannot operate on clr-transformed values and its output is not in non-negative relative abundance or count space. *, Fisher’s multiple comparison of one-sided DeLong tests, p < 0.01 vs. DEBIAS-M. See Supplementary Table 7 for exact p values.
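
For readers unfamiliar with the clr transformation applied in panels d–g, the following is a minimal sketch of it, not the authors' code. It assumes samples are rows of non-negative abundances. The function name `clr` and the `pseudocount` used to avoid taking the log of zero are illustrative assumptions, since the legend does not state how zeros were handled.

```python
import numpy as np


def clr(abundances, pseudocount=1e-6):
    """Centered log-ratio transform of each row (sample).

    `pseudocount` is an assumed zero-handling choice, not taken from the source.
    """
    x = np.asarray(abundances, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtracting the per-sample mean of the logs is equivalent to
    # dividing each abundance by the sample's geometric mean.
    return log_x - log_x.mean(axis=-1, keepdims=True)


# Two hypothetical samples with three taxa each (illustrative values only).
samples = np.array([[0.5, 0.3, 0.2],
                    [0.7, 0.2, 0.1]])
transformed = clr(samples)
```

Each clr-transformed sample sums to zero by construction, which is one reason methods expecting non-negative relative abundances or counts cannot consume such values directly.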