Fig. 2: Detailed overview of the BERT algorithm.

A Input batches are processed in a binary tree of pairwise batch-effect correction steps (left). ComBat or limma is applied to features with a sufficient number of numerical values in both input batches. Features with fully missing data in either batch are propagated to the next tree level unchanged, since batch effects cannot be quantified without a second batch for comparison (right). B BERT first processes the input datasets in a parallel phase, in which sub-trees are integrated by an iteratively decreasing number of independent BERT processes. This is followed by a sequential phase, in which the remaining intermediate batches are integrated into the final output data. C Biological and other known conditions can be specified as categorical covariates and are taken into account by the underlying batch-effect correction algorithms. D Users may designate arbitrary samples as references, based on which BERT models the batch effect linearly, followed by the co-integration of all other, non-reference samples.
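Panel A can be illustrated with a minimal sketch of the tree reduction and the missing-feature rule. This is not BERT's implementation: ComBat and limma are replaced by a simple location-only adjustment (each batch is shifted so its feature mean matches the pooled mean), and `MIN_VALUES`, `correct_pair` and `bert_tree` are hypothetical names chosen for illustration.

```python
import math
from typing import List

# A batch is a list of feature rows; each row holds one value per sample.
# math.nan marks a missing value.
Batch = List[List[float]]

# Hypothetical threshold for "sufficient numerical values" per batch.
MIN_VALUES = 2


def observed(row: List[float]) -> List[float]:
    """Return the non-missing values of one feature row."""
    return [v for v in row if not math.isnan(v)]


def correct_pair(a: Batch, b: Batch) -> Batch:
    """Integrate two batches feature by feature.

    Stand-in for ComBat/limma: each batch is shifted so that its feature mean
    matches the pooled mean of both batches (location adjustment only).
    """
    merged: Batch = []
    for row_a, row_b in zip(a, b):
        obs_a, obs_b = observed(row_a), observed(row_b)
        if len(obs_a) >= MIN_VALUES and len(obs_b) >= MIN_VALUES:
            mean_a = sum(obs_a) / len(obs_a)
            mean_b = sum(obs_b) / len(obs_b)
            pooled = sum(obs_a + obs_b) / (len(obs_a) + len(obs_b))
            # NaN entries stay NaN under the shift.
            row = [v - mean_a + pooled for v in row_a] + [v - mean_b + pooled for v in row_b]
        else:
            # Too few values in one batch to estimate an effect:
            # propagate the feature unchanged to the next tree level.
            row = list(row_a) + list(row_b)
        merged.append(row)
    return merged


def bert_tree(batches: List[Batch]) -> Batch:
    """Reduce the batches level by level in a binary tree of pairwise corrections."""
    level = list(batches)
    while len(level) > 1:
        nxt = [correct_pair(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an unpaired batch moves up one level as-is
        level = nxt
    return level[0]
```

With three batches, the first two are corrected against each other and the third joins at the next level. A feature that is entirely missing from the second batch keeps its original values in the first merge and is only adjusted once it can be compared against the third batch.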
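The two-phase schedule of panel B can be sketched generically, independent of the correction method, by passing in any pairwise `merge` function. The parameters `workers`, `reduction` and `stop_par` are hypothetical knobs for this sketch, not BERT's actual options, and threads stand in for BERT's independent processes only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def reduce_tree(items: List[T], merge: Callable[[T, T], T]) -> T:
    """Sequential binary-tree reduction of adjacent pairs (order preserving)."""
    level = list(items)
    while len(level) > 1:
        nxt = [merge(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def split(items: List[T], n: int) -> List[List[T]]:
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_then_sequential(batches: List[T], merge: Callable[[T, T], T],
                             workers: int = 4, reduction: int = 2,
                             stop_par: int = 2) -> T:
    """Parallel phase with a shrinking worker count, then a sequential phase."""
    if reduction < 2:
        raise ValueError("reduction must be >= 2 so the worker count shrinks")
    level = list(batches)
    # Parallel phase: each worker integrates one contiguous sub-tree.
    while workers > 1 and len(level) > stop_par:
        groups = split(level, min(workers, len(level)))
        with ThreadPoolExecutor(max_workers=len(groups)) as pool:
            level = list(pool.map(lambda g: reduce_tree(g, merge), groups))
        workers //= reduction
    # Sequential phase: integrate the remaining intermediate batches.
    return reduce_tree(level, merge)
```

For example, ten batches with four workers are first reduced to four intermediate batches, then to two with two workers, and the last two are merged sequentially. Because chunks are contiguous and pairs are adjacent, the sample order of the input is preserved.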
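The reference mechanism of panel D can be illustrated for a single feature with a deliberately simplified model: the batch effect is assumed to be a pure per-batch mean shift, estimated only from the reference samples and then subtracted from every sample. ComBat and limma fit richer linear models, and `reference_adjust` is a hypothetical name used only for this sketch.

```python
from collections import defaultdict
from typing import Dict, List


def reference_adjust(values: List[float], batches: List[str],
                     is_ref: List[bool]) -> List[float]:
    """Estimate per-batch offsets from reference samples, apply to all samples."""
    ref_vals: Dict[str, List[float]] = defaultdict(list)
    for v, b, r in zip(values, batches, is_ref):
        if r:
            ref_vals[b].append(v)
    if set(ref_vals) != set(batches):
        raise ValueError("every batch needs at least one reference sample")
    # Common target: mean over all reference samples.
    all_ref = [v for vs in ref_vals.values() for v in vs]
    target = sum(all_ref) / len(all_ref)
    # Batch effect (location only): reference mean of the batch minus target.
    offset = {b: sum(vs) / len(vs) - target for b, vs in ref_vals.items()}
    # Co-adjust reference and non-reference samples with the same offset.
    return [v - offset[b] for v, b in zip(values, batches)]
```

Since only the references determine the offsets, differences between non-reference samples and the references within a batch are carried over unchanged. In the call `reference_adjust([1, 2, 5, 11, 12, 15], ["a"] * 3 + ["b"] * 3, [True, True, False] * 2)`, both batches map to `[6.0, 7.0, 10.0]`.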