Fig. 1: Schematic overview of scBalance.

a scBalance is built on a supervised learning framework comprising a dataset-balancing module and a dropout neural network module. Step 1, upper: With adaptive weighted sampling, scBalance automatically chooses a sampling weight for each cell type in the reference dataset and constructs the training batches. Step 1, lower: Users can instead apply an external dataset-balancing method, such as scSynO, in which case only the classifier is used. Step 2: During training, scBalance iteratively fits a three-layer neural network on mini-batches until the cross-entropy loss converges. b Dropout settings at different stages. In the training stage, scBalance randomly disables neurons in the network; the dropout mask is binary, with a rate of 0.5. In the testing stage, all dropped units are reconnected and predictions are made by the fully connected network. c Evaluation of balancing methods shows that our sampling method outperforms simple oversampling and downsampling as well as SMOTE. The p-value is from a significance test comparing scBalance with SMOTE (n = 5 for each boxplot). d Comparison of running times among the different sampling techniques.
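
The batch construction in Step 1 can be illustrated with a minimal sketch. The caption does not state how scBalance derives its adaptive weights, so this example assumes plain inverse-frequency weighting, under which every cell type is equally likely on each draw. The function names `inverse_frequency_weights` and `sample_balanced_batch` and the toy labels are hypothetical and are not part of the scBalance package.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-cell weight of 1 / (number of cells sharing that cell type).

    Assumption: inverse-frequency weighting stands in for the adaptive
    weights, whose exact formula the caption does not give.
    """
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def sample_balanced_batch(labels, batch_size, rng):
    """Draw cell indices with replacement using the weights above."""
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=batch_size)


# Toy reference dataset: one common cell type and one rare cell type.
labels = ["common type"] * 950 + ["rare type"] * 50

rng = random.Random(0)  # fixed seed so the sketch is reproducible
batch = sample_balanced_batch(labels, batch_size=2000, rng=rng)
batch_counts = Counter(labels[i] for i in batch)

# The rare type makes up 5% of the reference but roughly half of the batch.
rare_fraction = batch_counts["rare type"] / len(batch)
```

Sampling with replacement means rare cells are repeated across batches rather than synthesized, which is the main difference from generative approaches such as SMOTE.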