Fig. 2: The framework of SCCAF-D. | Nature Communications


From: Alleviating batch effects in cell type deconvolution with SCCAF-D


SCCAF-D first combines different single-cell datasets from the same tissue into an integrated dataset and optimises the cell annotation. It then identifies representative, self-consistent cells using the ‘self-projection’ approach from SCCAF (the block boxed by the orange dashed line). The integrated dataset is split into a training set and a test set, and the training set is used to train a logistic regression model. The model is applied to the test set to give a predicted cell type score matrix, which is compared with the original cell type labels from the optimised cell annotation. For each cell type, the 100 cells with the highest predicted scores are selected as reference data. This optimised reference is then used to perform cell type deconvolution with DWLS, yielding cell proportions. The UMAP visualisations show the single-cell reference data before and after SCCAF-D optimisation; grey points indicate cells excluded by SCCAF-D, and coloured points indicate the selected representative cells.
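The selection step described in the caption can be illustrated with a minimal sketch. This is not the SCCAF-D implementation: the function name `select_representative_cells`, its parameters, and the toy data are hypothetical, and the sketch assumes that a cell competes only within its own annotated cell type, ranked by the predicted score for that type. In practice the score matrix would come from the trained classifier (for example, the predicted probabilities of a logistic regression model).

```python
from collections import defaultdict


def select_representative_cells(scores, labels, cell_types, top_n=100):
    """Pick, for each cell type, the top_n cells with the highest predicted score.

    scores     : list of rows, one per test-set cell; scores[i][j] is the
                 predicted score of cell i for cell_types[j]
    labels     : annotated cell type of each test-set cell
    cell_types : column order of the score matrix
    Assumption (not confirmed by the caption): a cell is only a candidate
    for its own annotated type.
    """
    column = {ct: j for j, ct in enumerate(cell_types)}

    # Group cells by annotated type, keeping the score for that type.
    by_type = defaultdict(list)
    for i, label in enumerate(labels):
        by_type[label].append((scores[i][column[label]], i))

    # Rank each group by descending score (ties broken by cell index).
    selected = {}
    for ct in cell_types:
        ranked = sorted(by_type[ct], key=lambda pair: (-pair[0], pair[1]))
        selected[ct] = [i for _, i in ranked[:top_n]]
    return selected


# Toy example: 5 cells, 2 cell types, keep the top 2 cells per type.
toy_scores = [
    [0.9, 0.1],
    [0.6, 0.4],
    [0.2, 0.8],
    [0.7, 0.3],
    [0.45, 0.55],
]
toy_labels = ["T cell", "T cell", "B cell", "T cell", "B cell"]
reference = select_representative_cells(
    toy_scores, toy_labels, ["T cell", "B cell"], top_n=2
)
```

With `top_n=100`, the returned indices would correspond to the representative cells passed on as the optimised reference for deconvolution; cells not returned correspond to the grey, excluded cells in the UMAP panels.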
