Cell type annotation is a core task for single cell RNA-sequencing, but current bioinformatic tools struggle with some of the underlying challenges, including high dimensionality, data sparsity, batch effects and a lack of labels. In a self-supervised approach, a transformer model called scBERT is pretrained on millions of unlabelled public single cell RNA-seq data and then fine-tuned with a small number of labelled samples for cell annotation tasks.
- Fan Yang
- Wenchuan Wang
- Jianhua Yao