Fig. 1: Overview of Spatial-ID.

Stage 1 involves knowledge transfer from reference datasets. Stage 2 involves feature embedding of gene expression and spatial information, and employs a self-supervised strategy to train a classifier (CLS) using the pseudo-labels generated in Stage 1. Stage 3 uses the optimal model derived from Stage 2 to perform cell type annotation. a Reference scRNA-seq datasets are employed to pretrain deep neural network (DNN) models. b Based on the cell type probability distributions \(D\) produced by the pretrained DNN, pseudo-labels \(L\) are generated by adjusting the temperature parameter \(T\). c A deep autoencoder is used to learn the encoded gene representation \(X\) by reducing the dimension of the gene expression matrix \(I\). The gene expression matrix \(I^{\prime}\) reconstructed by the decoder is used to optimize the autoencoder by minimizing its difference from the input gene expression matrix \(I\). d A spatial neighbor graph is constructed to represent the spatial relationships between neighboring cells, where the relationship weight of each pair of cells is negatively associated with their Euclidean distance. The spatial neighbor graph is then represented as an adjacency matrix \(A\). e A variational graph autoencoder (VGAE, a kind of GCN) is used to embed the encoded gene representations \(X\) from the autoencoder together with the adjacency matrix \(A\), and generates the spatial embedding \(S\) as output. The reconstructed adjacency matrix \(A^{\prime}\) is used to optimize the VGAE by minimizing its difference from the input adjacency matrix \(A\).
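
As a rough illustration of panels b and d, the sketch below shows one way temperature-adjusted pseudo-labels and a distance-weighted adjacency matrix could be computed. It is not the Spatial-ID implementation. The function names `temperature_softmax` and `spatial_adjacency`, the softmax-over-logits form of the temperature adjustment, the `radius` cutoff, the linear weight `1 - dist / radius`, and the absence of self-loops are all assumptions made for this example; the caption only states that \(L\) is obtained from \(D\) by adjusting \(T\) and that weights are negatively associated with Euclidean distance.

```python
import numpy as np


def temperature_softmax(logits, T=1.0):
    """Turn per-cell logits into probabilities, sharpened (T < 1) or softened (T > 1).

    Assumption: the temperature is applied as softmax(logits / T).
    """
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def spatial_adjacency(coords, radius):
    """Build a symmetric adjacency matrix A from cell coordinates.

    Assumption: cells within `radius` are neighbors, and the edge weight
    falls linearly from 1 (distance 0) to 0 (distance == radius).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances
    A = np.where(dist <= radius, 1.0 - dist / radius, 0.0)
    np.fill_diagonal(A, 0.0)  # assumption: no self-loops
    return A


if __name__ == "__main__":
    logits = [[2.0, 1.0, 0.0]]                # hypothetical DNN outputs for one cell
    D = temperature_softmax(logits, T=1.0)    # unadjusted distribution
    L = temperature_softmax(logits, T=0.5)    # sharper pseudo-label distribution
    print(D, L)

    coords = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]  # hypothetical cell positions
    print(spatial_adjacency(coords, radius=2.0))
```

Under these assumptions, lowering \(T\) concentrates probability mass on the most likely cell type, and cells farther apart than `radius` receive zero weight in \(A\).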