Fig. 4: Nicheformer accurately transfers cell-type, niche and region labels to unseen spatial and dissociated data in the brain.
From: Nicheformer: a foundation model for single-cell and spatial omics

a, Single cells resolved in space on an example slice (n = 114,396 cells) of the MERFISH mouse brain dataset, with niche labels superimposed. b, Test-set macro F1 score for niche and brain region label prediction by the fine-tuned Nicheformer model, the linear-probing Nicheformer model and linear-probing baselines computed on embeddings generated with Geneformer, scGPT, scVI and PCA. For scVI and PCA, two sets of embeddings are evaluated: embeddings generated from a random 1% subset of the SpatialCorpus and embeddings generated from the training set of the original dataset. c, UMAP of the dissociated scRNA-seq dataset with the original authors' cell-type labels superimposed. ET, extratelencephalic neurons; IT, intratelencephalic neurons; CT, corticothalamic neurons; NP, near-projecting neurons; OPC, oligodendrocyte precursor cells. d, Nicheformer can transfer spatial niche and region labels onto dissociated single-cell data. e, Nicheformer accurately classifies cells from the dissociated motor cortex into relevant cell types (9 of the 33 distinct cell types in the classifier), using a classifier trained on the whole mouse brain MERFISH dataset. f,g, Nicheformer correctly projects dissociated single cells onto niche (f) and region (g) labels, providing spatially dependent labels. STRd, dorsal striatum; STRv, ventral striatum; RHP, retrohippocampal region; HIP, hippocampal formation; TH, thalamus. f, Nicheformer misclassified parts of the layer 2/3 (L2/3) IT neurons as residing in the subpallium GABAergic niche (highlighted in the red box). Additionally, Nicheformer classified the deep cortical excitatory neurons L6b, L6 CT, L6 IT and L6 IT Car3 (highlighted in the red box) as subpallium GABAergic, although they should be assigned to the pallium glutamatergic niche. g, Most non-neuronal cells (n = 133; 84.7% of all non-neuronal cells) were misclassified as not belonging to the isocortex or the adjacent brain regions (highlighted in the red box).
h, Cell-type abundances in the scRNA-seq dataset of the mouse primary motor cortex. i–k, Classification uncertainty when transferring MERFISH mouse brain cell-type (i), niche (j) and region (k) labels to the dissociated scRNA-seq dataset, where a value of 0 represents high uncertainty and 1 represents low uncertainty (that is, high certainty). k, High uncertainty is observed for parts of the Glut and GABA neurons in the region prediction for the isocortex, CTXsp and OLF, which are neighboring brain regions.
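Panel b reports the test-set macro F1 score. As a minimal sketch (not the authors' evaluation code), macro F1 is the unweighted mean of the per-class F1 scores, so a rare niche or region counts as much as an abundant one. The function name `macro_f1` and the toy labels `niche_A` and `niche_B` below are hypothetical and used only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred.

    Hypothetical helper for illustration; classes with no predictions or no
    true instances get precision/recall of 0 rather than raising an error.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = sorted(set(y_true) | set(y_pred))
    # Per-class counts of true positives, predictions and ground-truth labels.
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)
    f1_scores = []
    for c in classes:
        precision = tp[c] / pred_count[c] if pred_count[c] else 0.0
        recall = tp[c] / true_count[c] if true_count[c] else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    # Macro averaging: every class has equal weight, regardless of its size.
    return sum(f1_scores) / len(f1_scores)


# Toy example with hypothetical niche labels: predicting only the majority
# niche gives 75% accuracy but a macro F1 of only 3/7.
y_true = ["niche_A", "niche_A", "niche_A", "niche_B"]
y_pred = ["niche_A", "niche_A", "niche_A", "niche_A"]
print(round(macro_f1(y_true, y_pred), 3))  # prints 0.429
```

The toy example shows why macro F1 is a stricter summary than accuracy for imbalanced niche and region labels: missing the single `niche_B` cell drives that class's F1 to 0 and halves the average.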