Fig. 6: Application of stLearn stSME imputation to spatial datasets with morphological information.

a Schematic showing stSME integration of three data types (imaging morphology (I), gene expression (G) and spatial location/distance (D)). stSME finds biologically relevant reference spots and uses them to adjust existing spots or to predict gene expression for new spots (pseudo-spots) by imputation. b Rescue of dropout (zero values; blue arrows) by stSME for gene markers of the Cornu Ammonis (CA) 3 (Lhfpl1) and dentate gyrus (DG; Pla2g2f) regions of the mouse hippocampus. Note that the imputation is specific to biologically relevant spots. c Effects of imputation on library size (total gene counts per spot; top) and on the number of spots with missing values (bottom). d Simulation approach assessing stSME imputation performance using mouse brain Visium ST data. Louvain clustering was performed with imputed values after randomly removing 20% of values from the original (log-transformed UMI counts) data as a ‘leave-out’ validation strategy. Note that clusters without stSME imputation are much noisier, and that the hippocampal CA1 (cluster 6) and CA3 (cluster 17) sub-regions could not be separated (white arrows). e Box plot showing poorer clustering results when stSME is not used, as assessed by the adjusted Rand index (ARI; 80% of the 2702 spots of a brain section were randomly subsampled, with a total of n = 10 simulations). ARI was calculated using the full-data clustering results as the reference. f Robustness and performance of the stSME imputation method for the top 2000 highly variable genes (HVGs) across two replicate sections of the Visium human breast cancer ST dataset (10x Genomics; Block A, sections 1 and 2; see “Methods” section for details). Data points are the spatial autocorrelation (Moran’s I index) for the same set of imputed HVGs in section 1 (x-axis) and section 2 (y-axis); colour coding reflects sparsity of the gene in the original UMI count matrix. g Imputation of gene expression in regions without data (i.e. array gaps) improves tissue coverage and clustering in human breast cancer samples. Bottom images show zoomed-in displays of the boxed DCIS boundary region, showing cluster location and expression of the breast cancer markers SFRP2 and MGP (abundant in DCIS).
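
For readers unfamiliar with the statistic plotted in panel f, the following is a minimal sketch of how Moran's I can be computed for one gene across spots. It is not the stLearn implementation: the function names `morans_i` and `chain_weights`, the binary neighbour weights, and the toy one-dimensional arrangement of four spots are all assumptions made for illustration, and the caption does not state which weighting scheme was used.

```python
def chain_weights(n):
    """Hypothetical helper: binary weights for n spots arranged in a line,
    where each spot neighbours only the spots directly next to it."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w


def morans_i(values, weights):
    """Moran's I for one gene: (N / W) * sum_ij w_ij d_i d_j / sum_i d_i^2,
    where d_i is the deviation of spot i from the mean expression and
    W is the sum of all weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    w_sum = sum(sum(row) for row in weights)
    if denom == 0 or w_sum == 0:
        raise ValueError("Moran's I is undefined for constant values or empty weights")
    num = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / w_sum) * num / denom


w = chain_weights(4)
clustered = [1.0, 1.0, 0.0, 0.0]    # expression grouped in neighbouring spots
alternating = [1.0, 0.0, 1.0, 0.0]  # expression alternating between neighbours
print(morans_i(clustered, w))
print(morans_i(alternating, w))
```

In this toy setting the spatially grouped pattern gives a positive value and the alternating pattern a negative one, which is the sense in which a higher Moran's I indicates a more spatially coherent expression pattern.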