Abstract
Spatial transcriptomics aims to elucidate how cells coordinate within tissues by connecting cellular states to their native microenvironments. Imaging-based assays are especially promising, capturing molecular and cellular features at subcellular resolution in three dimensions. Interpretation of such data, however, hinges on accurate cell segmentation, and assigning individual molecules to the correct cells remains challenging. Here we re-analyze data from multiple tissues and platforms and find that segmentation errors currently confound most downstream analyses of cellular state, including differential expression, neighbor influence and ligand–receptor interactions. The impact of misassigned molecules can be striking, frequently dominating the results. We show that matrix factorization of local molecular neighborhoods can effectively identify and isolate such molecular admixtures, thereby reducing their impact on downstream analyses, in a manner analogous to doublet filtering in single-cell RNA sequencing. As spatial transcriptomics assays are applied more widely, accounting for segmentation errors will be important for resolving molecular mechanisms of tissue biology.
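The core idea of factorizing the gene composition of each molecule's local neighborhood can be illustrated with a short Python sketch. This is not the cellAdmix implementation (an R package); it is a minimal illustration under stated assumptions: 2D molecule coordinates, a brute-force k-nearest-neighbor search, a plain Lee–Seung multiplicative-update NMF, and a per-molecule argmax in place of the CRF-based assignment the authors describe. All function names, parameters and toy data are hypothetical.

```python
import numpy as np

def neighborhood_counts(xy, genes, n_genes, k=10):
    """Gene counts among the k nearest molecules (the molecule itself included)."""
    d2 = ((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1)  # brute-force pairwise distances
    nn = np.argsort(d2, axis=1)[:, :k]                          # indices of the k nearest molecules
    counts = np.zeros((len(xy), n_genes))
    rows = np.repeat(np.arange(len(xy)), k)
    np.add.at(counts, (rows, genes[nn].ravel()), 1)             # accumulate neighbor gene labels
    return counts

def nmf(v, n_factors, n_iter=500, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates minimizing the Frobenius error ||V - WH||."""
    rng = np.random.default_rng(seed)
    w = rng.random((v.shape[0], n_factors)) + eps
    h = rng.random((n_factors, v.shape[1])) + eps
    for _ in range(n_iter):
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h  # w: molecule x factor, h: factor x gene

# Hypothetical toy data: two clusters of molecules 100 units apart,
# each drawing its gene labels from its own five genes.
rng = np.random.default_rng(1)
xy = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(100, 1, (50, 2))])
genes = np.concatenate([rng.integers(0, 5, 50), rng.integers(5, 10, 50)])
counts = neighborhood_counts(xy, genes, n_genes=10, k=10)
w, h = nmf(counts, n_factors=2)
labels = w.argmax(axis=1)  # crude per-molecule factor call (no CRF smoothing)
```

In this toy case the two factors recover the two gene programs. In real data, a factor whose top genes mark a different cell type than the host cell would be a candidate admixture signal.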
Data availability
The NSCLC CosMx dataset (ref. 26) is publicly available at https://nanostring.com/products/cosmx-spatial-molecular-imager/ffpe-dataset/nsclc-ffpe-dataset/. The NSCLC scRNA-seq dataset (ref. 52) can be found at GEO accession GSE127465. The mouse hypothalamus MERFISH dataset (ref. 23) is available at https://doi.org/10.5061/dryad.8t8s248. The mouse hypothalamus scRNA-seq dataset (ref. 25) is available at GEO accession GSE113576. The mouse ileum MERFISH data can be found at https://doi.org/10.5061/dryad.jm63xsjb2. The mouse ileum scRNA-seq data (ref. 53) can be found at GEO accession GSE92332. The pancreatic cancer Xenium data (ref. 29) can be found at https://www.10xgenomics.com/datasets/ffpe-human-pancreas-with-xenium-multimodal-cell-segmentation-1-standard. The corresponding pancreatic cancer snRNA-seq dataset (ref. 54) we used can be found at http://singlecell.charite.de/cellbrowser/pancreas/. The breast cancer Xenium data (ref. 27) were downloaded from https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast, and the matching scRNA-seq data (ref. 10) can be found at GEO accession GSM7782698. The ovarian cancer dataset (ref. 28) was downloaded from https://www.10xgenomics.com/datasets/xenium-prime-ffpe-human-ovarian-cancer, and the matching scRNA-seq data (ref. 55) can be found at https://www.10xgenomics.com/datasets/17k-human-ovarian-cancer-scFFPE. Processed dataset files can be found at http://pklab.org/peterk/cellAdmix/.
Code availability
Our tool for removing admixture can be found on GitHub at https://github.com/kharchenkolab/cellAdmix and on Zenodo (ref. 56) at https://doi.org/10.5281/zenodo.17957138. Jupyter notebooks for running all analyses in the manuscript can be found on GitHub at https://github.com/kharchenkolab/cellAdmix-Notebooks and on Zenodo (ref. 51) at https://doi.org/10.5281/zenodo.17713750.
References
1. Palla, G., Fischer, D. S., Regev, A. & Theis, F. J. Spatial components of molecular tissue biology. Nat. Biotechnol. 40, 308–318 (2022).
2. Armingol, E., Officer, A., Harismendy, O. & Lewis, N. E. Deciphering cell–cell interactions and communication from gene expression. Nat. Rev. Genet. 22, 71–88 (2021).
3. Fang, S. et al. Computational approaches and challenges in spatial transcriptomics. Genomics Proteomics Bioinformatics 21, 24–47 (2023).
4. Wu, L. et al. An invasive zone in human liver cancer identified by Stereo-seq promotes hepatocyte–tumor cell crosstalk, local immunosuppression and tumor progression. Cell Res. 33, 585–603 (2023).
5. Liu, S. et al. Spatially mapping T cell receptors and transcriptomes reveals distinct immune niches and interactions underlying the adaptive immune response. Immunity 55, 1940–1952 (2022).
6. Avraham-Davidi, I. et al. Spatially defined multicellular functional units in colorectal cancer revealed from single cell and spatial transcriptomics. eLife 14, RP104815 (2025).
7. Zhao, T. et al. Spatial genomics enables multi-modal study of clonal heterogeneity in tissues. Nature 601, 85–91 (2022).
8. Toninelli, M., Rossetti, G. & Pagani, M. Charting the tumor microenvironment with spatial profiling technologies. Trends Cancer 9, 1085–1096 (2023).
9. Hirz, T. et al. Dissecting the immune suppressive human prostate tumor microenvironment via integrated single-cell and spatial transcriptomic analyses. Nat. Commun. 14, 663 (2023).
10. Janesick, A. et al. High resolution mapping of the tumor microenvironment using integrated single-cell, spatial and in situ analysis. Nat. Commun. 14, 8353 (2023).
11. Chen, J. H. et al. Human lung cancer harbors spatially organized stem-immunity hubs associated with response to immunotherapy. Nat. Immunol. 25, 644–658 (2024).
12. Allen, W. E., Blosser, T. R., Sullivan, Z. A., Dulac, C. & Zhuang, X. Molecular and spatial signatures of mouse brain aging at single-cell resolution. Cell 186, 194–208 (2023).
13. Kilfeather, P. et al. Single-cell spatial transcriptomic and translatomic profiling of dopaminergic neurons in health, aging, and disease. Cell Rep. 43, 113784 (2024).
14. Stickels, R. R. et al. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat. Biotechnol. 39, 313–319 (2021).
15. Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 185, 1777–1792 (2022).
16. Farah, E. N. et al. Spatially organized cellular communities form the developing human heart. Nature 627, 854–864 (2024).
17. Tian, L., Chen, F. & Macosko, E. Z. Moving genomics into tissues. Nat. Biotechnol. 41, 773–782 (2023).
18. Cable, D. M. et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat. Biotechnol. 40, 517–526 (2022).
19. Moffitt, J. R. et al. High-throughput single-cell gene-expression profiling with multiplexed error-robust fluorescence in situ hybridization. Proc. Natl Acad. Sci. USA 113, 11046–11051 (2016).
20. Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, eaat5691 (2018).
21. Stringer, C., Wang, T., Michaelos, M. & Pachitariu, M. Cellpose: a generalist algorithm for cellular segmentation. Nat. Methods 18, 100–106 (2021).
22. Greenwald, N. F. et al. Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning. Nat. Biotechnol. 40, 555–565 (2022).
23. Petukhov, V. et al. Cell segmentation in imaging-based spatial transcriptomics. Nat. Biotechnol. 40, 345–354 (2022).
24. Park, J. et al. Cell segmentation-free inference of cell types from in situ transcriptomics data. Nat. Commun. 12, 3545 (2021).
25. Moffitt, J. R. et al. Molecular, spatial and functional single-cell profiling of the hypothalamic preoptic region. Science 362, eaau5324 (2018).
26. He, S. et al. High-plex imaging of RNA and proteins at subcellular resolution in fixed tissue by spatial molecular imaging. Nat. Biotechnol. 40, 1794–1806 (2022).
27. 10x Genomics. High Resolution Mapping of the Breast Cancer Tumor Microenvironment Using Integrated Single Cell, Spatial and In Situ Analysis of FFPE Tissue (10x Genomics, accessed 2 February 2025); https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast
28. 10x Genomics. FFPE Human Ovarian Cancer with 5K Human Pan Tissue and Pathways Panel Plus 100 Custom Genes (10x Genomics, accessed 6 February 2025); https://www.10xgenomics.com/datasets/xenium-prime-ffpe-human-ovarian-cancer
29. 10x Genomics. FFPE Human Pancreas with Xenium Multimodal Cell Segmentation (10x Genomics, accessed 22 May 2024); https://www.10xgenomics.com/datasets/ffpe-human-pancreas-with-xenium-multimodal-cell-segmentation-1-standard
30. Fu, X. et al. BIDCell: biologically-informed self-supervised learning for segmentation of subcellular spatial transcriptomics data. Nat. Commun. 15, 509 (2024).
31. Jones, D. C. et al. Cell simulation as cell segmentation. Nat. Methods 22, 1331–1342 (2025).
32. Arnol, D., Schapiro, D., Bodenmiller, B., Saez-Rodriguez, J. & Stegle, O. Modeling cell–cell interactions from spatial molecular data with spatial variance component analysis. Cell Rep. 29, 202–211 (2019).
33. Dries, R. et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biol. 22, 78 (2021).
34. Garcia-Alonso, L. et al. Mapping the temporal and spatial dynamics of the human endometrium in vivo and in vitro. Nat. Genet. 53, 1698–1711 (2021).
35. Tanevski, J., Flores, R. O. R., Gabor, A., Schapiro, D. & Saez-Rodriguez, J. Explainable multiview framework for dissecting spatial relationships from highly multiplexed data. Genome Biol. 23, 97 (2022).
36. Jerby-Arnon, L. & Regev, A. DIALOGUE maps multicellular programs in tissue from single-cell or spatial transcriptomics data. Nat. Biotechnol. 40, 1467–1477 (2022).
37. Shao, X. et al. Knowledge-graph-based cell-cell communication inference for spatially resolved transcriptomic data with SpaTalk. Nat. Commun. 13, 4429 (2022).
38. Cang, Z. et al. Screening cell–cell communication in spatial transcriptomics via collective optimal transport. Nat. Methods 20, 218–228 (2023).
39. Pham, D. et al. Robust mapping of spatiotemporal trajectories and cell–cell interactions in healthy and diseased tissues. Nat. Commun. 14, 7739 (2023).
40. Fischer, D. S., Schaar, A. C. & Theis, F. J. Modeling intercellular communication in tissues using spatial graphs of cells. Nat. Biotechnol. 41, 332–336 (2023).
41. Mason, K. et al. Niche-DE: niche-differential gene expression analysis in spatial transcriptomics data identifies context-dependent cell–cell interactions. Genome Biol. 25, 14 (2024).
42. Mitchel, J. et al. Coordinated, multicellular patterns of transcriptional variation that stratify patient cohorts are revealed by tensor decomposition. Nat. Biotechnol. 43, 1192–1201 (2025).
43. Chen, C. et al. Role of cancer-associated fibroblasts in the resistance to antitumor therapy, and their potential therapeutic mechanisms in non-small cell lung cancer. Oncol. Lett. 21, 413 (2021).
44. Wang, F.-T., Sun, W., Zhang, J.-T. & Fan, Y.-Z. Cancer-associated fibroblast regulation of tumor neo-angiogenesis as a therapeutic target in cancer. Oncol. Lett. 17, 3055–3065 (2019).
45. Barkas, N. et al. Joint analysis of heterogeneous single-cell RNA-seq dataset collections. Nat. Methods 16, 695–698 (2019).
46. Wang, Y. et al. iTALK: an R package to characterize and illustrate intercellular communication. Preprint at bioRxiv https://doi.org/10.1101/507871 (2019).
47. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32 (2019).
48. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at arXiv https://doi.org/10.48550/arXiv.1412.6980 (2017).
49. Gaujoux, R. & Seoighe, C. A flexible R package for nonnegative matrix factorization. BMC Bioinformatics 11, 367 (2010).
50. Wang, G., Kossenkov, A. V. & Ochs, M. F. LS-NMF: a modified non-negative matrix factorization algorithm utilizing uncertainty estimates. BMC Bioinformatics 7, 175 (2006).
51. Mitchel, J., Gao, T., Petukhov, V. & Kharchenko, P. cellAdmix-Notebooks. Zenodo https://doi.org/10.5281/zenodo.17713750 (2025).
52. Zilionis, R. et al. Single cell transcriptomics of human and mouse lung cancers reveals conserved myeloid populations across individuals and species. Immunity 50, 1317–1334 (2019).
53. Haber, A. L. et al. A single-cell survey of the small intestinal epithelium. Nature 551, 333–339 (2017).
54. Tosti, L. et al. Single-nucleus and in situ RNA-sequencing reveal cell topographies in the human pancreas. Gastroenterology 160, 1330–1344 (2021).
55. 10x Genomics. Human Ovarian Cancer FFPE Single Cell Gene Expression Flex (Next GEM) (10x Genomics, accessed 6 February 2025); https://www.10xgenomics.com/datasets/17k-human-ovarian-cancer-scFFPE
56. Mitchel, J., Gao, T., Petukhov, V. & Kharchenko, P. cellAdmix. Zenodo https://doi.org/10.5281/zenodo.17957138 (2025).
Acknowledgements
This work was supported by the National Institutes of Health (NIH R35GM127131). T.G. is a Damon Runyon Fellow supported by the Damon Runyon Cancer Research Foundation (DRG-2552-25). We also thank S. Sunyaev for his mentorship and supervision of J.M.
Author information
Contributions
P.V.K. conceptualized the study and advised on all analyses as well as the correction procedure that was presented. J.M. analyzed the impact of segmentation errors and our correction procedure on downstream methods (DE, GSEA, LR analysis and multicellular covariation); designed and implemented the cell-bridging and membrane-based automated factor-annotation methods; benchmarked the factor-annotation methods; and organized the figures and wrote the figure legends with input from the other authors. T.G. and J.M. designed and implemented the NMF- and CRF-based correction procedure, with advice from P.V.K. and V.P., and generated admixture prevalence plots and schematic diagrams. T.G. designed and implemented the enrichment-based factor-annotation method and analyzed intracellular structures revealed by our method. V.P. developed the admixture probability score metric and applied it to compare segmentation methods and re-annotated datasets, matching cell types between corresponding spatial and scRNA-seq data. E.C. and J.M. designed and implemented the membrane separation scoring metric. E.C. carried out the neighborhood prediction analysis. J.M. and V.P. computed marker genes from the scRNA-seq datasets and created the example plots showing membrane and DAPI stains. J.M., V.P. and T.G. created the cellAdmix R package. All authors contributed to writing the manuscript.
Ethics declarations
Competing interests
P.V.K. was an employee of Altos Labs. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Sergio Marco Salas and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Prevalence of foreign cell type markers in scRNA-seq data compared to spatial, after adjusting for modality-specific differences.
a, To show that the spatial modality contains more admixture than scRNA-seq, we computed the expression of malignant marker genes in each cell type. Bar height shows the total expression of selected malignant marker genes after correcting for modality-specific expression differences of these markers, as estimated in the malignant cell type. To compute this, we first calculated the mean fractional expression of each marker in each cell type. The spatial expression of each marker was then divided by that marker's ratio of spatial to scRNA-seq expression in malignant cells, and the resulting values were summed over all markers (which makes the first two bars equal in height). b, As in a, but for admixture of enterocyte markers into other cell types in the mouse ileum datasets.
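The modality-adjusted marker load described in panel a can be sketched in a few lines of NumPy. This is an illustrative reading of the caption, not the authors' code: it assumes cells-by-genes count matrices for each modality, takes "fractional expression" as marker counts divided by a cell's total counts, and applies the spatial-to-scRNA-seq ratio to the spatial values only. The function names and toy counts are hypothetical.

```python
import numpy as np

def mean_fractional_expression(counts, cell_types, markers):
    """Per cell type: mean over cells of (marker counts / total counts) for each marker."""
    frac = counts[:, markers] / counts.sum(axis=1, keepdims=True)
    return {str(ct): frac[cell_types == ct].mean(axis=0) for ct in np.unique(cell_types)}

def adjusted_marker_load(spatial, sc, source="malignant"):
    """Summed marker expression per cell type, with spatial values rescaled by the
    spatial/scRNA-seq ratio observed in the source (here malignant) cell type."""
    ratio = spatial[source] / sc[source]          # per-marker modality ratio
    adj_spatial = {ct: float((v / ratio).sum()) for ct, v in spatial.items()}
    sc_load = {ct: float(v.sum()) for ct, v in sc.items()}
    return adj_spatial, sc_load

# Hypothetical toy data: genes 0 and 1 are malignant markers, gene 2 is other.
sp_counts = np.array([[8, 2, 10], [6, 4, 10], [1, 0, 19], [0, 1, 19]], dtype=float)
sc_counts = np.array([[5, 5, 10], [4, 6, 10], [0, 0, 20], [0, 0, 20]], dtype=float)
labels = np.array(["malignant", "malignant", "fibroblast", "fibroblast"])
sp = mean_fractional_expression(sp_counts, labels, markers=[0, 1])
sc = mean_fractional_expression(sc_counts, labels, markers=[0, 1])
adj_spatial, sc_load = adjusted_marker_load(sp, sc)
```

By construction the adjusted spatial load of the malignant type equals its scRNA-seq load, so any excess in other cell types reflects admixture rather than a global modality shift.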
Extended Data Fig. 2 Additional evidence of segmentation errors inducing cell-type admixture in two Xenium datasets.
a, Pancreas cell type markers identified in scRNA-seq data. Dot plot of top markers, showing relative gene expression in scRNA-seq data (left) or the corresponding spatial dataset (right). b, Two segmented endothelial cells, showing all molecules assigned to each cell. Color and shape indicate whether the molecule is a marker gene identified in the scRNA-seq data for endothelial cells or stellate cells (the source of admixture). Purple and green background colors correspond to DAPI and membrane stain intensities, respectively. c, Dataset-wide prevalence of stellate cell admixture into each cell type in the spatial data versus the scRNA-seq data. Admixture quantity (color) is calculated as the percentage of a cell's total transcripts that come from stellate cell marker genes (identified in scRNA-seq). The y-axis shows the fraction of cells of a given type at each admixture level. d, Interaction-changed genes for endothelial cells located near stellate cells (n = 862) versus those not near stellate cells (n = 608). scRNA-seq marker genes of stellate cells and acinar cells are shown in red and blue, respectively. P-values were calculated with a two-sided Wilcoxon rank-sum test and were FDR adjusted. Marker enrichment P-values were calculated with a two-sided GSEA test. e, Xenium data from a pancreatic adenocarcinoma patient sample. Cell types are shown as different colored points. f, Ovarian cancer cell type markers identified in scRNA-seq data. Dot plot of top markers, showing relative gene expression in scRNA-seq data (left) or the corresponding spatial dataset (right). g, An example segmented macrophage, showing all molecules assigned to that cell. Color and shape indicate whether the molecule is a marker gene identified in the scRNA-seq data for macrophages or proliferative malignant cells (the source of admixture). Purple and green background colors correspond to DAPI and membrane stain intensities, respectively. h, Dataset-wide prevalence of stromal fibroblast admixture into each cell type in the spatial data versus the scRNA-seq data. Admixture quantity (color) is calculated as the percentage of a cell's total transcripts that come from stromal fibroblast marker genes (identified in scRNA-seq). The y-axis shows the fraction of cells of a given type at each admixture level.
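The admixture quantity in panels c and h is a simple per-cell percentage followed by a per-cell-type histogram. The sketch below assumes a cells-by-genes count matrix and a list of foreign marker column indices; the bin edges are a hypothetical choice, since the caption does not state them, and the function names are illustrative.

```python
import numpy as np

def admixture_percent(counts, foreign_markers):
    """Percent of each cell's transcripts coming from the foreign cell type's marker genes."""
    return 100.0 * counts[:, foreign_markers].sum(axis=1) / counts.sum(axis=1)

def admixture_profile(percent, cell_types, bins=(0, 1, 5, 10, 100)):
    """Fraction of cells of each type falling in each admixture bin (bins are hypothetical)."""
    profile = {}
    for ct in np.unique(cell_types):
        hist, _ = np.histogram(percent[cell_types == ct], bins=bins)
        profile[str(ct)] = hist / hist.sum()
    return profile

# Hypothetical toy data: column 1 is a stellate marker, column 0 everything else.
counts = np.array([[90, 10], [100, 0], [50, 50]], dtype=float)
cell_types = np.array(["endothelial", "endothelial", "acinar"])
pct = admixture_percent(counts, foreign_markers=[1])
profile = admixture_profile(pct, cell_types)
```

Computing the same profile on scRNA-seq counts with the same marker list gives the comparison shown in the panels.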
Extended Data Fig. 3 Effects of segmentation errors on differential expression in a mouse hypothalamus MERFISH dataset.
a, Hypothalamus cell type markers identified in scRNA-seq data. Dot plot of top markers, showing relative gene expression in scRNA-seq data (left) or the corresponding spatial dataset (right). b, Hypothalamus data for one mouse. Cell types are shown as different colored points. Circled regions indicate the contexts compared in the differential expression analysis (red marks the ACA region, which is compared with all other regions). c, Cell-type composition inside and outside the ACA region, shown as the fraction of all cells in a region that belong to each cell type. d, Differential expression in microglia between the ACA and other regions. scRNA-seq marker genes of mature oligodendrocytes and inhibitory neurons are shown in red and blue, respectively. P-values were calculated with a two-sided Wilcoxon rank-sum test and were FDR adjusted. Marker enrichment P-values were calculated with a two-sided GSEA test. ACA, anterior commissure area.
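The differential expression test in panel d (a two-sided Wilcoxon rank-sum test per gene followed by FDR adjustment) can be sketched with SciPy. The caption does not name the FDR procedure, so Benjamini–Hochberg is assumed here; the Mann–Whitney U test is the rank-sum test's standard equivalent. Function names and toy data are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    q = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity from the top
    out = np.empty(n)
    out[order] = np.minimum(q, 1.0)
    return out

def rank_sum_de(expr_a, expr_b):
    """Per-gene two-sided Wilcoxon rank-sum (Mann-Whitney U) test, then FDR adjustment."""
    p = np.array([mannwhitneyu(expr_a[:, g], expr_b[:, g], alternative="two-sided").pvalue
                  for g in range(expr_a.shape[1])])
    return p, benjamini_hochberg(p)

# Hypothetical toy data: gene 0 is shifted between regions, gene 1 is not.
expr_a = np.column_stack([np.arange(10, 20), np.arange(0, 20, 2)]).astype(float)
expr_b = np.column_stack([np.arange(0, 10), np.arange(1, 20, 2)]).astype(float)
p, q = rank_sum_de(expr_a, expr_b)
```

If the microglia in one region carry admixed oligodendrocyte or neuronal transcripts, those genes would surface as "differentially expressed" in exactly this kind of test, which is the confounding the figure illustrates.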
Extended Data Fig. 4 Benchmarking different segmentation methods with an admixture probability score.
a, Admixture probability scores for the mouse ileum dataset using three different segmentations (Baysor, ComSeg, and ProSeg). The scores for each cell type are shown as the trimmed mean scores relative to the original Baysor segmentation. We also show the scores after applying our cleaning procedure to each segmentation. Box plot centerlines represent the median values, limits are the first and third quartiles, and whiskers end at the limits ±1.5 times the interquartile range. b, Admixture probability scores for the ovarian cancer dataset using four different segmentations (Xenium, Baysor, BIDCell, and ProSeg). The scores for each cell type are shown as the trimmed mean scores relative to the original Xenium segmentation. We also show the scores after applying our cleaning procedure to each segmentation. Box plots are constructed the same way as in a.
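The box plot convention used in these captions can be made concrete. The sketch below assumes the conventional Tukey reading (as in ggplot2's default box plots): whiskers extend to the most extreme observations that lie within Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, and points beyond those limits are drawn as outliers. The function name is hypothetical.

```python
import numpy as np

def tukey_box(values):
    """Median, quartiles, whisker ends and outliers under the 1.5 x IQR rule."""
    values = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lo_lim, hi_lim = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= lo_lim) & (values <= hi_lim)]
    return {
        "median": med, "q1": q1, "q3": q3,
        "whisker_low": inside.min(),   # most extreme point still within the lower limit
        "whisker_high": inside.max(),  # most extreme point still within the upper limit
        "outliers": values[(values < lo_lim) | (values > hi_lim)],
    }

box = tukey_box([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```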
Extended Data Fig. 5 Benchmarking different parameters for NMF and different strategies for automated factor annotation.
a, For each dataset, we applied the cleaning procedure with the listed parameters. For each cell type, we then calculated the trimmed mean of admixture probability over all cells and divided this value by the trimmed mean of admixture probabilities from the uncleaned data (cell types with an uncleaned mean below 0.001 were excluded). We also compared NMF run separately on each cell type (‘ct’) with NMF run on all cell types together (‘joint’), using 5, 15, 20 or 30 factors. Each combination of NMF type and number of factors was tested with each of the three automated factor-annotation methods (enrichment, membrane or cell-bridging). Box plot centerlines represent the median values, limits are the first and third quartiles, and whiskers end at the limits ±1.5 times the interquartile range.
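The per-cell-type summary in panel a (a cleaned-over-uncleaned ratio of trimmed means, skipping near-zero baselines) can be sketched with SciPy's `trim_mean`. Two details are assumptions not given in the caption: the trimming proportion (10% per tail here) and that the 0.001 exclusion threshold is applied to the same uncleaned trimmed mean. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import trim_mean

def relative_admixture(before, after, cell_types, cut=0.1, min_mean=0.001):
    """Per cell type: trimmed mean admixture after cleaning / trimmed mean before.
    `cut` (fraction trimmed from each tail) is an assumed value."""
    out = {}
    for ct in np.unique(cell_types):
        m = cell_types == ct
        base = trim_mean(before[m], cut)
        if base < min_mean:            # near-zero baselines make the ratio unstable
            continue
        out[str(ct)] = trim_mean(after[m], cut) / base
    return out

# Hypothetical toy scores: type A halves its admixture, type B starts near zero.
cell_types = np.array(["A"] * 10 + ["B"] * 10)
before = np.concatenate([np.full(10, 0.2), np.full(10, 0.0005)])
after = np.concatenate([np.full(10, 0.1), np.full(10, 0.0004)])
ratios = relative_admixture(before, after, cell_types)
```

A ratio below 1 indicates that cleaning reduced admixture for that cell type; the trimmed mean keeps a few extreme cells from dominating the summary.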
Extended Data Fig. 6 Admixture scores per cell in all datasets after cleaning with joint NMF, CRF, and the cell-bridging automated factor annotation.
a, Admixture scores were computed per cell and are shown for the original versus cleaned cells of each cell type in the leftmost panels. The middle panels show the number of molecules per cell before and after cleaning, highlighting how little coverage is affected. The rightmost panels show per-cell admixture score ratios, computed by dividing the score after cleaning by the score before cleaning; cells with ratios below 1 therefore had reduced admixture after cleaning. Cell types with an uncleaned mean below 0.001 were excluded. The number of factors used for each dataset (top to bottom) was 20, 20, 20, 15 and 15. Two-sided two-sample t-tests were used to compute P-values comparing scores before versus after cleaning. P-values for the ovarian cancer dataset: malignant cells lining cyst P = 2.2 × 10^−7, T & NK cells P = 2.1 × 10^−209, macrophages P = 1.1 × 10^−4,034, fallopian tube epithelium P = 6.6 × 10^−1, granulosa cells P = 1.7 × 10^−5, smooth muscle cells P = 3.2 × 10^−105, stromal associated fibroblasts P = 2.4 × 10^−78, endothelial cells P = 1.3 × 10^−303, pericytes P = 1.2 × 10^−105, tumor associated fibroblasts P = 5.3 × 10^−1,474, ciliated epithelial cells P = 1.3 × 10^−5, inflammatory tumor cells P = 9.9 × 10^−1, tumor cells P = 1.7 × 10^−734, VEGFA+ tumor cells P = 2.3 × 10^−15, proliferative tumor cells P = 2.8 × 10^−407. P-values for the mouse ileum dataset: tuft P = 5.0 × 10^−2, goblet P = 7.3 × 10^−39, stem + TA P = 1.4 × 10^−101, enterocyte P = 2.3 × 10^−26. P-values for the breast cancer dataset: B cells P = 1.8 × 10^−41, CD4+ T cells P = 2.9 × 10^−2,790, plasma P = 6.8 × 10^−1,591, mast cells P = 3.5 × 10^−4, CD8+ T cells P = 1.1 × 10^−2,317, macrophages2 P = 4.5 × 10^−237, stromal P = 6.2 × 10^−3,318, myoepi ACTA2+ P = 5.6 × 10^−187, macrophages1 P = 1.7 × 10^−2,453, IRF7+ DCs P = 5.6 × 10^−19, perivascular-like P = 2.2 × 10^−85, endothelial P = 1.1 × 10^−855, LAMP3+ DCs P = 1.9 × 10^−7. P-values for the mouse hypothalamus dataset: microglia P = 1.9 × 10^−56, endothelial P = 1.7 × 10^−101, OD immature P = 2.4 × 10^−24, astrocyte P = 3.2 × 10^−34, excitatory P = 4.6 × 10^−73, inhibitory P = 4.1 × 10^−253. P-values for the pancreatic cancer dataset: ductal P = 2.3 × 10^−278, macrophage P = 6.5 × 10^−105, endothelial P = 5.3 × 10^−124, acinar P = 2.5 × 10^−1,006, alpha–beta–delta–gamma P = 1.9 × 10^−13, stellate P = 8.9 × 10^−45. Box plot centerlines represent the median values, limits are the first and third quartiles, and whiskers end at the limits ±1.5 times the interquartile range.
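The per-cell ratio and the pre- versus post-cleaning test can be sketched as follows. The caption does not say whether equal variances were assumed; Welch's variant (the default of R's `t.test`) is used here as an assumption. Note that several reported P-values (for example 1.1 × 10^−4,034) are far below the smallest positive float64 (about 4.9 × 10^−324), so this double-precision sketch would report them as 0; values of that size need log-scale computation, such as R's `pt()` with `log.p = TRUE`. The function name is hypothetical.

```python
import numpy as np
from scipy import stats

def cleaning_effect(before, after):
    """Per-cell admixture ratio (after / before), fraction of cells improved,
    and a two-sided Welch two-sample t-test P-value (float64 precision only)."""
    keep = before > 0                       # ratio is undefined for zero initial scores
    ratio = after[keep] / before[keep]      # < 1 means admixture decreased
    frac_reduced = float(np.mean(ratio < 1))
    test = stats.ttest_ind(before, after, equal_var=False)  # two-sided by default
    return ratio, frac_reduced, float(test.pvalue)

# Hypothetical toy scores in which cleaning halves every cell's admixture.
before = np.array([0.4, 0.5, 0.6, 0.5, 0.45, 0.55])
after = before * 0.5
ratio, frac_reduced, p = cleaning_effect(before, after)
```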
Extended Data Fig. 7 Additional examples demonstrating the impact of the cleaning procedure on downstream analyses.
a, Fraction of native (fibroblast) or foreign (malignant) markers removed per cell from NSCLC fibroblasts by the cleaning procedure. Only cells that started with at least 10 molecules of native or foreign markers, respectively, were considered. Far more cells had a large proportion of their malignant markers removed than of their fibroblast markers, showing that the procedure minimizes alterations to native transcripts. b, Impact of data cleaning on goblet cell DE for crypt (stem + TA enriched) versus villus (enterocyte enriched) regions in the mouse ileum dataset. scRNA-seq markers of stem + TA cells are colored red and markers of enterocytes are colored blue. c, Impact of data cleaning on macrophage DE for tumor versus stroma regions in the ovarian cancer dataset. scRNA-seq markers of proliferative malignant cells are colored orange, malignant cells red, and stromal fibroblasts blue. d, Impact of data cleaning on microglia DE for ACA versus other regions in the mouse hypothalamus dataset. scRNA-seq markers of mature oligodendrocytes and inhibitory neurons are colored red and blue, respectively. e, Interaction-changed genes for endothelial cells located near stellate cells (n = 862) versus those not near stellate cells (n = 608) in the pancreatic cancer dataset. scRNA-seq marker genes of stellate cells and acinar cells are shown in red and blue, respectively. The P-value is from GSEA testing for enrichment of marker genes among the top DE genes. f, Impact of cleaning on scITD results in the mouse ileum dataset, showing associations between goblet cell gene expression and factor 2 before versus after cleaning. scRNA-seq markers of stem + TA cells are colored red and markers of enterocytes are colored blue. g, GSEA results for goblet cell DE between crypts and villi in the mouse ileum data. P-values were FDR adjusted. h, GSEA results for macrophage DE between tumor and stroma regions in the ovarian cancer dataset. P-values were FDR adjusted.
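The quantity in panel a, the fraction of a cell's marker molecules removed by cleaning with a minimum-count filter, can be sketched as below. It assumes cells-by-genes count matrices before and after cleaning with matching rows; cells below the 10-molecule threshold are returned as NaN rather than dropped. The function name and toy counts are hypothetical.

```python
import numpy as np

def fraction_removed(before, after, marker_idx, min_molecules=10):
    """Per cell: fraction of marker-gene molecules removed by cleaning.
    Cells with fewer than `min_molecules` marker molecules before cleaning get NaN."""
    n_before = before[:, marker_idx].sum(axis=1)
    n_after = after[:, marker_idx].sum(axis=1)
    frac = np.full(len(n_before), np.nan)
    ok = n_before >= min_molecules
    frac[ok] = 1.0 - n_after[ok] / n_before[ok]
    return frac

# Hypothetical toy counts: column 0 = malignant (foreign) marker, column 1 = fibroblast (native) marker.
before = np.array([[12, 20], [5, 30]], dtype=float)
after = np.array([[3, 19], [5, 30]], dtype=float)
foreign = fraction_removed(before, after, marker_idx=[0])
native = fraction_removed(before, after, marker_idx=[1])
```

Comparing the two distributions shows the pattern described in the caption: a well-behaved cleaning step removes a large share of foreign markers while leaving native markers largely intact.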
Extended Data Fig. 8 NMF factors can also extract patterns of biological intracellular variation.
a, Visualization of subcellular compartments within enterocytes from the mouse ileum dataset. Colors indicate molecules assigned to different compartments (factors) by the CRF. b, Top NMF factor loadings for NMF run on enterocytes. For each NMF factor, the left panel shows factor loadings on genes, and the right panel shows a heatmap of normalized gene expression (z-scores) across cell types, where blue indicates low expression and red indicates high expression. c, Top NMF factor loadings for NMF run on a U2OS osteosarcoma cell line dataset. For each NMF factor, the left panel shows factor loadings on genes. d, Visualization of subcellular compartments within U2OS osteosarcoma cells.
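The two summaries behind panel b, the top-loading genes per NMF factor and a per-gene z-score heatmap across cell types, can be sketched as below. It assumes H is a factors-by-genes loading matrix and that z-scores are computed per gene across cell-type mean expression (population standard deviation); both choices are assumptions, and the function names are hypothetical.

```python
import numpy as np

def top_genes(h, gene_names, n=5):
    """Names of the n highest-loading genes for each NMF factor (rows of H)."""
    return [[gene_names[j] for j in np.argsort(row)[::-1][:n]] for row in h]

def zscore_by_gene(mean_expr):
    """Standardize each gene (column) across cell types (rows); constant genes map to 0."""
    mu = mean_expr.mean(axis=0, keepdims=True)
    sd = mean_expr.std(axis=0, keepdims=True)
    return (mean_expr - mu) / np.where(sd > 0, sd, 1.0)

h = np.array([[0.1, 0.9, 0.5], [0.7, 0.2, 0.3]])
tops = top_genes(h, ["A", "B", "C"], n=2)
z = zscore_by_gene(np.array([[1.0, 2.0], [3.0, 2.0]]))
```

A factor whose top genes are all highly expressed in a different cell type would point to admixture, whereas a factor whose genes are specific to the host cell type is more consistent with intracellular structure of the kind shown here.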
Supplementary information
Supplementary Information
Supplementary Note.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mitchel, J., Gao, T., Petukhov, V. et al. Impact and correction of segmentation errors in spatial transcriptomics. Nat Genet (2026). https://doi.org/10.1038/s41588-025-02497-4
DOI: https://doi.org/10.1038/s41588-025-02497-4


