
  • Article

Impact and correction of segmentation errors in spatial transcriptomics

Abstract

Spatial transcriptomics aims to elucidate how cells coordinate within tissues by connecting cellular states to their native microenvironments. Imaging-based assays are especially promising, capturing molecular and cellular features at subcellular resolution in three dimensions. Interpretation of such data, however, hinges on accurate cell segmentation, and assigning individual molecules to the correct cells remains challenging. Here we re-analyze data from multiple tissues and platforms and find that segmentation errors currently confound most downstream analyses of cellular state, including differential expression, neighbor influence and ligand–receptor interactions. The impact of misassigned molecules can be striking, frequently dominating the results. We show that matrix factorization of local molecular neighborhoods can effectively identify and isolate such molecular admixtures, thereby reducing their impact on downstream analyses, in a manner analogous to doublet filtering in single-cell RNA sequencing. As applications of spatial transcriptomics assays become more widespread, accounting for segmentation errors will be important for resolving molecular mechanisms of tissue biology.
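The factorization idea in the abstract can be illustrated with a toy sketch. The Python below is not the cellAdmix implementation: it assumes a brute-force k-nearest-neighbor search, plain Lee-Seung multiplicative updates for non-negative matrix factorization, and made-up gene labels (A1, A2, B1, B2) and coordinates. The function names `neighborhood_counts` and `nmf` are hypothetical.

```python
import math
import random

def neighborhood_counts(molecules, genes, k):
    """For each molecule, count the genes of its k nearest molecules (itself included).

    molecules: list of (x, y, gene) tuples. Brute-force search, fine for a toy example.
    """
    index = {g: i for i, g in enumerate(genes)}
    rows = []
    for (x0, y0, _) in molecules:
        nearest = sorted(molecules, key=lambda m: (m[0] - x0) ** 2 + (m[1] - y0) ** 2)[:k]
        row = [0.0] * len(genes)
        for (_, _, g) in nearest:
            row[index[g]] += 1.0
        rows.append(row)
    return rows

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    wh = matmul(w, h)
    return math.sqrt(sum((v[i][j] - wh[i][j]) ** 2
                         for i in range(len(v)) for j in range(len(v[0]))))

def nmf(v, n_factors, n_iter=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for V ~ W H with W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(n_factors)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(n_factors)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(n_factors)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(n_factors)]
             for i in range(n)]
    return w, h

# Toy data (assumed): two well-separated clusters of molecules with distinct genes.
rng = random.Random(1)
genes = ["A1", "A2", "B1", "B2"]
molecules = [(rng.random(), rng.random(), genes[i % 2]) for i in range(20)]
molecules += [(10 + rng.random(), rng.random(), genes[2 + i % 2]) for i in range(20)]

v = neighborhood_counts(molecules, genes, k=5)   # molecules x genes
w, h = nmf(v, n_factors=2)                       # w: per-molecule factor weights
```

In this sketch each row of `w` gives a molecule's weight on each factor and each row of `h` gives a factor's gene loadings; deciding which factors represent admixture is a separate annotation step that the sketch does not attempt.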


Fig. 1: Segmentation errors cause admixture of molecules between adjacent cells.
Fig. 2: Segmentation errors confound context-dependent DE results.
Fig. 3: Segmentation errors can cause false-positive LR inferences and confound multicell-type expression covariation across spatial regions.
Fig. 4: Factorization-based correction method minimized admixture from segmentation errors while preserving biological variability.
Fig. 5: The correction procedure reduces the impact of segmentation errors on common downstream analyses in the NSCLC dataset.


Data availability

The NSCLC CosMx dataset (ref. 26) is publicly available at https://nanostring.com/products/cosmx-spatial-molecular-imager/ffpe-dataset/nsclc-ffpe-dataset/. The NSCLC scRNA-seq dataset (ref. 52) can be found at GEO accession GSE127465. The mouse hypothalamus MERFISH dataset (ref. 23) is available at https://doi.org/10.5061/dryad.8t8s248. The mouse hypothalamus scRNA-seq dataset (ref. 25) is available at GEO accession GSE113576. The mouse ileum MERFISH data can be found at https://doi.org/10.5061/dryad.jm63xsjb2. The mouse ileum scRNA-seq data (ref. 53) can be found at GEO accession GSE92332. The pancreatic cancer Xenium data (ref. 29) can be found at https://www.10xgenomics.com/datasets/ffpe-human-pancreas-with-xenium-multimodal-cell-segmentation-1-standard. The corresponding pancreatic cancer snRNA-seq dataset (ref. 54) we used can be found at http://singlecell.charite.de/cellbrowser/pancreas/. The breast cancer Xenium data (ref. 27) were downloaded from https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast and the matching scRNA-seq data (ref. 10) can be found at GEO accession GSM7782698. The ovarian cancer dataset (ref. 28) was downloaded from https://www.10xgenomics.com/datasets/xenium-prime-ffpe-human-ovarian-cancer and the matching scRNA-seq data (ref. 55) can be found at https://www.10xgenomics.com/datasets/17k-human-ovarian-cancer-scFFPE. Processed dataset files can be found at http://pklab.org/peterk/cellAdmix/.

Code availability

Our tool for removing admixture can be found on GitHub at https://github.com/kharchenkolab/cellAdmix and on Zenodo (ref. 56) at https://doi.org/10.5281/zenodo.17957138. Jupyter notebooks for running all analyses in the manuscript can be found on GitHub at https://github.com/kharchenkolab/cellAdmix-Notebooks and on Zenodo (ref. 51) at https://doi.org/10.5281/zenodo.17713750.

References

  1. Palla, G., Fischer, D. S., Regev, A. & Theis, F. J. Spatial components of molecular tissue biology. Nat. Biotechnol. 40, 308–318 (2022).


  2. Armingol, E., Officer, A., Harismendy, O. & Lewis, N. E. Deciphering cell–cell interactions and communication from gene expression. Nat. Rev. Genet. 22, 71–88 (2021).


  3. Fang, S. et al. Computational approaches and challenges in spatial transcriptomics. Genomics Proteomics Bioinformatics 21, 24–47 (2023).


  4. Wu, L. et al. An invasive zone in human liver cancer identified by Stereo-seq promotes hepatocyte–tumor cell crosstalk, local immunosuppression and tumor progression. Cell Res. 33, 585–603 (2023).


  5. Liu, S. et al. Spatially mapping T cell receptors and transcriptomes reveals distinct immune niches and interactions underlying the adaptive immune response. Immunity 55, 1940–1952 (2022).


  6. Avraham-Davidi, I. et al. Spatially defined multicellular functional units in colorectal cancer revealed from single cell and spatial transcriptomics. eLife 14, RP104815 (2025).


  7. Zhao, T. et al. Spatial genomics enables multi-modal study of clonal heterogeneity in tissues. Nature 601, 85–91 (2022).


  8. Toninelli, M., Rossetti, G. & Pagani, M. Charting the tumor microenvironment with spatial profiling technologies. Trends Cancer 9, 1085–1096 (2023).


  9. Hirz, T. et al. Dissecting the immune suppressive human prostate tumor microenvironment via integrated single-cell and spatial transcriptomic analyses. Nat. Commun. 14, 663 (2023).


  10. Janesick, A. et al. High resolution mapping of the tumor microenvironment using integrated single-cell, spatial and in situ analysis. Nat. Commun. 14, 8353 (2023).


  11. Chen, J. H. et al. Human lung cancer harbors spatially organized stem-immunity hubs associated with response to immunotherapy. Nat. Immunol. 25, 644–658 (2024).


  12. Allen, W. E., Blosser, T. R., Sullivan, Z. A., Dulac, C. & Zhuang, X. Molecular and spatial signatures of mouse brain aging at single-cell resolution. Cell 186, 194–208 (2023).


  13. Kilfeather, P. et al. Single-cell spatial transcriptomic and translatomic profiling of dopaminergic neurons in health, aging, and disease. Cell Rep. 43, 113784 (2024).


  14. Stickels, R. R. et al. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat. Biotechnol. 39, 313–319 (2021).


  15. Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 185, 1777–1792 (2022).


  16. Farah, E. N. et al. Spatially organized cellular communities form the developing human heart. Nature 627, 854–864 (2024).


  17. Tian, L., Chen, F. & Macosko, E. Z. Moving genomics into tissues. Nat. Biotechnol. 41, 773–782 (2023).


  18. Cable, D. M. et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat. Biotechnol. 40, 517–526 (2022).


  19. Moffitt, J. R. et al. High-throughput single-cell gene-expression profiling with multiplexed error-robust fluorescence in situ hybridization. Proc. Natl Acad. Sci. USA 113, 11046–11051 (2016).


  20. Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, eaat5691 (2018).


  21. Stringer, C., Wang, T., Michaelos, M. & Pachitariu, M. Cellpose: a generalist algorithm for cellular segmentation. Nat. Methods 18, 100–106 (2021).


  22. Greenwald, N. F. et al. Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning. Nat. Biotechnol. 40, 555–565 (2022).


  23. Petukhov, V. et al. Cell segmentation in imaging-based spatial transcriptomics. Nat. Biotechnol. 40, 345–354 (2022).


  24. Park, J. et al. Cell segmentation-free inference of cell types from in situ transcriptomics data. Nat. Commun. 12, 3545 (2021).


  25. Moffitt, J. R. et al. Molecular, spatial and functional single-cell profiling of the hypothalamic preoptic region. Science 362, eaau5324 (2018).


  26. He, S. et al. High-plex imaging of RNA and proteins at subcellular resolution in fixed tissue by spatial molecular imaging. Nat. Biotechnol. 40, 1794–1806 (2022).


  27. 10x Genomics. High Resolution Mapping of the Breast Cancer Tumor Microenvironment Using Integrated Single Cell, Spatial and In Situ Analysis of FFPE Tissue (10x Genomics, accessed 2 February 2025); https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast

  28. 10x Genomics. FFPE Human Ovarian Cancer with 5K Human Pan Tissue and Pathways Panel Plus 100 Custom Genes (10x Genomics, accessed 6 February 2025); https://www.10xgenomics.com/datasets/xenium-prime-ffpe-human-ovarian-cancer

  29. 10x Genomics. FFPE Human Pancreas with Xenium Multimodal Cell Segmentation (10x Genomics, accessed 22 May 2024); https://www.10xgenomics.com/datasets/ffpe-human-pancreas-with-xenium-multimodal-cell-segmentation-1-standard

  30. Fu, X. et al. BIDCell: biologically-informed self-supervised learning for segmentation of subcellular spatial transcriptomics data. Nat. Commun. 15, 509 (2024).


  31. Jones, D. C. et al. Cell simulation as cell segmentation. Nat. Methods 22, 1331–1342 (2025).


  32. Arnol, D., Schapiro, D., Bodenmiller, B., Saez-Rodriguez, J. & Stegle, O. Modeling cell–cell interactions from spatial molecular data with spatial variance component analysis. Cell Rep. 29, 202–211 (2019).


  33. Dries, R. et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biol. 22, 78 (2021).


  34. Garcia-Alonso, L. et al. Mapping the temporal and spatial dynamics of the human endometrium in vivo and in vitro. Nat. Genet. 53, 1698–1711 (2021).


  35. Tanevski, J., Flores, R. O. R., Gabor, A., Schapiro, D. & Saez-Rodriguez, J. Explainable multiview framework for dissecting spatial relationships from highly multiplexed data. Genome Biol. 23, 97 (2022).


  36. Jerby-Arnon, L. & Regev, A. DIALOGUE maps multicellular programs in tissue from single-cell or spatial transcriptomics data. Nat. Biotechnol. 40, 1467–1477 (2022).


  37. Shao, X. et al. Knowledge-graph-based cell-cell communication inference for spatially resolved transcriptomic data with SpaTalk. Nat. Commun. 13, 4429 (2022).


  38. Cang, Z. et al. Screening cell–cell communication in spatial transcriptomics via collective optimal transport. Nat. Methods 20, 218–228 (2023).


  39. Pham, D. et al. Robust mapping of spatiotemporal trajectories and cell–cell interactions in healthy and diseased tissues. Nat. Commun. 14, 7739 (2023).


  40. Fischer, D. S., Schaar, A. C. & Theis, F. J. Modeling intercellular communication in tissues using spatial graphs of cells. Nat. Biotechnol. 41, 332–336 (2023).


  41. Mason, K. et al. Niche-DE: niche-differential gene expression analysis in spatial transcriptomics data identifies context-dependent cell–cell interactions. Genome Biol. 25, 14 (2024).


  42. Mitchel, J. et al. Coordinated, multicellular patterns of transcriptional variation that stratify patient cohorts are revealed by tensor decomposition. Nat. Biotechnol. 43, 1192–1201 (2025).


  43. Chen, C. et al. Role of cancer‑associated fibroblasts in the resistance to antitumor therapy, and their potential therapeutic mechanisms in non‑small cell lung cancer. Oncol. Lett. 21, 413 (2021).


  44. Wang, F.-T., Sun, W., Zhang, J.-T. & Fan, Y.-Z. Cancer‑associated fibroblast regulation of tumor neo‑angiogenesis as a therapeutic target in cancer. Oncol. Lett. 17, 3055–3065 (2019).


  45. Barkas, N. et al. Joint analysis of heterogeneous single-cell RNA-seq dataset collections. Nat. Methods 16, 695–698 (2019).


  46. Wang, Y. et al. iTALK: an R package to characterize and illustrate intercellular communication. Preprint at bioRxiv https://doi.org/10.1101/507871 (2019).

  47. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32 (2019).

  48. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at arXiv https://doi.org/10.48550/arXiv.1412.6980 (2017).

  49. Gaujoux, R. & Seoighe, C. A flexible R package for nonnegative matrix factorization. BMC Bioinformatics 11, 367 (2010).


  50. Wang, G., Kossenkov, A. V. & Ochs, M. F. LS-NMF: a modified non-negative matrix factorization algorithm utilizing uncertainty estimates. BMC Bioinformatics 7, 175 (2006).


  51. Mitchel, J., Gao, T., Petukhov, V. & Kharchenko, P. cellAdmix-Notebooks. Zenodo https://doi.org/10.5281/zenodo.17713750 (2025).

  52. Zilionis, R. et al. Single cell transcriptomics of human and mouse lung cancers reveals conserved myeloid populations across individuals and species. Immunity 50, 1317–1334 (2019).


  53. Haber, A. L. et al. A single-cell survey of the small intestinal epithelium. Nature 551, 333–339 (2017).


  54. Tosti, L. et al. Single-nucleus and in situ RNA-sequencing reveal cell topographies in the human pancreas. Gastroenterology 160, 1330–1344 (2021).


  55. 10x Genomics. Human Ovarian Cancer FFPE Single Cell Gene Expression Flex (Next GEM) (10x Genomics, accessed 6 February 2025); https://www.10xgenomics.com/datasets/17k-human-ovarian-cancer-scFFPE

  56. Mitchel, J., Gao, T., Petukhov, V. & Kharchenko, P. cellAdmix. Zenodo https://doi.org/10.5281/zenodo.17957138 (2025).


Acknowledgements

This work was supported by the National Institutes of Health (NIH R35GM127131). T.G. is a Damon Runyon Fellow supported by the Damon Runyon Cancer Research Foundation (DRG-2552-25). We also thank S. Sunyaev for his mentorship and supervision of J.M.

Author information


Contributions

P.V.K. conceptualized the study and advised on all analyses as well as the correction procedure that was presented. J.M. analyzed the impact of segmentation errors and our correction procedure on downstream methods (DE, GSEA, LR analysis and multicellular covariation); designed and implemented the cell-bridging and membrane-based automated factor-annotation methods; benchmarked the factor-annotation methods; and organized the figures and wrote the figure legends with input from the other authors. T.G. and J.M. designed and implemented the NMF- and CRF-based correction procedure, with advice from P.V.K. and V.P., and generated admixture prevalence plots and schematic diagrams. T.G. designed and implemented the enrichment-based factor-annotation method and analyzed intracellular structures revealed by our method. V.P. developed the admixture probability score metric and applied it to compare segmentation methods and re-annotated datasets, matching cell types between corresponding spatial and scRNA-seq data. E.C. and J.M. designed and implemented the membrane separation scoring metric. E.C. carried out the neighborhood prediction analysis. J.M. and V.P. computed marker genes from the scRNA-seq datasets and created the example plots showing membrane and DAPI stains. J.M., V.P. and T.G. created the cellAdmix R package. All authors contributed to writing the manuscript.

Corresponding author

Correspondence to Peter V. Kharchenko.

Ethics declarations

Competing interests

P.V.K. was an employee of Altos Labs. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Sergio Marco Salas and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Prevalence of foreign cell type markers in scRNA-seq data compared to spatial, after adjusting for modality-specific differences.

a, To demonstrate that the spatial modality has more admixture than scRNA-seq, we computed the expression of malignant marker genes in each cell type. The bar height shows the total expression of select malignant marker genes after correcting for modality-specific expression differences of these markers as identified in the malignant cell type. This is calculated by first computing the mean fractional expression of each marker in each cell type. Expression for a marker is normalized by its ratio of spatial to scRNA-seq expression in malignant cells, and these values are summed over all markers (making the first two bars equal height). b, As in a, but for admixture of enterocyte markers into other cell types in the mouse ileum datasets.
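The normalization described in panel a can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function name, the toy expression values and the marker names m1 and m2 are hypothetical, and the direction of the correction (dividing each spatial value by the marker's spatial-to-scRNA-seq ratio in the reference cell type) is one reading of the legend.

```python
def normalized_marker_totals(spatial, scrna, markers, reference_type):
    """Sum marker expression per cell type after rescaling the spatial values.

    spatial / scrna: {cell_type: {gene: mean fractional expression}}.
    Each spatial value is divided by the gene's spatial-to-scRNA-seq ratio in the
    reference cell type (assumed direction of the normalization).
    """
    ratio = {g: spatial[reference_type][g] / scrna[reference_type][g] for g in markers}
    totals = {}
    for cell_type in spatial:
        totals[cell_type] = {
            "spatial": sum(spatial[cell_type][g] / ratio[g] for g in markers),
            "scrna": sum(scrna[cell_type][g] for g in markers),
        }
    return totals

# Toy values (assumed), mean fractional expression of two malignant markers.
spatial = {"malignant": {"m1": 0.10, "m2": 0.04}, "fibroblast": {"m1": 0.03, "m2": 0.01}}
scrna = {"malignant": {"m1": 0.05, "m2": 0.04}, "fibroblast": {"m1": 0.001, "m2": 0.002}}

totals = normalized_marker_totals(spatial, scrna, ["m1", "m2"], "malignant")
```

By construction the two totals for the reference cell type coincide, which matches the legend's note that the first two bars have equal height; any excess in the spatial total of another cell type is then attributable to something other than marker-level modality differences.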

Extended Data Fig. 2 Additional evidence of segmentation errors inducing cell-type admixture in two Xenium datasets.

a, Pancreas cell type markers identified in scRNA-seq data. Dot plot of top markers, showing relative gene expression in scRNA-seq data (left) or the corresponding spatial dataset (right). b, Two segmented endothelial cells showing all molecules assigned to each cell. Color and shape indicate if the molecule is a marker gene identified in the scRNA-seq data for endothelial cells or stellate cells (source of admixture). Purple and green background colors correspond to DAPI and membrane stain intensities, respectively. c, Comparing the dataset-wide prevalence of stellate cell admixture quantity into each cell type for the spatial data versus scRNA-seq data. Admixture quantity (color) is calculated as the percentage of a cell’s total transcripts coming from stellate cell marker genes (identified in scRNA-seq). The fraction of cells of a given type with each admixture degree is shown on the y-axis. d, Interaction-changed genes for endothelial cells for those located near stellate cells (n = 862) versus those not near stellate cells (n = 608). scRNA-seq marker genes of stellate cells and acinar cells are shown as red and blue colors, respectively. P-values were calculated from a two-sided Wilcoxon rank-sum test and were FDR adjusted. The marker enrichment p-values were calculated with a GSEA test (two-sided). e, Xenium data from a pancreatic adenocarcinoma patient sample. Cell types are shown as different colored points. f, Ovarian cancer cell type markers identified in scRNA-seq data. Dot plot of top markers, showing relative gene expression in scRNA-seq data (left) or the corresponding spatial dataset (right). g, One example segmented macrophage cell, showing all molecules assigned to that cell. Color and shape indicate if the molecule was a marker gene identified in the scRNA-seq data for macrophage cells or proliferative malignant cells (source of admixture). Purple background corresponds to DAPI stain intensity, and green background corresponds to membrane stain intensity. h, Comparing the dataset-wide prevalence of stromal fibroblast cell admixture quantity into each cell type for the spatial data versus scRNA-seq data. Admixture quantity (color) is calculated as the percentage of a cell’s total transcripts coming from stromal fibroblast cell marker genes (identified in scRNA-seq). The fraction of cells of a given type with each admixture degree is shown on the y-axis.
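The admixture quantity used in panels c and h is a simple per-cell percentage, and the y-axis is the share of cells falling in each admixture range. A minimal sketch follows, with hypothetical function names, toy gene labels (E1, E2 for native markers, S1, S2 for the source cell type) and assumed bin edges; the legend does not state the bins actually used.

```python
from collections import Counter

def admixture_percent(cell_genes, source_markers):
    """Percent of a cell's transcripts that are marker genes of the source cell type."""
    if not cell_genes:
        return 0.0
    hits = sum(1 for g in cell_genes if g in source_markers)
    return 100.0 * hits / len(cell_genes)

def fraction_of_cells_per_bin(percents, bin_edges):
    """Fraction of cells whose admixture percent falls in each [lo, hi) bin."""
    bins = list(zip(bin_edges[:-1], bin_edges[1:]))
    counts = Counter()
    for p in percents:
        for lo, hi in bins:
            if lo <= p < hi:
                counts[(lo, hi)] += 1
                break
    return {b: counts[b] / len(percents) for b in bins}

# Toy cells (assumed): each cell is the list of genes of its assigned molecules.
cells = [
    ["E1", "E2", "S1", "E1"],
    ["E1", "E1", "E2", "E2"],
    ["S1", "S2", "E1", "E1"],
]
percents = [admixture_percent(c, {"S1", "S2"}) for c in cells]
shares = fraction_of_cells_per_bin(percents, [0, 10, 30, 101])
```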

Extended Data Fig. 3 Effects of segmentation errors on differential expression in a mouse hypothalamus MERFISH dataset.

a, Hypothalamus cell type markers identified in scRNA-seq data. Dot plot of top markers, showing relative gene expression in scRNA-seq data (left) or the corresponding spatial dataset (right). b, Hypothalamus data for one mouse. Cell types are shown as different colored points. Circled regions indicate which contexts were used for comparison in differential expression (red is the ACA region, which is compared to all others). c, Cell-type composition inside and outside the ACA region. Composition is shown as the fraction of all cells in a region coming from cells of a given cell type. d, Differential expression of microglia between ACA and other regions. scRNA-seq marker genes of mature oligodendrocytes and inhibitory neurons are shown as red and blue colors, respectively. P-values were calculated from a two-sided Wilcoxon rank-sum test and were FDR adjusted. The marker enrichment p-values were calculated with a GSEA test (two-sided). ACA, anterior commissure area.
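The legends report FDR-adjusted p-values without naming the adjustment. A common choice is the Benjamini-Hochberg procedure, sketched below purely as an assumption; the function name and the toy p-values are illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy p-values (assumed), one per gene.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps a gene with a smaller raw p-value from ending up with a larger adjusted value than a gene ranked after it.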

Extended Data Fig. 4 Benchmarking different segmentation methods with an admixture probability score.

a, Admixture probability scores for the mouse ileum dataset using three different segmentations (Baysor, ComSeg, and ProSeg). The scores for each cell type are shown as the trimmed mean scores relative to the original Baysor segmentation. We also show the scores for our cleaning procedure applied to each different segmentation. Box plot centerlines represent the median values, limits are the first and third quartiles, and whiskers end at the limits ±1.5 times the interquartile range. b, Admixture probability scores for the ovarian cancer dataset using four different segmentations (Xenium, Baysor, BIDCell, and ProSeg). The scores for each cell type are shown as the trimmed mean scores relative to the original Xenium segmentation. We also show the scores for our cleaning procedure applied to each different segmentation. Box plots are constructed the same way as in a.
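The box plot convention stated in the legend (median centerline, quartile limits, whiskers at the limits ±1.5 times the interquartile range) can be written out as a short sketch. The quantile interpolation method is an assumption, since the legend does not specify one, and many plotting libraries additionally clip whiskers to the most extreme data point inside that range.

```python
import statistics

def box_plot_stats(values):
    """Quartiles and whisker ends following the convention in the legend."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": q1 - 1.5 * iqr,
        "whisker_high": q3 + 1.5 * iqr,
    }

stats = box_plot_stats([1, 2, 3, 4, 5])
```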

Extended Data Fig. 5 Benchmarking different parameters for NMF and different strategies for automated factor annotation.

a, For each dataset, we applied a cleaning procedure with the listed parameters. For each cell type, we then calculated the trimmed mean of admixture probability over all cells and divided this value by the trimmed mean of admixture probabilities from the uncleaned data (cell types with an uncleaned mean less than 0.001 were excluded). We also compared the per-cell-type (‘ct’) NMF versus all cell types together (‘joint’) NMF over 5, 15, 20, or 30 factors. Each combination of NMF type and number of factors was tested with each of the three automated factor-annotation methods (enrichment, membrane, or cell-bridging). Box plot centerlines represent the median values, limits are the first and third quartiles, and whiskers end at the limits ±1.5 times the interquartile range.
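The relative score described in this legend can be sketched as below. The trim fraction of 0.1 is an assumption (the legend does not state it), as are the function names and the toy admixture probabilities.

```python
def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    cut = int(len(ordered) * trim)
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)

def relative_admixture(uncleaned, cleaned, min_uncleaned=0.001, trim=0.1):
    """Per cell type: trimmed mean after cleaning divided by trimmed mean before."""
    ratios = {}
    for cell_type, before in uncleaned.items():
        base = trimmed_mean(before, trim)
        if base < min_uncleaned:
            continue  # cell types with negligible uncleaned admixture are excluded
        ratios[cell_type] = trimmed_mean(cleaned[cell_type], trim) / base
    return ratios

# Toy per-cell admixture probabilities (assumed).
uncleaned = {"fibroblast": [0.0] + [0.2] * 8 + [0.9], "tuft": [0.0] * 10}
cleaned = {"fibroblast": [0.0] + [0.05] * 8 + [0.5], "tuft": [0.0] * 10}

ratios = relative_admixture(uncleaned, cleaned)
```

Values below 1 indicate that cleaning lowered the typical admixture probability for that cell type; the exclusion threshold avoids dividing by a near-zero baseline.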

Extended Data Fig. 6 Admixture scores per cell in all datasets after cleaning with joint NMF, CRF, and the cell-bridging automated factor annotation.

a, Admixture scores were computed per cell and are shown for the original versus cleaned cells of each cell type in the leftmost panels. The middle panels illustrate the number of molecules per cell before and after cleaning to highlight the small extent to which coverage is affected. The rightmost panels show the admixture score ratios per cell, which are computed by dividing the score after cleaning by the score before cleaning. Thus, cells with ratios below 1 saw a reduction in admixture after cleaning. Cell types with an uncleaned mean less than 0.001 were excluded. The number of factors used for each dataset (top to bottom) was 20, 20, 20, 15, 15. Two-sample t-tests (two-sided) were used to compute p-values in comparisons of scores pre- versus post-cleaning. The p-values for the ovarian cancer dataset comparisons are: malignant cells lining cyst P = 2.2 × 10^−7, T & NK cells P = 2.1 × 10^−209, macrophages P = 1.1 × 10^−4,034, fallopian tube epithelium P = 6.6 × 10^−1, granulosa cells P = 1.7 × 10^−5, smooth muscle cells P = 3.2 × 10^−105, stromal associated fibroblasts P = 2.4 × 10^−78, endothelial cells P = 1.3 × 10^−303, pericytes P = 1.2 × 10^−105, tumor associated fibroblasts P = 5.3 × 10^−1,474, ciliated epithelial cells P = 1.3 × 10^−5, inflammatory tumor cells P = 9.9 × 10^−1, tumor cells P = 1.7 × 10^−734, VEGFA+ tumor cells P = 2.3 × 10^−15, proliferative tumor cells P = 2.8 × 10^−407. The p-values for the mouse ileum dataset are: tuft P = 5.0 × 10^−2, goblet P = 7.3 × 10^−39, stem + TA P = 1.4 × 10^−101, enterocyte P = 2.3 × 10^−26. The p-values for the breast cancer dataset are: B cells P = 1.8 × 10^−41, CD4+ T cells P = 2.9 × 10^−2,790, plasma P = 6.8 × 10^−1,591, mast cells P = 3.5 × 10^−4, CD8+ T cells P = 1.1 × 10^−2,317, macrophages2 P = 4.5 × 10^−237, stromal P = 6.2 × 10^−3,318, myoepi ACTA2+ P = 5.6 × 10^−187, macrophages1 P = 1.7 × 10^−2,453, IRF7+ DCs P = 5.6 × 10^−19, perivascular-like P = 2.2 × 10^−85, endothelial P = 1.1 × 10^−855, LAMP3+ DCs P = 1.9 × 10^−7. The p-values for the mouse hypothalamus dataset are: microglia P = 1.9 × 10^−56, endothelial P = 1.7 × 10^−101, OD immature P = 2.4 × 10^−24, astrocyte P = 3.2 × 10^−34, excitatory P = 4.6 × 10^−73, inhibitory P = 4.1 × 10^−253. The p-values for the pancreatic cancer dataset are: ductal P = 2.3 × 10^−278, macrophage P = 6.5 × 10^−105, endothelial P = 5.3 × 10^−124, acinar P = 2.5 × 10^−1,006, alpha–beta–delta–gamma P = 1.9 × 10^−13, stellate P = 8.9 × 10^−45. Box plot centerlines represent the median values, limits are the first and third quartiles, and whiskers end at the limits ±1.5 times the interquartile range.
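The per-cell ratio shown in the rightmost panels is the admixture score after cleaning divided by the score before cleaning. A minimal sketch with toy scores follows; skipping cells whose score before cleaning is zero is an assumption made here to avoid division by zero, and the function names are hypothetical.

```python
def admixture_score_ratios(before, after):
    """Per-cell ratio of the admixture score after cleaning to the score before."""
    return [a / b for b, a in zip(before, after) if b > 0]

def fraction_reduced(ratios):
    """Fraction of cells whose ratio is below 1, meaning admixture went down."""
    return sum(1 for r in ratios if r < 1) / len(ratios)

# Toy per-cell scores (assumed), aligned by cell.
before = [0.4, 0.2, 0.0, 0.5]
after = [0.1, 0.2, 0.0, 0.25]

ratios = admixture_score_ratios(before, after)
share = fraction_reduced(ratios)
```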

Extended Data Fig. 7 Additional examples demonstrating the impact of the cleaning procedure on downstream analyses.

a, Fraction of native or foreign (malignant) markers removed per cell from NSCLC fibroblast cells with the cleaning procedure. Only cells that started with at least 10 molecules of native or foreign markers, respectively, were considered. Far more cells had a large proportion of malignant markers removed compared to fibroblast markers, demonstrating how the procedure minimizes alterations to native transcripts. b, Impact of data cleaning on goblet DE for crypt (stem + TA enriched) versus villi (enterocyte enriched) regions in the mouse ileum dataset. scRNA-seq markers of stem + TA cells are colored red and markers of enterocytes are colored blue. c, Impact of data cleaning on macrophage DE for tumor versus stroma regions in the ovarian cancer dataset. scRNA-seq markers of proliferative malignant cells are colored orange, malignant cells in red, and stromal fibroblasts in blue. d, Impact of data cleaning on microglia DE for ACA versus other regions in the mouse hypothalamus dataset. scRNA-seq markers of mature oligodendrocytes and inhibitory neurons are colored red and blue, respectively. e, Interaction-changed genes for endothelial cells for those located near stellate cells (n = 862) versus those not near stellate cells (n = 608) in the pancreatic cancer dataset. scRNA-seq marker genes of stellate cells and acinar cells are shown as red and blue colors, respectively. The p-value is from GSEA to test for enrichment of marker genes among top DE genes. f, Impact of cleaning on scITD results in the mouse ileum dataset, showing goblet expression-factor 2 associations pre- versus post-cleaning. scRNA-seq markers of stem + TA cells are colored red and markers of enterocytes are colored blue. g, GSEA results for goblet DE compared between crypts versus villi in the mouse ileum data. P-values were FDR adjusted. h, GSEA results for macrophage DE compared between tumor versus stroma regions in the ovarian cancer dataset. P-values were FDR adjusted.
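The quantity in panel a, the per-cell fraction of marker molecules removed with a minimum of 10 starting molecules, can be sketched as follows. The function name and the toy counts are assumptions for illustration only.

```python
def fraction_removed(before_counts, after_counts, min_molecules=10):
    """Per-cell fraction of marker molecules removed by cleaning.

    Only cells with at least `min_molecules` marker molecules before cleaning
    are kept, mirroring the filter described in the legend.
    """
    fractions = []
    for before, after in zip(before_counts, after_counts):
        if before >= min_molecules:
            fractions.append((before - after) / before)
    return fractions

# Toy per-cell molecule counts (assumed) for three fibroblasts.
foreign_before, foreign_after = [12, 20, 4], [3, 5, 4]
native_before, native_after = [40, 35, 50], [38, 35, 45]

foreign_removed = fraction_removed(foreign_before, foreign_after)
native_removed = fraction_removed(native_before, native_after)
```

In this toy case the third cell is dropped from the foreign-marker comparison because it starts with only 4 foreign molecules, and the removed fractions for foreign markers are much larger than those for native markers.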

Extended Data Fig. 8 NMF factors can also extract patterns of biological intracellular variation.

a, Visualization of subcellular compartments within enterocyte cells from the mouse ileum dataset. Colors indicate molecules assigned to different compartments/factors by CRF. b, Top NMF factor loadings for NMF run on enterocytes. For each NMF factor, the left panel shows factor loadings on genes. The right panel shows a heatmap of normalized gene expression (z-scores) in different cell types, where blue indicates low expression and red indicates high expression. c, Top NMF factor loadings for NMF run on the U-2 OS osteosarcoma cell line dataset. For each NMF factor, the left panel shows factor loadings on genes. d, Visualization of subcellular compartments within U-2 OS osteosarcoma cells.


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mitchel, J., Gao, T., Petukhov, V. et al. Impact and correction of segmentation errors in spatial transcriptomics. Nat Genet (2026). https://doi.org/10.1038/s41588-025-02497-4


  • DOI: https://doi.org/10.1038/s41588-025-02497-4
