Abstract
Spatial omics (SO) produces high-definition mapping of subcellular molecules within tissue samples. Mapping transcripts to anatomical regions requires segmentation, but this remains challenging in tissue cross-sections with tubular structures like axons in peripheral nerve or spinal cord. Neural networks could address misidentification but are hindered by the need for extensive human annotations. We present SiDoLa-NS (Simulate, Don’t Label-Nervous System), an image-driven (top-down) approach to SO analysis in the nervous system. We utilize biophysical properties of tissue architectures to design synthetic images of tissue samples, eliminating reliance on manual annotation and enabling scalable training data generation. With synthetic samples, we trained supervised instance segmentation convolutional neural networks (CNNs) for nucleus segmentation, achieving precision and F1 scores > 0.95. We further identify macroscopic tissue structures in mouse brain (mAP50 = 0.869), spinal cord (mAP50 = 0.96), and pig sciatic nerve (mAP50 = 0.995). This framework sets the stage for transferable models across species and tissue architectures—accelerating SO applications in neuroscience and beyond.
Introduction
Spatial omics (SO) has revolutionized molecular mapping by enabling transcriptomic, proteomic, and metabolomic analyses within tissues1,2,3,4,5,6,7,8,9,10,11. The immense throughput has provided transcriptome- and proteome-wide data across assorted tissues, including the spinal cord12,13,14,15 and mouse brain, each with unique physiological properties16,17,18,19,20,21. Yet, these technologies fall short when it comes to leveraging image-derived features that capture the nuanced spatial organization of tissues, due to challenges both in aligning molecular data with histological context and in extracting high-dimensional spatial patterns from tissue images. Addressing how molecular identities vary across tissue structures requires precision mapping and the integration of information across various scales: cellular, regional, and whole tissue22,23.
Most applications of SO are molecule-driven, i.e., the spatial distributions of sets of transcripts are used to make distinctions24,25. These ‘bottom-up’ approaches often employ clustering algorithms that learn from the variability in local and global gene expression patterns26,27,28,29,30,31,32,33. GASTON models the transcriptomic space as a topographic map34 and partitions tissues using piece-wise functions. BANKSY35 and NiCo36 consider transcriptomic features between neighboring cells to cluster in a spatially informed manner. Baysor37 defines cells by grouping key transcripts using Markov random fields and can incorporate spatial information to generate more precise clusters. Approaches like SpatialGlue38 and BrainAlign39 integrate multi-omics data and even enable cross-species regional alignment. These approaches continue to offer valuable insights, but precise expression-based whole-tissue segmentation can be enhanced by utilizing well-defined spatial markers from microscopy data at multiple scales37,38,39.
Top-down approaches determine regional information, such as anatomical structures and pathologies, from image features or by alignment/registration. This methodology allows researchers to ask questions about molecular changes in a visually defined region and to cross-validate bottom-up transcriptomic analyses. Existing tools use interpretable algorithms based on local pixel values40,41 or registration42,43. Recent methods leverage deep learning, including autoencoders44, convolutional neural networks45,46, transformers47,48,49,50, and combinations51,52,53, to segment and label relevant cells or tissue structures in immunofluorescence and histology datasets. Much focus is placed on training techniques that maximize the potential of these models’ complex processing abilities54,55. These tools rely on either human labeling or computational labeling. Human-labeled data enhances interpretability but increases the time to produce training sets, can introduce errors and biases, lacks widely available data56, and may hinder novel discoveries57. Computational techniques enable nuanced embeddings but remove human oversight, degrading biological interpretability and increasing sensitivity to noise58,59. Despite these problems, there is a growing desire for tools that perform non-manual segmentation of biological data60. DeepSlice45 and FTUNet48 handle macro-scale atlas mapping and lesion segmentation, iStar47 and UniFMiR61 can enhance micro-scale data, and meso-scale analyses bridge tissue structures and the cells defining them62,63.
In this manuscript, we present SiDoLa-NS, a tool for tissue segmentation across micro-, meso-, and macro-scales of cell and tissue organization. By defining regional identity through image features alone, SiDoLa-NS can avoid biases from global gene expression and double dipping (i.e. defining and evaluating a region with the same gene set). We validate SiDoLa-NS with internal and publicly available brain, spinal cord, and peripheral nerve images and show the versatility of our tool across various platforms, including 10X Xenium, Visium SD/HD, H&E, and immunofluorescence sections. At the micro-scale, SiDoLa-NS performs cell and nuclei segmentation; at the meso-scale, it allows within- and between-fascicle analyses in the sciatic nerve; at the macro-scale, SiDoLa-NS defines brain regions based on the Allen Brain and Spinal Atlases64,65,66,67,68,69,70,71. SiDoLa-NS automatically annotates nervous system tissue and gives researchers ways to group the data across the scales and ask unbiased molecular questions: How do gene expression patterns of specific cell types vary across brain regions? What structural changes underlie molecular disruptions in disease? How do meso-scale features, such as clusters or bands of cells, relate to macro-scale organization in the nervous system?
But how can these models be trained without the burden of extensive manual annotations? By simulating training images that mimic tissue structure at each level of interest for identification of nuclei and cell boundaries, intermediate features (like clusters or bands of cells), and broader atlas-defined regions. This is distinct from diffusion and GAN-based approaches, which take, as input, real microscopic images for synthetic data generation. Our in-silico image simulations offer 1) full control over feature distributions, 2) representation of imperfections and noise (yielding model robustness), and 3) access to a virtually infinite and annotator-error-free training set. This results in omics datasets that are equally rich in quantifiable spatial and molecular data.
Results
Overview of SiDoLa-NS
We present SiDoLa-NS (Simulate, Don’t Label – Nervous System), a suite of tools designed for detection and classification of image features across a spectrum of nervous system tissues. SiDoLa-NS (pronounced See Doh La, think Do-Re-Mi) employs an image-based top-down approach, assigning labels to biologically defined regions of an image (Fig. 1A). SiDoLa-NS models are trained on fully synthetic/simulated images reflecting different staining modalities like immunohistochemistry (IHC) and immunofluorescence (IF), not by generative AI (Fig. 1B). Our approach generates high-resolution datasets and models imperfections in tissue samples72. For example, images from biological samples may include tissue sections that are 1) partially out of field, 2) broken, 3) of poor focus, contrast, or resolution, or 4) distorted or warped. By simulating and representing these challenging sections, SiDoLa-NS can succeed where other models may encounter difficulties.
A Schematic illustrating the distinction between bottom-up and top-down methods. Top-Down approaches capture distinct image-based features and representations of a tissue. Bottom-Up approaches consider omics data (such as gene expression) to build molecular definitions of regions and cell types. B Simulations create a massive synthetic dataset by stringing together biophysical nodes that build Cells and Regions of the image and then transform them from 3D geometry into images with an optics engine. Synthetic micrographs are shown below the nodes, giving examples of variation and features for a mouse coronal brain slice. C A mouse coronal brain Nissl section from the Allen Brain Atlas, shown as the “original slice”, with cartoons below and the concept of multiple spatial scales where segmentation identifies individual instances of a variety of features. Specifically, this approach identifies atlas-based regions at the macro-scale, captures cell density and other meso-scale patterning in the tissue, and segments cells to incorporate subcellular micro-scale information, like cell type and structure. D SiDoLa-NS was tested on three nervous system tissue types including mouse brain and spinal cord, as well as porcine sciatic nerve (a human-scale nerve cross section).
SiDoLa-NS can also perform multi-scale data integration. This is useful for tissues such as the brain, where it is essential to understand the spatial extents of regions to find clusters, lamina, and striations of cells, while detecting nuclei (Fig. 1C). Here, we explore SiDoLa-NS on three publicly available datasets from mouse and porcine neuronal tissues, including the brain, spinal cord, and sciatic nerve, across various imaging modalities (Fig. 1D). SiDoLa-NS is designed for a wide range of platforms, from classical IHC and IF tissue sections to the full range of SO technologies. These can be explored using the web application https://sidolans01.mgifive.org/.
SiDoLa-NS is trained on biophysical simulations of nuclei in brain slices to segment a histological section of mouse brain
We began with multi-scale segmentation of a Visium HD mouse brain coronal slice. We generated a fully synthetic image set (with 4353 pairs and 1.96 × 10⁶ objects) to mimic DAPI-stained nuclei in neurons, oligodendrocytes, and other cells (Fig. 2A 1). SiDoLa-NS-Micro-CNS, an instance segmentation CNN, was trained to find nuclei using this dataset (Methods). We tested SiDoLa-NS-Micro-CNS on a publicly available hematoxylin and eosin (H&E) coronal cross section of mouse brain from 10X Genomics’ Visium HD platform. Varied morphology was observed in a single channel of the H&E staining (Fig. 2A 2). SiDoLa-NS-Micro-CNS successfully segments nuclei at various DAPI intensities (Fig. 2A 3). At the “meso-scale”, we observed two distinct cellular populations: high density ‘striated’ regions and diffuse ‘laminar’ regions (Fig. 2B 1). Striated regions included granular cell zones like the Dentate Gyrus and Hippocampal CA regions (Fig. 2B 2, 3) in the Visium HD brain slice73. Model performance was measured with the synthetic validation dataset. SiDoLa-NS-Micro-CNS achieved a mean Average Precision for Intersection over Union (IoU) greater than 0.5 (mAP50) of 0.715 (Fig. S1).
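The mAP50 criterion used throughout can be illustrated with a minimal sketch: a predicted bounding box counts as a true positive when its Intersection over Union (IoU) with a ground-truth box reaches 0.5. The helper names below are hypothetical and boxes are assumed to be axis-aligned (x1, y1, x2, y2) tuples; this illustrates the metric, not the evaluation code used here.

```python
# Illustrative sketch of the IoU >= 0.5 matching rule behind mAP50.
# Box format assumed: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.

def iou(a, b):
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision_at_iou50(preds, truths):
    """Fraction of predicted boxes matching some ground-truth box at IoU >= 0.5."""
    matched = sum(1 for p in preds if any(iou(p, t) >= 0.5 for t in truths))
    return matched / len(preds) if preds else 0.0
```

Full mAP additionally sweeps over confidence thresholds and averages precision across classes; the sketch keeps only the core matching rule.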
A (1) At the micro-scale, SiDoLa-NS trained exclusively on simulated data featuring multiple cell types. (2) Micrographs of real brain images were used to run SiDoLa-NS-Micro-CNS at this scale and (3) identified nuclei (light and dark blue bounding boxes) from background. B At the meso-scale, simulated images (1) featuring bands of higher density cells. (2) Micrographs from brain slice, notice density patterns. (3) SiDoLa-NS classes overlaid, which differentiated high density striated regions (cyan) and adjacent cells (blue). C (1) At the macro-scale, SiDoLa-NS was trained on simulated images featuring full or hemi coronal sections of brain tissue. (2) SiDoLa-NS-Macro-mCB segmented real images of an H&E coronal brain slice, (3) highlighting biologically defined regions. D Comparison of Top-Down and Bottom-Up approaches side-by-side. The left-brain image and lower UMAP are SiDoLa-NS top-down, while the right-brain diagram and upper UMAP are bottom-up. The UMAP was constructed from transcriptomic data, and its colors map to Seurat clusters in the upper UMAP, and to SiDoLa-NS labeled brain regions in the lower.
To test generalizability, we employed SiDoLa-NS-Micro-CNS on publicly available DAPI-stained images of lung adenocarcinoma (10X Genomics Visium HD). Without parameter adjustment, we observed excellent segmentation and detection of nuclei, even in high-density regions of this tissue (Fig. S2). For the tissues examined here, existing nuclei-segmentation methods are likely sufficient. We present SiDoLa-NS primarily to highlight its versatility, as it can be adapted from single-nucleus tasks to whole-brain simulations.
SiDoLa-NS is trained on biophysical simulations of brain slices and used to map brain regions in a histological section of mouse brain
We sought to assign regional identities to the detected nuclei. To accomplish this, we simulated fully synthetic coronal slices of mouse brain (single hemispheres and full slices) with named regions based on the Allen Brain Atlas reference (Fig. 2C 1). CNNs were trained on 144,438 training images and 1092 validation images. SiDoLa-NS delineated distinct brain areas (Fig. 2C 2, 3). The primary CNN (SiDoLa-NS-Macro-mCB) achieved an excellent precision-recall curve (F1 = 0.95 at a confidence of 0.381) and the confusion matrix on the validation set indicated negligible misclassification. We measured a top mAP50 of 0.977 on the validation set (Fig. S3).
Another feature of the SiDoLa-NS suite is robust performance with noisy or blurry samples. We tested SiDoLa-NS-Macro-mCB on images where noise was introduced in a calibrated fashion (“Methods”). SiDoLa-NS maintained mAP50 above 0.783 up to 70% noise mix (Fig. S4A, B). Next, we simulated images with complete (even) and regional (distorted) focusing errors. SiDoLa-NS-Macro-mCB had mAP50 above 0.803 and 0.722 across even and distorted images, respectively (aperture f-number was brought as low as 0.0013). We tested robustness against staining modality, inferring on unseen Nissl-stained brain samples with labels provided by the Allen Brain Atlas. SiDoLa-NS was able to accurately delineate each brain region (Fig. S4C-E). We next asked how SiDoLa-NS labels compared with outside expert annotations of mouse brain regions74 and found high correspondence across Cortex, Thalamus, Striatum, and Hippocampus (Fig. S5).
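One plausible reading of the calibrated "noise mix" above is a linear blend between the image and uniform noise, where the mix fraction directly sets the corruption level (0% = original image, 100% = pure noise). The function name and blending scheme are assumptions for illustration, not the exact perturbation used in the robustness tests.

```python
import numpy as np

def mix_noise(img, mix, rng=None):
    """Blend a [0, 1]-scaled image with uniform noise.

    mix = 0.0 returns the original image; mix = 1.0 returns pure noise.
    """
    if rng is None:
        rng = np.random.default_rng(0)  # fixed seed for reproducibility
    noise = rng.uniform(0.0, 1.0, size=img.shape)
    return (1.0 - mix) * img + mix * noise
```

Sweeping `mix` from 0 to 1 and re-running inference at each level yields a robustness curve like the mAP50-vs-noise plot described above.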
To assess the complementary roles of top-down and bottom-up approaches to cellular labeling, we compared SiDoLa-NS to Visium HD’s Seurat pipeline for clustering and spatial overlay (Fig. 2D, “Methods”). In practice the two methods work in concert: top-down excelled at finding boundaries and full regional coverage, while bottom-up distinguished unique cell types within regions. For example, cells that SiDoLa-NS assigned to the Striatum-Like Amygdala, Lateral Hypothalamus, and Thalamus were difficult to differentiate by molecular cluster. Using both approaches together can help disambiguate bottom-up numbered clusters and map them to top-down named atlas regions (Fig. S4F).
We also compared our image-based regional definitions against expression-based regional definitions. Since “bottom-up” methodologies perform clustering without assigning biological labels, overlap between cluster assignments and regional labels served as the basis for comparison. The comparisons (Fig. S6) are against MILWRM (Multiplex Image Labeling with Regional Morphology), SpaGCN (Spatially variable genes by Graph-CNN), BANKSY (Building Aggregates with a Neighborhood Kernel and Spatial Yardstick), and GraphST26,28,75. MILWRM performed the best; BANKSY, SpaGCN, and SiDoLa-NS were close behind (accuracies 69.4%, 64.4%, 63.9%, and 60.1%, respectively), and GraphST contrasted the most with patterns defined by MILWRM (accuracy 51.7%). We noticed that molecularly defined labels incorrectly included hindbrain and olfactory bulb in a mid-coronal slice, suggesting expression-based methods lack anatomical resolution and are more prone to region misidentification.
SiDoLa-NS for spinal cord nuclei detection and atlas mapping
SiDoLa-NS was evaluated on publicly available immunofluorescence (IF) sections of mouse spinal cord stained with DAPI (for DNA in nuclei), NeuroTrace (a fluorescent Nissl which stains neurons), and ChAT antibody (marks cholinergic neurons)70. At the micro-scale, SiDoLa-NS-Micro-CNS segmented the nuclei despite this IF section being lower resolution than the brain (Fig. 3A 1, 2, 3).
A (1) At the micro-scale, SiDoLa-NS trains on simulated data. (2) Micrographs of tiles from cross-sectioned spinal cords from dataset70. (3) SiDoLa-NS-Micro-CNS predictions (blue squares) of nuclei in the spinal cord. B (1) At the macro-scale, SiDoLa-NS trains on simulated images of spinal cord sections. (2) Micrographs of mouse spinal cord cross sections. (3) SiDoLa-NS-Macro-mSC predictions on different spinal cord regions (different colors). C Cellular and regional results on the whole spinal cord section. Marker size is governed by the measured cell size, and coloration is by regions, some of which are labeled. D Cellular representation of the same spinal cord section from (C), but colored to define 3 main regions within the spinal cord (blue Dorsal horn, yellow Ventral horn, and green White matter). Below, a bar chart depicts the nuclei count per region. E, F The cellular spinal cord representation, where cells are colored to depict: E Nissl/NeuroTrace+ neurons in orange, and F ChAT+ cells in red, with the bar charts measuring the % of cells positive in each region.
We then simulated whole and hemi spinal cord histological slices and varied parameters including noise, warp, and tissue degradation (Fig. 3B 1). With this simulated image set (12,609 training, 30 validation), we trained SiDoLa-NS-Macro-mSC. Predictions on biological IF whole-tissue sections (Fig. 3B 2) highlighted various spinal laminae and prominent regions in the white and grey matter (Fig. 3B 3). SiDoLa-NS-Macro-mSC achieved a top precision of 0.96 at a confidence of 0.92 on the validation set (Fig. S8). We next examined its robustness to variations in focal distance and tissue rotation; the CNN maintained an mAP50 above 0.70 within a range of −200 µm to +1000 µm from the standard distance in training images (Fig. S8A). Similarly, we found SiDoLa-NS-Macro-mSC had mAP50 above 0.60 up to 40 degrees of rotation along the y-axis (tilting one edge of the image towards the camera).
We evaluated SiDoLa-NS-Macro-mSC on an unseen Nissl-stained image to examine the model’s abilities under varying staining conditions. SiDoLa-NS-Macro-mSC achieved a top precision of 0.96 at a confidence of 0.929 (Fig. S8B, C). The mAP50 was class-variable: larger, well-bounded regions like dorsolateral fasciculus achieved a precision of 0.995. Thinner layers, like Lamina 1 and 2, had lower precisions (0.145 and 0.622, respectively).
To validate regional assignments on the IF mouse spinal cord sections, we compared the staining intensities in the predicted atlas regions. SiDoLa-NS multiscale results provide quantitative measurements for predicted objects at the micro-, meso-, and macro-scales, including area, channel intensity, and texture (measured as the standard deviation of pixel intensities). We combined the SiDoLa-NS-Macro-mSC predicted regions into three generic groupings: white matter, dorsal gray matter, and ventral gray matter (Fig. 3D). As expected, regions labeled white matter were located on the outer perimeter of the spinal cord, enclosing the dorsal and ventral regions (Figs. 3E, S8D). We next quantified the variable feature distributions across these groupings. After applying an intensity threshold, we examined ChAT-positive cell distribution (Fig. 3F). As expected, the ventral horn was 21% ChAT+ compared to <1% in the dorsal horn76.
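The ChAT-positivity quantification amounts to thresholding per-cell intensity and tallying the positive fraction within each macro-grouping. A minimal sketch, with hypothetical names and a toy (region, intensity) representation of the per-cell measurements:

```python
from collections import defaultdict

def chat_positive_fraction(cells, threshold):
    """Percent of cells above an intensity threshold, per region.

    cells: iterable of (region_label, chat_intensity) pairs.
    Returns {region_label: percent_positive}.
    """
    totals, positives = defaultdict(int), defaultdict(int)
    for region, intensity in cells:
        totals[region] += 1
        positives[region] += intensity > threshold
    return {r: 100.0 * positives[r] / totals[r] for r in totals}
```

The same pattern generalizes to any per-cell channel measurement (e.g., NeuroTrace positivity in Fig. 3E).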
Axonal and nuclear segmentation within sciatic nerve fascicles
Peripheral nerves comprise multiple cell types, including multiple Schwann cell subtypes, many fibroblast populations, and numerous types of vasculature, all of which play roles in nerve health and disease77. We evaluated SiDoLa-NS’s ability to detect axons, nuclei, and meso-scale features in the nerve. In both H&E and DAPI staining, there is a distinction between myelinated axons (larger, lower intensity) and Schwann cell nuclei (smaller and brighter)78. We sought to detect axons and Schwann cell nuclei, as well as the fascicles in which they are bundled, within a cross section of a human-sized sciatic nerve. Thus, we created simulations (7198 training and 612 validation images, 2.78 × 10⁵ objects in the training set) at the cellular (micro-) scale with simple geometry (Fig. 4A 1). We trained SiDoLa-NS-Micro-PNS to segment myelinated axons and nuclei as unique classes. The model achieved perfect precision at a confidence of 0.894, and an F1 score of 0.9 at a confidence of 0.4 on simulated validation images (Fig. S9). We next used a publicly available cross section of a porcine sciatic nerve (Fig. 4A 2), from which these instances were segmented effectively (Fig. 4A 3).
A Micrographs of a simulated cross section (1) of sciatic nerve with high intensity nuclei and dim axons. (2) Example tiles of a cross section of a porcine sciatic nerve and (3) SiDoLa-NS-Micro-PNS prediction on axons (dark blue) and nuclei (light blue) on the same sections. B Micrographs of simulated images (1) of textured bundles (fascicles) and perineurium using varying background contrast. (2) Micrographs of a cross section of a porcine sciatic nerve. (3) SiDoLa-NS-Meso-pSN prediction of fascicles on porcine sciatic nerve section (blue circles). C Ranked histogram showing distribution of myelinated-axon (blue line) and nuclei (red line) candidates’ relative pixel areas within all regions of the porcine sciatic nerve section shown (n ~ 10,000 per class). Inset is a small region of the nerve showing non-fascicle-contained cells (gray), axons (blue), and nuclei (red). D Micrograph showing overview of entire sciatic nerve, with SiDoLa-NS-Meso-pSN fascicle detection below. The dashed box indicates the area shown in higher magnification on the right. Circular markers are axons or nuclei, and their color indicates different fascicle instances. E Histograms depicting the cell contents within fascicles. The blue histogram represents axons, and the red histogram represents nuclei. Fascicles with >= 10 are shown except for one outlier with 740 axons not shown.
Many neurodegenerative diseases are largely selective for a single class of neurons, for example, sensory neurons in chemotherapy-induced peripheral neuropathy (CIPN)79,80 or motor neurons in ALS25, and fascicles enriched in axons from these neurons are differentially impacted81. Thus, we created meso-scale simulations of fascicles as ellipsoid objects with varying size, degree of overlap, and fascicle/background intensity contrast (Fig. 4B 1), with 6588 training and 304 validation images (89,074 training objects). SiDoLa-NS-Meso-pSN, an automated fascicle-segmentation CNN, was trained on this dataset. On simulated validation images, this model’s F1 reached 1.0 for confidence values between 0.1 and 0.9 (Figs. S10, 4B 2, 3).
Next, we applied the model to the porcine sciatic nerve section at micro- and meso-scales, where nuclei, axons, and fascicles form one combined dataset. We confirmed the expected variation in the size (area) of Schwann cells and myelinated axons (Fig. 4C). SiDoLa-NS-Meso-pSN detected discrete fascicles throughout the tissue slice (Fig. 4D). We extracted the variation in fascicle content for the number of myelinated axons or nuclei, revealing that porcine sciatic nerve fascicles contained approximately 100 axons and 70 nuclei, consistent with manual annotation of the biological images (Fig. 4E). Thus, SiDoLa-NS-Meso-pSN enabled basic analyses of content variation between and within fascicles solely from a histological (H&E) stained nerve section.
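Counting axons and nuclei per fascicle reduces to assigning each detected centroid to the fascicle that contains it. A simplified sketch, assuming circular fascicles (the detections are actually ellipsoids) and hypothetical names:

```python
def fascicle_contents(fascicles, detections):
    """Count detections of each class falling inside each fascicle.

    fascicles:  {fascicle_id: (center_x, center_y, radius)} -- circles as a
                simplifying assumption standing in for detected ellipsoids.
    detections: iterable of (x, y, class_label) centroids, e.g. from the
                micro-scale axon/nucleus model.
    Returns {fascicle_id: {class_label: count}}.
    """
    counts = {fid: {} for fid in fascicles}
    for x, y, cls in detections:
        for fid, (cx, cy, r) in fascicles.items():
            if (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2:
                counts[fid][cls] = counts[fid].get(cls, 0) + 1
                break  # assign each detection to at most one fascicle
    return counts
```

Detections outside every fascicle (the gray cells in Fig. 4C's inset) are simply left unassigned.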
Integrating gene expression and anatomical context via macro-scale region assignment
Thus far, gene expression patterns have been used only to validate SiDoLa-NS’s predictions at the macro (reference) scale. The top-down analysis after SiDoLa-NS inference allows for downstream investigations via the predicted macro regions. These can be used, for example, to define marker genes for distinct tissue regions, or to better compare regional expression changes between case and control sections.
In the mouse brain, 10X Visium HD data were linked to SiDoLa-NS-Micro-CNS predicted cells. RNA expression patterns were then analyzed across the SiDoLa-NS-Macro-mCB predicted regions. Region-specific genes (scored as the ratio of a gene’s average transcript count in a given region to that in all other regions) were visualized (Fig. 5A). Some expression profiles were similar between related regions, like Itih3, Calb2, and Agt in the hypothalamus and hypothalamic-adjacent regions. The four top-ranked (highest-ratio) genes are known regional markers, including Tshz2 in the retrosplenial area82 and Neurod6 in the hippocampus83,84, as are less commonly used marker genes like Prkcd in the ventral group of the dorsal thalamus. An extended hierarchically clustered heatmap shows additional differential gene expression (Fig. S11); some genes showed regional localization in the coronal mouse brain (Fig. 5B). We visualized other genes with high regional specificity, including Tac1, Penk, Mobp, Acta2, and Prox1 (Fig. S12). These genes exhibit sharply localized expression, possibly indicating functional specification.
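The region-specificity score described above (mean transcript count inside a region divided by the mean elsewhere) can be sketched as follows. The function name is hypothetical, and a small constant guards against division by zero for genes absent outside the region:

```python
import numpy as np

def region_specificity(counts, regions):
    """Per-region specificity ratio for every gene.

    counts:  cells x genes array of transcript counts.
    regions: region label per cell (length = number of cells).
    Returns {region: array of mean(in-region) / mean(out-of-region) per gene}.
    """
    regions = np.asarray(regions)
    ratios = {}
    for r in np.unique(regions):
        inside = counts[regions == r].mean(axis=0)
        outside = counts[regions != r].mean(axis=0)
        ratios[r] = inside / np.maximum(outside, 1e-9)  # avoid divide-by-zero
    return ratios
```

Ranking each region's genes by this ratio yields the 'top ratio' genes plotted in the Fig. 5A heatmap.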
A Heatmap showing the expression of the ‘top ratio’ genes for 21 regions from the mouse cortical Visium HD brain slice (green is high average counts, red is low). B Diagram of individual cells segmented with SiDoLa-NS-Micro-CNS, colored by total counts per cell for the specified gene (gray = 0, black = mean, red = max). C DAPI-stained transverse lumbar spinal cord section. There is some tissue delamination around the edges and a clockwise rotation of the slice (Scale bar 400 microns). D Cellular and regional results on the transverse section in C. Markers correspond to cells, marker size is governed by the measured cell size, and coloration is by regions. Low confidence (< 0.3) results discarded. E Heatmap showing the expression of the ‘top ratio’ genes for 14 regions from the mouse spinal cord slice (green is high average counts, red is low). Some regions were grouped for clarity: Central Region contains Lateral Cervical Nucleus and Central Canal; Dorsal Medial Region contains Dorsal corticospinal tract, Dorsal Nucleus, and Dorsolateral fasciculus; Dorsal Lateral Region contains Rubrospinal tract and Lateral spinal nucleus; Lamina subgroups were merged, including Lamina 10 with the Lamina 7 group. Highlighted genes appear again in subpanel F. F Some selected top hits from Kruskal-Wallis analysis were used to color the slice. Colors represent which of the 6 selected genes each cell expressed in the highest proportion. Note that regional distinctions are solely based on RNA expression patterns, not image features. Spp1/Meis2 marked cells co-express these two genes.
In a different 10X Xenium-processed mouse spinal cord lumbar section (Fig. 5C), a similar analysis was performed with SiDoLa-NS-Macro-mSC. The transcript panel, designed for brain-region specificity, targeted 244 genes and provided limited resolution in spinal cord. Some spinal subregions were merged to aid in visualization (Fig. 5D). A heatmap was generated following the procedure in Fig. 5A (Fig. 5E): ChAT, associated with initiation of motor output, was most highly expressed in Laminae 7-1085,86. Spag16, which builds cilia to support CSF flow, co-localized with the central canal87. Acta2 was associated with vasculature throughout the CNS88. After revealing these gene patterns throughout the spinal cord, we performed Kruskal-Wallis analysis to sort genes by their regional selectivity. Organizing these transcriptomic “low-level” data into functionally significant “high-level” categories allowed us to generate an informative visualization of the slice’s gene expression from a bottom-up source (Fig. 5F). Nts89, Spp176,90, and Spag1687 localize as expected, whereas others like Gfap and Meis2 showed unexpected yet regionally enriched expression patterns.
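The Kruskal-Wallis ranking can be sketched with the H statistic computed directly from rank sums. This simplified helper omits the tie correction and breaks ties arbitrarily, so it is illustrative rather than a drop-in for a full statistics library; genes are then sorted by H to surface the most regionally selective ones.

```python
import numpy as np

def kruskal_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of 1-D samples.

    Each element of `groups` holds one region's per-cell counts for a gene;
    a larger H means the gene's expression separates the regions more strongly.
    """
    data = np.concatenate(groups)
    n = len(data)
    # 1-based ranks over the pooled data (ties broken arbitrarily).
    ranks = np.argsort(np.argsort(data)) + 1.0
    h, start = 0.0, 0
    for g in groups:
        r = ranks[start:start + len(g)]
        start += len(g)
        h += len(g) * (r.mean() - (n + 1) / 2.0) ** 2
    return 12.0 * h / (n * (n + 1))
```

Sorting genes by `kruskal_h` over their per-region count vectors reproduces the regional-selectivity ordering used for Fig. 5F.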
Regional and cellular morphology using top-down analysis
We used a publicly available Xenium dataset including coronal slices from two transgenic CRND8 mice (8/18 months; an Alzheimer’s disease model) and two WT mice (8/13 months) (Fig. 6A). We ran SiDoLa-NS-Macro-mCB on the sections to extract morphological data on single cells and atlas-defined brain regions (a one-step process) and detected differences between genotypes and age groups91,92 (Fig. 6B). We first asked which regions had changes in the number of cells across age and genotype, and found that the basolateral amygdala, the dentate gyrus granule cell layer, and the lateral amygdalar nucleus (Fig. 6C upper; F-statistics 20.1, 61.4, 0.63) showed increasing cell numbers with age in WT mice, whereas the same regions showed the opposite trend in AD mutant mice. We noted lower region-calling confidence in the older Alzheimer’s model brain, particularly in the somatosensory isocortex, auditory areas, and pallidum (Fig. 6C lower; F-statistics 6.6, 47.3, 11.85), perhaps reflecting changes in these areas associated with neurologic decline, such as thinning of gray matter and the cerebral cortex93 in older CRND8 mice.
A DAPI-stained mouse coronal hemi-sections from two wildtype (WT) mice at 8 and 13 months (left side), and transgenic CRND8 Alzheimer’s (TG) mice (right side). B Diagram representing the same tissue sections but with markers representing single cells, and the coloration showing atlas regions, as defined by SiDoLa-NS-Macro-mCB (coloration matches Fig. 2D). C Bar charts measuring the number of cells per region (top), and the macro confidence within the region (bottom) in the four samples, WT 8 and 13 months, and TG 8 and 18 months. The bars are grouped by region, with Basolateral amygdala, dentate gyrus granule cell layer, and lateral amygdalar nucleus on the top, and Somatosensory Isocortex, auditory areas of the Isocortex, and pallidum below. D Histograms showing the distributions of NeuroTrace (Nissl) texture (measured as standard deviation intensity) per cell, then separated by Spinal Cord regions. Horizontal lines show the mean for that region. Regions with low cell numbers were excluded. E Micrograph of Spinal cord stained with NeuroTrace, where insets show a region of lateral funiculus compared with Lamina 8, highlighting their different NeuroTrace textures.
Next, we investigated if cell morphology could be used to distinguish the atlas regions within the mouse spinal cord section. The SiDoLa-NS inference automatically measures per-cell features for each available fluorescent channel. Indeed, we found that ‘NT cell texture’ (the standard deviation of NeuroTrace [Nissl] intensity among pixels per cell) was a strong segregator of spinal cord region (Fig. 6D, KW p ≈ 0, H-stat = 6139). Lamina 8 and the lateral funiculus have distinctly different NT textures (Fig. 6E). Therefore, with only image-based information at two different scales, SiDoLa-NS-Micro-CNS and SiDoLa-NS-Macro-mSC enabled spinal cord region distinction utilizing subtle morphological features.
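The 'NT cell texture' feature is simply the standard deviation of a channel's pixel intensities within each cell mask. A minimal sketch, with hypothetical names, assuming one boolean mask per segmented cell:

```python
import numpy as np

def cell_texture(channel, masks):
    """Per-cell texture = std. dev. of channel intensity inside each cell mask.

    channel: 2-D array for one fluorescent channel (e.g., NeuroTrace).
    masks:   {cell_id: boolean array, same shape as channel}.
    Returns {cell_id: texture value}.
    """
    return {cell_id: float(channel[mask].std()) for cell_id, mask in masks.items()}
```

Grouping these per-cell values by the macro-scale region each cell falls in produces the per-region texture distributions compared above.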
Discussion
Since Ibn al-Haytham’s early work, optics has fascinated scientists with the power to reveal the invisible. Long before omics, histological images provided rich insights into biological architecture. Pathologists like Virchow, Malpighi, and Jacobi used microscopy to uncover unique tissue structures and pathological changes. Now with spatial omics (SO) technologies, we can link these molecular analyses with optical features of the tissues and cells and unveil new insights into the molecular underpinnings of organismal biology94,95. In the nervous system, function can be gleaned from subtle distinctions in disparate cell types and organized, intricate tissue structures characteristic of these complex tissues96,97,98,99,100. Understanding the nervous system in health and disease requires an exploration of cellular and molecular patterns within their spatial context101,102,103. By sharing a top-down approach that can be used in conjunction with SO techniques, we hope to usher in broadly applicable models capable of identifying functional regions.
Here, top-down means overlaying information evident from the image, as opposed to building up that knowledge from the molecular level (bottom-up). SiDoLa-NS is built upon the premise that training a Neural Network (NN) using only real images is insufficient: the variation in hand-labeled images provides an inadequate training set for fully generalized machine learning tools. Therefore, SiDoLa-NS adopts the novel strategy “Simulate, Don’t Label” to generate large, unbiased, and error-free datasets. The vanguard SiDoLa-NS models are applied in three distinct neurological cases. First, we inferred cells, lamina, and atlas regions in a coronal slice of adult mouse brain. Then, we segmented a mouse spinal cord into individual cells and overlaid its reference atlas, accurately identifying over 20 distinct regions. Finally, SiDoLa-NS evaluated a cross section of porcine sciatic nerve containing distinctive bundles of axons and Schwann cell nuclei organized into fascicles, as is seen in human nerves. We demonstrate that these datasets, coupled with spatial transcriptomics, can be used to identify marker genes of specific cell types. Additionally, the multi-scale data enables new discoveries: for instance, how are neuronal cells organized at the micro-scale to give rise to higher-scale organization in tissue? We lay out the simulation process so that others can take advantage of the pipeline for their own applications.
SiDoLa-NS and other similar approaches allow classifiers to be more general. Using NNs outside of the context in which they were trained is fraught with hallucinations and overfitting: NNs can ‘memorize’ their training data and may perform poorly on real samples because of batch effects introduced in the training set. By simulating variation beyond that seen in existing samples, the NN becomes robust to unseen variations. This subtle fact is critical, given that the NNs in this manuscript are used exclusively out-of-context, since the training and validation data are simulated. Another important note is that the simulated training data are generated using an engine independent from training and validation. The pipeline does not use a GAN (generative adversarial network) or diffusion approach to create the training set, avoiding recursive-generation collapse104. A large, diverse dataset is important to avoid producing a ‘brittle’ model. For breakthroughs like LLMs105, AlphaFold 2106, and EVO2107, the sample size is staggeringly large. Conversely, small datasets of excellent quality tend to fail out-of-context; examples include detecting pneumonia in chest x-rays, sepsis, and other clinically meaningful failures108,109,110. Often these models trained on smaller datasets learn shortcuts111 or were underspecified112. Small language models may be good for niche, specific tasks, although even a paper arguing for quality over quantity only adjusted quality by increasing the duplicate data in the sample set, effectively reducing the sample size113.
This approach permits the generation of massive, high-quality training data and the representation of difficult and rare cases. It is important to note that existing tools such as CellProfiler produce cellular annotations comparable to those of SiDoLa-NS Micro. However, SiDoLa-NS can operate at cellular (ex: individual nuclei), meso (ex: local tissue structures like bundles of axons), and atlas (ex: entire brain/spinal cord) scales, not just identifying different clusters or patterns but specifically identifying named brain or spinal regions. SiDoLa-NS models are also robust to high cell density when image parameters vary over a wide range. To overcome the inherent variability of manual annotation, this approach eliminates annotator bias by using simulation-derived ground truth, yielding precise and consistent object boundaries free from optical and sample artifacts. Grounded in biophysical principles, the simulation produces results that are both accurate and highly interpretable.
Although this process allows data-driven discoveries relating to well-understood functional categories, it is still essential to parse meaningful findings out from noise. Some noise comes from sample preparations for spatial analysis that may disfigure the tissue or destroy transcripts before they can be sequenced. Interpreting expression findings can also be challenging. For instance, the glial marker Gfap is expressed throughout the spinal cord, and Meis2 appears to be linked to dorsal root ganglia114, which innervate intermediate laminae115. Additionally, changes in the pallidum’s shape often occur over longer time periods116,117, though the reduced confidence we noted in Fig. 5 may signify other well-documented changes in or around the tissue118.
With spatial-omics datasets, there is the possibility of double-dipping, which refers to conflating the significance of region-defining molecular markers with results. Although it is useful to enhance top-down image analysis with bottom-up clusters, the statistical significance of downstream analyses may be falsely inflated if compounded with assumptions made in earlier steps. Using top-down regional definitions avoids these problems119,120,121.
This work does not diminish the significance of bottom-up approaches. Rather than pitting the two approaches against one another, we argue that both are relevant for pathology and function. Direct comparisons oversimplify the relationship between the top-down and bottom-up approaches. Molecular definitions inherently have no “label” beyond significant genes, which can lead to confusion. As precision medicine defines the treatment and outcomes for more patients, the need for these top-down approaches may become limited to smaller domains. Until then, utilizing the rich features laden in microscopy will continue to be a valuable source for clinicians and researchers to draw from.
Segmenting complex but specific shapes can benefit from state-of-the-art SAM or SAM2 models122. These approaches have shown strong results in polyp, cardiac, and skin lesion settings123. However, they struggle when boundaries are ambiguous and are not inherently designed to handle multiple classes. For example, identifying the boundaries of constellations in a night sky image is difficult because no visible pixel edge marks where one ends and another begins—the boundaries exist only by reference to external definitions. The same challenge applies to segmenting nuclei in the brain or spinal cord, where the “boundary” reflects a change in pattern rather than a distinct pixel transition.
Additional exploration of NN architectures could lead to even better inference. The NNs in this manuscript are instance segmentation and/or instance classification models built on the Ultralytics YOLO124 framework. Instance segmentation is powerful since it identifies unique regions and separates individual objects, such as cells, so we can measure the features that distinguish them. We also tested Mask R-CNN125 and Feature Pyramid Networks126 but found YOLO to have the most active development and the best efficiency. Still, YOLO and instance segmentation may not be the right approach for all applications, particularly when region boundaries are diffuse or when per-pixel accuracy is critical. Convolutional UNets typically perform better on anatomical region-definition tasks, since each pixel’s location and identity are consistent. In fact, we found that for some applications a two-stage approach can enhance performance: UNets are first trained to ‘recode’ the raw image with pixel-based classifications, and an instance segmentation model is then trained from those outputs, leading to better performance (data not shown).
In summary, SiDoLa-NS enables robust, generalizable spatial analysis through simulation-trained segmentation at multiple spatial scales. By defining regions based on images of tissue structure, it allows multimodal distributions of cells and transcripts to be mapped and analyzed in context—crucial for resolving complex tissues. This top-down approach, when integrated with SO, opens the door to scalable, interpretable models that bridge structure and molecular function. Integrating this SiDoLa-NS with empirical pipelines and adaptive architectures that combine real and simulated data will produce further advances in our ability to analyze SO datasets.
Methods
Atlas mapping
Resources from the Allen Software Development Kit (SDK) were utilized for the atlas mapping pipeline of the mouse brain and spinal cord. Each atlas image had a respective structure image with color-coded regions. The structure image for each atlas image was retrieved and a dictionary for regions and their respective color code was constructed. Next, binary masks in each color code were constructed over all structure maps to visualize the unique regions. Polygon coordinates defining the contours of each region were produced and assembled into a JSON file, organized by slice number and region name. These region outlines were used to generate macro-level brain and spinal cord simulations.
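The masks-to-polygons step above can be sketched in plain Python (a minimal illustration, not the Allen SDK pipeline itself; the region name, color code, and tiny structure map below are hypothetical):

```python
import json

def extract_region_outlines(structure_img, color_to_region):
    """For each color-coded region, build a binary mask and collect its
    boundary pixels as a polygon-like coordinate list."""
    outlines = {}
    h, w = len(structure_img), len(structure_img[0])
    for color, region in color_to_region.items():
        # Binary mask selecting this region's color code
        mask = [[structure_img[y][x] == color for x in range(w)] for y in range(h)]
        # Boundary pixels: in-mask pixels with at least one out-of-mask 4-neighbor
        boundary = []
        for y in range(h):
            for x in range(w):
                if not mask[y][x]:
                    continue
                nbrs = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
                if any(ny < 0 or ny >= h or nx < 0 or nx >= w or not mask[ny][nx]
                       for ny, nx in nbrs):
                    boundary.append([x, y])
        if boundary:
            outlines[region] = boundary
    return outlines

# Hypothetical 5x5 structure map: color code 1 = "Lamina 8", 0 = background
img = [[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0]]
outlines = extract_region_outlines(img, {1: "Lamina 8"})
# Assemble into JSON organized by slice number and region name
slice_json = json.dumps({"slice_042": outlines})
```

A real structure image would of course be loaded from the SDK and the contours traced in order, but the mask-then-boundary logic is the same.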
Simulations overview
Simulations have the advantage of providing truly massive datasets: we can easily generate hundreds of thousands of images in a few hours. There is no annotator bias, ground-truth segmentation masks and labels are perfectly resolved, and there are no memorization issues since the classifier never sees real images until inference. With the simulated images, we represent examples of various tissue and optical aberrations that may be present in a physical tissue section:
1. Out-of-field sections are characterized by missing regions which fall out of the field of the image;
2. Broken tissue is a common occurrence during the tissue handling process, resulting in “cracks” in the tissue;
3. Poor focus/contrast/resolution is a result of the automated imaging failing to capture photos in the highest fidelity; and
4. Distorted or warped tissue sections are also likely a result of the tissue handling process.
These aberrations are not easily reversible, so analysis must be resilient to variations and combinations of them. We also over-represent examples that are difficult for existing segmentation algorithms, such as extremely high cell density. Finally, ground-truth labels and segmentation masks for the training images can be dynamically updated during simulation generation, enabling the entire schema to evolve, an impractical task if manually re-labeling 100K images. At no point during training are real biological samples used; both training and optimization utilize simulations, reserving the biological samples solely for downstream testing and analysis.
All simulations were done in the CAD software Blender 4.0+ using a feature called ‘Geometry Nodes’, which allowed for a visual workflow that encoded the geometry representing different scales of biology. Individual ‘frames’ were used to control variations and produce the different training and test image datasets needed for teaching the NNs. While each tissue type and scale used distinct geometries, a few general rules were common. The basic concept was to build a plane or volume to fill with cells, then to ‘instance’ cell objects and randomly fill them into the region. The regions, the fill density/pattern, and the cells were all created dynamically, and random variations were added for each frame. The more variation that was shown to the NN, the better it was able to generalize to real tissue.
Simulation engine details
The exclusive use of simulated microscopy images allows for generalizable identification of the nuclei or cell bodies, organized cellular clusters, and reference regions from anatomical atlases. However, these systems are only as capable as the simulated images from which they are trained. We asked whether we could develop a framework to produce biophysically informed simulations that would be compatible with the SO platforms discussed in Fig. 1. The pipeline below was used to create the simulations used in this manuscript but could also serve as a baseline from which to build other synthetic datasets for anatomical regions of interest.
Continuing the multiscale theme, this pipeline operates at micro-, meso-, and macro-scales (Fig. S13A-C). Each biological component is built as a 3D structure with a variety of properties defined and bounded by input parameters58. Each component can be thought of as a function, and we represent it as a node with input parameters and outputs that return the geometry of that component (Fig. S13D). So, we first define the features of single cells (Cell Node), such as size, shape, and intensity, all of which can vary based on cell type and scan parameters and can be controlled through input parameters. The geometry for these (unique) individual cells is then used as an input to the next block (Region Node), which defines the makeup of a region or cluster (the processes in the pipeline are defined in more detail below in Fig. S13E and are indicated by a banner along the top). Cells are scattered throughout this region, with parameters for density and regional variation. The shape of these regions is either simulated generically, for example as laminar stripes, clusters, or rings, or is generated from a set of polygons which define a region from an anatomical atlas. Specifically, for sciatic nerve fascicles, we used rings of cells around a central ellipse of cells. For the mesoscale inside the brain, we used higher-density lamina that traverse an image region as a stripe. Next, the geometry from each region is output and used to assemble the whole section (Section Node), which defines what will be a whole output image. Here, the parameters are randomized over a uniform random distribution to change the atlas morphology, so that it includes normal physiological variation but also goes beyond it to encompass a variety of pathological phenotypes.
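The Cell Node / Region Node / Section Node composition can be sketched as nested Python functions (a simplified stand-in for the Blender Geometry Nodes; all parameter names, ranges, and the toy atlas below are illustrative):

```python
import random

def cell_node(rng, mean_radius=6.0):
    """Return 'geometry' (here just a radius and intensity) for one cell;
    size and intensity vary within bounded input parameters."""
    return {"radius": rng.uniform(0.6, 1.4) * mean_radius,
            "intensity": rng.uniform(0.3, 1.0)}

def region_node(rng, polygon, density):
    """Scatter cells throughout a region at a given density."""
    n = max(1, int(density * polygon["area"]))
    return {"name": polygon["name"],
            "cells": [cell_node(rng) for _ in range(n)]}

def section_node(rng, atlas_polygons, density_range=(0.01, 0.05)):
    """Assemble a whole section; densities vary randomly per region so the
    model cannot learn region labels from density alone."""
    return [region_node(rng, p, rng.uniform(*density_range))
            for p in atlas_polygons]

rng = random.Random(0)  # one 'frame' corresponds to one parameter setting
atlas = [{"name": "Lamina 8", "area": 400.0},
         {"name": "lateral funiculus", "area": 900.0}]
section = section_node(rng, atlas)
```

Re-seeding the generator per frame and re-running `section_node` is what produces the combinatorial variability of the training set.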
We utilized the open-source CAD software Blender to produce simulations, since it can be controlled through Python scripting and a visual generative workflow called Blender Geometry Nodes (BGN), and has extensive optics and physics capabilities. The BGN pipeline for a micro-scale simulation focuses on cellular details (Fig. S14). A simplified version of the BGN pipeline that was used to create the macro mouse coronal brain dataset features transforms to distort the regions and overall section (Fig. S13E). The banner displayed over the pipeline defines the primary functions of each part of the pipeline: Regions, Cells within Region, Edge, Tears, Skew, Deform, and Empty. For the macro-scale reference regions, we defined their boundaries as exact copies of the reference polygons for one hemisection, with a switch to either mirror those regions into a full set of hemispheres or use a single hemisection. The density of cells within each region was defined from a table (which we could manually set) or varied randomly between regions. Varying cell density randomly has the benefit of reducing our models’ dependence on density in learning region labels. Each region was produced iteratively, indicated under ‘Cells within Region’. The geometry from the regions and the cells contained within them was then joined together to define the whole section.
From here, the geometry of the simulated tissue section was manipulated. Specifically, we include within the BGN: 1) Edge, extra cells and debris beyond the defined regions of the atlas (as there are often extra chunks of tissue in a section from errors in histological slicing); and 2) Tears, which simulate rips or tears in the tissue. These two features are disabled when generating ground truth, as we want to reconstruct the tissue as if these imperfections did not exist. Additionally, we include two distortions: 3) Skew, which simulates different cut angles in sections, and 4) Deform, which changes the relative size of the regions as would occur in samples from different individuals (the variation included purposely goes beyond common variation). Finally, there is a block that regulates 5) Empty, or background, simulations, to reduce the influence of background variation and false positives in object recognition. More examples of these manipulations are shown in Figs. S15, S16, and a practical guideline is included at https://sidolans01.mgifive.org/. Although not shown in Fig. S13, shading information controlling how light is emitted from and interacts with each object is also included, using Blender Shader Nodes, to enable variation and patterning.
At this point, the complex 3D geometry needs to be transformed into a 2D image, and the steps for that process are displayed below (Fig. S13F). To do this, we use optics and the rendering engine within Blender. Specifically, we employ a virtual camera with a 15.5 mm focal length and a low f-stop (0.01) to focus on the tissue section from about 1.4 mm away. Then the rendering engine ‘Cycles’ calculates the light being emitted by or bouncing off the samples and uses it to produce an image. After the image layer is produced, we use a compositor layer to add varying amounts of white noise (0–95%), making the image slightly harder to interpret. This process is repeated (per ‘frame’) for the total number of images and produces the simulated ‘microscope’ images for our dataset. At the same time, the software also calculates the boundary coordinates of the various objects within the scene from the camera’s point of view, and projects them onto a 2D plane so they are scaled to the rendered image. These coordinates are used to export the ground truth data, a set of polygon coordinates and class labels. For the ground truth construction, a switch is flipped that turns off some of the variation, such as the rips and the noise, so a clean reconstruction can be learned from the noisy microscope image. Also, during this process, all the parameters utilized are saved as metadata and are exported alongside the image and label files, so all the details can be tracked for different rendering runs and versions. In the macro mouse brain set, we rendered 4,065 images/hour on multiple desktops, with ~700 image sets an hour on a computer with only an 8th generation Intel CPU or NVIDIA 1000 series graphics card (i.e. standard, not specialty hardware).
The parameter input settings are constant for a single ‘frame’ of the BGN pipeline but are varied between frames. Therefore, when producing the dataset, the software runs through one to hundreds of thousands of frames, creating the combinatorial variability of the training set. At the same time, we also produced a small, simulated validation dataset (hundreds to thousands of images). These images can then be fed into any number of image-based NNs, such as a convolutional UNet architecture, instance segmentation architectures such as Mask R-CNN or YOLO124, or Vision Transformers (ViT). In this manuscript, we presented data from YOLO instance segmentation, and neural architecture search (NAS) was used to find the optimal settings. The metadata exported during rendering is loaded during training so that it can later be utilized during inference.
SiDoLa-NS-macro-CNS mouse coronal brain and spinal cord
These sections were both produced with the same pipeline explained in Fig. 6. After atlas mapping, the regions were imported into Blender as curve objects using a custom script. These were fed into the Geometry Nodes pipeline described extensively in Fig. 6E. Additional details and images are found in Figs. S12, S13.
SiDoLa-NS-macro-CNS comparison to bottom-up approaches
The comparison of our image-based regional definitions against expression-based regional definitions was done using Kaur et al.’s75 consensus tissue domain detection on a publicly available mouse coronal (standard) Visium dataset. We applied SiDoLa-NS-Macro-mCB to the whole mouse brain slice, leveraging the automatic linking of region labels to cell-level segmentations and transcripts. We compared the atlas predictions (top-down) to the bottom-up results directly by overlaying the predicted region labels on the Seurat-generated UMAPs, and conversely by overlaying the Seurat clusters on the brain image. This yielded four sets of labels to compare with SiDoLa-NS-Macro-mCB atlas labels. All of the methods used the expression data and variations of unsupervised clustering to define numbered, discontinuous spatial regions. This comparison is biased towards the expression techniques, since we compared all the labels back to the molecular definitions of the regions from the consensus tissue domain manuscript.
SiDoLa-NS-meso-pSN porcine sciatic nerve
Sciatic nerves were digitally generated using Blender with geometry nodes to simulate biophysical properties. The aim was to replicate natural structures within a sciatic nerve, including fascicles and Schwann cells. The Blender simulations were simplified to basic geometric forms, including ovular fascicles, circular Schwann cell nuclei, and a circular epineurium. Additionally, varying sizes and counts of fascicles were distributed to ensure a manageable, yet flexible, simulation for training. This effectively prepared the model for real-world data. These simulated components were rendered into a set of .bmp files through Blender scripting, which produced the dataset.
SiDoLa-NS-micro-CNS mouse coronal brain and spinal cord
The basis of this simulation was to instance ellipsoid nuclei on many points across a plane. The nuclei varied in size, aspect ratio, rotation, and shading (which featured bright heterochromatin spots). We also introduced two other meso-scale features. The first of which was a bright laminar stripe or striation. These were bands of much higher density nuclei compared to the adjacent cells (Fig. S14). The other feature was a region devoid of cells, usually thinner than the high-density lamina, that also crossed the image. Finally, a slight amount of depth-of-field with a low f-Stop was used to simulate slight changes in focus.
SiDoLa-NS-micro-PNS mouse peripheral nervous system
This micro model is the original from which the others were derived. It consisted of a plane that was filled with points split between two different cell type instances, each of which could be scattered in different ratios. One cell type was Schwann cell nuclei, which are rounder, small, and bright. The other was axons, which are slightly larger, often oblong, and dim to nearly invisible (by DAPI staining). The plane containing these was circular and had additional geometry outside of it representing the epineurium. The plane was also shifted around the frame of the camera so that different edges of the perineurium were occasionally exposed.
Optics, noise, and ground truth polygons
A virtual camera within Blender was used to translate the three-dimensional geometry into an image. This was done with the rendering engine “Cycles” or “EEVEE” and involved calculating where the emitted light from the fluorescent objects fell, or where light from the background in an IHC setting was blocked by the objects. The camera had depth-of-field and a small aperture to give slight out-of-focus regions. The camera was also tilted and twisted to give different viewing angles and raised or lowered to adjust the overall field-of-view and zoom. White noise was added on top of the final image with the compositor and mixed at different ratios to make the SiDoLa-NS models robust against noise. In addition to the standard rendering, we also extracted the coordinates of the vertices making up the 3D geometry and projected them onto a plane from the camera’s point of view, converting them into camera x,y coordinates (world to camera). These vertices were then reassembled into polygons directly, or a convex hull was applied. In more complex setups (atlas regions), we also downsampled the resolution of the polygons so that the exported segmentation files were more manageable. These files were automatically exported in YOLO .txt or COCO .json format as the images were rendered.
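The world-to-camera projection and label export can be sketched as follows (a simplified pinhole model rather than Blender's own projection utilities, and the class id and triangle below are hypothetical). The YOLO segmentation .txt convention is a class id followed by polygon coordinates normalized by image width and height:

```python
def project_vertex(v, cam_z=1.4, focal=15.5):
    """Pinhole projection of a world vertex (x, y, z in mm) onto the image
    plane of a camera on the z-axis at height cam_z, looking down."""
    x, y, z = v
    depth = cam_z - z  # distance from camera to the vertex along the view axis
    return (focal * x / depth, focal * y / depth)

def to_yolo_seg_line(class_id, polygon_px, img_w, img_h):
    """One YOLO segmentation label line: 'class x1 y1 x2 y2 ...' with all
    coordinates normalized to [0, 1] by the rendered image size."""
    coords = []
    for x, y in polygon_px:
        coords += [x / img_w, y / img_h]
    return " ".join([str(class_id)] + [f"{c:.6f}" for c in coords])

# Hypothetical triangle mask in pixel space on a 640x640 render
line = to_yolo_seg_line(3, [(64, 64), (128, 64), (128, 128)], 640, 640)
```

In the real pipeline these lines are written per object as each frame renders, alongside the image and metadata files.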
Model training
The images created can be used with a variety of instance segmentation or object detection architectures. All the inferences in this document utilized Ultralytics YOLO v8 or v9 pre-trained foundation models in a Python environment with CPU or GPU. For micro models, where the exact outline of the nucleus was not required, we trained on ellipsoid nuclei represented as bounding boxes with YOLO v9c and v9e. For more complex shapes, like in the macro models, we trained with the YOLO v8n-seg and v8l-seg foundation models to extract the full polygon masks from the inferences. The number of images used in training is listed throughout the results. Generally, training ran for 10-200 epochs, the image size was between 448 and 640 pixels in width and height, and the batch size was between 8 and 32 images per batch. All these factors were varied, along with the random seed, for each NAS run. The results of each run were copied into a folder named by the resulting mAP75 on the validation dataset and then evaluated on biological examples as a quick reference.
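One NAS trial over the stated ranges can be sketched as below (the sampling scheme and folder-name format are illustrative, not the exact search used in the manuscript; the commented Ultralytics call shows where the sampled settings would be consumed):

```python
import random

def sample_hyperparams(seed):
    """Draw one trial within the ranges stated in the text:
    10-200 epochs, 448-640 px images, batches of 8-32."""
    rng = random.Random(seed)
    return {
        "epochs": rng.randint(10, 200),
        "imgsz": rng.choice([448, 512, 576, 640]),  # width/height in pixels
        "batch": rng.choice([8, 16, 32]),
        "seed": seed,
    }

def run_folder_name(map75):
    """Name the results folder by validation mAP75 for quick reference."""
    return f"mAP75_{map75:.3f}"

trial = sample_hyperparams(0)
# In practice the trial would be passed to the trainer, e.g. (not run here):
#   from ultralytics import YOLO
#   YOLO("yolov8n-seg.pt").train(data="sim.yaml", **trial)
folder = run_folder_name(0.912)
```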
Transcript processing
The 10X Visium HD Brain dataset, acquired from a public source, was provided at 2, 8, and 16 µm resolution in the binned outputs folder. We used pandas to export the tissue parquet files as CSV files, acquiring the SRT data in micron coordinates. The 10X Xenium Spinal Cord dataset was provided as a CSV file, with individual transcripts identified by micron coordinates. The CSV mapped directly to the DAPI-stained image of the tissue sample. Given the slice’s width and height in microns, SiDoLa-NS automated the scaling required to map the transcripts to the segmented cells.
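The micron-to-pixel scaling can be sketched as below (a minimal illustration; the nearest-centroid assignment shown is a hypothetical simplification of linking transcripts to segmented cells, and all coordinates and cell ids are made up):

```python
def micron_to_pixel(x_um, y_um, slice_w_um, slice_h_um, img_w_px, img_h_px):
    """Map a transcript's micron coordinates onto the stained image's pixels,
    assuming the image spans the full slice with no rotation."""
    return (x_um * img_w_px / slice_w_um, y_um * img_h_px / slice_h_um)

def assign_to_cell(px, py, cells):
    """Assign a transcript to the nearest cell centroid (squared distance);
    a stand-in for assignment through the actual segmentation masks."""
    return min(cells, key=lambda c: (c["cx"] - px) ** 2 + (c["cy"] - py) ** 2)["id"]

# Hypothetical 1000x500 um slice imaged at 2000x1000 px
px, py = micron_to_pixel(100.0, 50.0, 1000.0, 500.0, 2000, 1000)
cell_id = assign_to_cell(px, py, [{"id": "c1", "cx": 150, "cy": 90},
                                  {"id": "c2", "cx": 900, "cy": 800}])
```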
Data collection and image preprocessing
Mouse coronal brain
The Visium HD Brain H&E image was acquired from 10X Genomics. The microscopic image provided was an H&E-stained section. Utilizing ImageJ, we split the image into its constituent channels and selected the channel that best highlighted nuclear features (the red channel) for further processing. In both cases, we adjusted the image brightness and contrast, applying a linear normalization to the whole image. For macro/meso/micro inference, these images were given directly to the program, without downsampling or tiling. For macro inference, an ensemble of three atlas models was used. The macro class is assigned based on whether the center of the cell is within the boundary of the larger region.
Mouse Alzheimer’s brains
Four brain hemi-sections from a publicly-available dataset, “Xenium In Situ Analysis of Alzheimer's Disease Mouse Model Brain Coronal Sections from One Hemisphere Over a Time Course - 10x Genomics”, were downloaded. The DAPI images were rotated to be upright in ImageJ and then targeted for macro/meso/micro inference as above. Statistics for effects on these tissues were computed by linear regression against this scale: WT 8 = 0, TG 8 = 0.2, WT 13 = 0, TG 18 = 1 for regional confidence, and against this scale: WT 8 = 1, TG 8 = 1, WT 13 = 1.5, TG 18 = 0.5 for cell numbers per region.
Mouse spinal cord immunofluorescence
For the spinal cord analysis, we acquired the IF images from a spinal atlas manuscript70. The spinal cord dataset included nine different cross sections of spinal cords, each containing three different channels of ChAT, Nissl, and DAPI stains. We selected Slide1-6_Region0007_Channel555 nm,475 nm,395 nm_Seq0051 and labeled each spinal cord from one to nine starting from the top and moving horizontally. Taking the second spinal cord, we cropped it into a square and split it into three separate channels using ImageJ. The Nissl channel was then used for our SiDoLa-NS models’ predictions.
Mouse spinal cord xenium
Mice were housed and used following the institutional animal study guidelines and protocols approved by the Institutional Animal Care and Use Committee of Washington University in St. Louis. For spinal cord dissection and fixation, a C57Bl6-J female, 109 days old mouse was first deeply anesthetized in a 5% isoflurane induction chamber until complete loss of pedal reflex was confirmed. While under deep anesthesia, the animal was euthanized by cervical dislocation followed by thoracotomy. The right atrium was then incised, and transcardial perfusion was performed through the left ventricle with 10 mL of ice-cold RNase-free phosphate-buffered saline (PBS) to remove blood, followed by 10 mL of freshly prepared 4% formaldehyde (Sigma Aldrich, cat. 100496) for tissue fixation. To preserve RNA integrity, the entire spine, including the spinal cord, surrounding bone, and musculature, was rapidly dissected and incubated in 4% formaldehyde at 4 °C for 48 h. After fixation, the surrounding bone and muscle structures were removed by dissection, and the exposed spinal cord was transferred to 70% ethanol at 4 °C until processing. The fixed spinal cord tissue was dehydrated and embedded in paraffin using a Tissue Processor (Leica TP 1020) through graded ethanol, xylene, and melted paraffin immersions. The spinal cord sample was then embedded in a cold paraffin block, sectioned into 6 µm slices with a microtome, and floated in a 42 °C water bath. Tissue sections were mounted on Xenium slides (10X Genomics, PN-1000460) within the 12 mm x 24 mm imageable area. Slides were dried for 30 min at room temperature, incubated for 3 hours at 42 °C in a dryer oven, and placed in a desiccator to dry overnight at room temperature. Finally, the slides with the lumbar region were hybridized with 10X Mouse Brain Panel probes (which included ChAT) and processed using the Xenium Analyzer following the manufacturer’s instructions.
Porcine sciatic nerve
The porcine sciatic nerve slide was obtained from Saarland University’s histology site https://mikroskopie-uds.de/127. The sciatic nerve cross section had a scan area of 6.2 mm x 4.8 mm and H&E staining. The image was downloaded as a .zif file, which was loaded into ImageJ, and the highest-resolution version of the sciatic nerve was exported as a 38,885 x 30,286 pixel 8-bit image. The channels were then split. Both channel 1 and channel 2 highlighted the fascicles within the sciatic nerve. Due to channel 2’s more distinct contrast between the fascicles and the background, channel 2 was used for Fig. 4. Channel 3 highlighted the axons within the fascicles. The pixels were then inverted to create a black background with white fascicles and adjusted to enhance the visibility of the fascicles and axons against the background and to mimic DAPI staining.
Inference
SiDoLa-NS must work on all parts of the image at different scales, and that information must then be fused together, along with any other omics data that corresponds with the image dataset. This was accomplished with the macro/micro inference package included in the code repository. The input allowed a single micro-scale model to do the cell segmentation and then allowed as many meso- or macro-scale models as desired to be included. Some parameters, such as the location of the images, were set up for each model, and then the software ran all the inferences and linked the data together. For the micro-scale, the multiple channels available from the image were measured across three masks: Box, Poly, and Voronoi (representing the bounding box, the full polygon outline if available, and the space all around the cell until the next cell’s boundary is reached). Area, Size, Intensity, and Texture (SD) were measured for each mask and reported on a per-cell basis. If there was more than one macro-scale model, then hard- and soft-voting ensembles were generated and reported.
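The per-mask measurements can be sketched as below (a minimal stand-in using Python's statistics module; the intensity values are hypothetical, and the same function would be applied separately to the Box, Poly, and Voronoi pixel sets of each channel):

```python
from statistics import mean, pstdev

def measure_mask(pixel_values):
    """Per-cell features for one mask: area in pixels, mean intensity, and
    texture measured as the standard deviation of intensity."""
    return {
        "area": len(pixel_values),
        "intensity": mean(pixel_values),
        "texture_sd": pstdev(pixel_values),
    }

# Hypothetical NeuroTrace intensities inside one cell's polygon mask
feats = measure_mask([10, 10, 14, 14])
```

Texture as a per-cell intensity SD is the same quantity used to separate Lamina 8 from the lateral funiculus in the Results.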
Analysis
Model evaluations
SiDoLa-NS models were evaluated during training utilizing the Ultralytics validation tools. For all our models, the validation set used during training consisted of simulated images. For each validation image, many metrics were recorded, including precision, recall, IoU, and confidence. Precision and recall were used to generate an F1 score (\(2\cdot \frac{{precision}\cdot {recall}}{{precision}+{recall}}\)), which was also graphed across confidence values. Additionally, a confusion matrix was generated to visualize classification accuracy, particularly in multi-class tasks like brain and SC atlas mapping.
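The core metrics can be computed as below (a minimal sketch; the precision/recall values and boxes are illustrative):

```python
def f1(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

score = f1(0.9, 0.8)
iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
```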
Robustness analysis
Various robustness analyses were conducted on the macro models for noise, degree of rotation, and focus. We generated a small set of 18–25 images with calibrated variations along one of these three axes. For noise, we degraded the image by reducing the pixel contrast. For degree of rotation and zoom, the camera was respectively rotated or moved along the z-axis relative to the reference object. For focus, we adjusted the F-number of the camera’s aperture. For each of these images, we additionally generated a ground truth, which was used in evaluation. The mAP50 was the primary metric for assessing model performance.
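At mAP50, a detection counts as a true positive when its intersection-over-union (IoU) with a ground-truth object is at least 0.5. A minimal IoU for axis-aligned boxes (the box tuple convention `(x1, y1, x2, y2)` is our assumption) can be sketched as:

```python
def iou(box_a: tuple, box_b: tuple) -> float:
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    At the mAP50 threshold, a prediction matches a ground-truth box
    when iou(...) >= 0.5."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Averaging the resulting precision over recall levels and classes yields mAP50, the figure reported for each calibrated perturbation level.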
Marker gene analysis
Because SiDoLa-NS-Macro-mCB and SiDoLa-NS-Micro-CNS were tested directly on the Visium HD mouse brain slice, we mapped the gene expression matrix at the 16 µm bin level to the data based on micron position. For the Xenium spinal cord slice, SiDoLa-NS-Macro-mSC and SiDoLa-NS-Micro-CNS were both evaluated, and transcripts were automatically assigned to cells. As a result, each detected cell had a region assignment and a known gene count. For the list of ‘top ratio’ genes, we ran a Kruskal–Wallis test of all genes against regions to rank the genes. From there, the ratio of each gene’s count to the total region gene counts allowed us to rank these genes per region, yielding our list of the top 4 genes.
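The ratio-ranking step can be sketched in a few lines. This is an illustrative reimplementation, not the authors' code: the function name `top_ratio_genes` and the `region -> {gene: count}` input shape are assumptions, and the preceding Kruskal–Wallis filter (e.g., `scipy.stats.kruskal`) is omitted.

```python
def top_ratio_genes(counts: dict, n: int = 4) -> dict:
    """Rank genes within each region by their share of that region's
    total counts. `counts` maps region -> {gene: count}; returns
    region -> top-n gene names. The Kruskal-Wallis pre-filter used in
    the paper is not shown here."""
    top = {}
    for region, genes in counts.items():
        total = sum(genes.values())
        ranked = sorted(genes, key=lambda g: genes[g] / total, reverse=True)
        top[region] = ranked[:n]
    return top
```

Normalizing by the region total (rather than raw counts) keeps highly expressed housekeeping genes from dominating every region's list.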
UMAPs and clustering
The UMAPs were viewed using the Seurat embeddings from the Visium HD dataset at the 16 µm bin level. The UMAP1 and UMAP2 data were plotted and colored by cluster. For the cluster definitions, we utilized the provided graph-based clusters, which we joined to the UMAP by barcode. The SiDoLa-NS UMAP in Fig. 2D uses the same underlying projection data but was colored with the SiDoLa-NS-Macro-mCB predictions for the respective barcode region.
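The barcode join amounts to a simple keyed lookup; a minimal sketch (the function name `join_by_barcode` and the dict-based inputs are ours; in practice this would typically be a `pandas` merge on the barcode column):

```python
def join_by_barcode(umap_coords: dict, labels: dict) -> list:
    """Join UMAP coordinates to per-barcode labels, e.g., the provided
    graph-based clusters or SiDoLa-NS-Macro-mCB region predictions.
    `umap_coords` maps barcode -> (UMAP1, UMAP2); barcodes without a
    label are dropped from the plot data."""
    rows = []
    for barcode, (u1, u2) in umap_coords.items():
        if barcode in labels:
            rows.append({"barcode": barcode, "UMAP1": u1, "UMAP2": u2,
                         "label": labels[barcode]})
    return rows
```

Swapping the `labels` dict from graph-based clusters to SiDoLa-NS region predictions recolors the same projection, which is exactly the comparison drawn in Fig. 2D.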
Data availability
The data generated and analyzed during this study are included in this article and its Supplemental Data Files. The underlying code is available on GitLab and can be accessed at https://gitlab.com/buchserlab/sidola-ns. The training/validation datasets for this study are available on HuggingFace and can be accessed at https://huggingface.co/collections/FIVE-MGI/sidola-ns-676d91b211b32e84adbf391f. Additionally, users can access the tools at https://sidolans01.mgifive.org/ and find more detail on constructing the simulations at https://sidolans01.mgifive.org/SimRef/SimRefMain.html.
References
Vickovic, S. et al. High-definition spatial transcriptomics for in situ tissue profiling. Nat. Methods 16, 987–990 (2019).
Ståhl, P. L. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78–82 (2016).
Lein, E., Borm, L. E. & Linnarsson, S. The promise of spatial transcriptomics for neuroscience in the era of molecular cell typing. Science 358, 64–69 (2017).
Maniatis, S. et al. Spatiotemporal dynamics of molecular pathology in amyotrophic lateral sclerosis. Science 364, 89–93 (2019).
Asp, M., Bergenstråhle, J. & Lundeberg, J. Spatially resolved transcriptomes—next generation tools for tissue exploration. BioEssays 42, (2020).
Benjamin, K. et al. Multiscale topology classifies cells in subcellular spatial transcriptomics. Nature https://doi.org/10.1038/s41586-024-07563-1 (2024).
Chen, X. et al. Whole-cortex in situ sequencing reveals input-dependent area identity. Nature https://doi.org/10.1038/s41586-024-07221-6 (2024).
Miyoshi, E. et al. Spatial and single-nucleus transcriptomic analysis of genetic and sporadic forms of Alzheimer’s disease. Nat. Genet. 56, 2704–2717 (2024).
Kanatani, S. et al. Whole-brain spatial transcriptional analysis at cellular resolution. Science 386, 907–915 (2024).
Wu, X. et al. Spatial multi-omics at subcellular resolution via high-throughput in situ pairwise sequencing. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-024-01205-7 (2024).
Kim, Y. et al. Seq-Scope: repurposing Illumina sequencing flow cells for high-resolution spatial transcriptomics. Nat. Protoc. 20, 643–689 (2025).
Zhang, D. et al. Spatial transcriptomics and single-nucleus RNA sequencing reveal a transcriptomic atlas of adult human spinal cord. Elife 12, (2024).
Li, X. et al. Profiling spatiotemporal gene expression of the developing human spinal cord and implications for ependymoma origin. Nat. Neurosci. 26, 891–901 (2023).
Russ, D. E. et al. A harmonized atlas of mouse spinal cord cell types and their spatial organization. Nat. Commun. 12, (2021).
Yadav, A. et al. A cellular taxonomy of the adult human spinal cord. Neuron 111, 328–344.e7 (2023).
Ortiz, C. et al. Molecular atlas of the adult mouse brain. Sci Adv 6, (2020).
Yao, Z. et al. A high-resolution transcriptomic and spatial atlas of cell types in the whole mouse brain. Nature 624, 317–332 (2023).
Zhang, M. et al. Molecularly defined and spatially resolved cell atlas of the whole mouse brain. Nature 624, 343–354 (2023).
Bhattacherjee, A. et al. Spatial transcriptomics reveals the distinct organization of mouse prefrontal cortex and neuronal subtypes regulating chronic pain. Nat. Neurosci. 26, 1880–1893 (2023).
Salas, S. et al. Optimizing Xenium In Situ Data Utility by Quality Assessment and Best Practice Analysis Workflows. (2023).
Nott, A. & Holtman, I. R. Spatial mapping of Alzheimer’s disease across genetic subtypes. Nat. Genet 56, 2592–2593 (2024).
Tian, L., Chen, F. & Macosko, E. Z. The expanding vistas of spatial transcriptomics. Nat. Biotechnol. 41, 773–782 (2023).
Moses, L. & Pachter, L. Museum of spatial transcriptomics. Nat. Methods 19, 534–546 (2022).
Wang, P. L. et al. Peripheral nerve resident macrophages share tissue-specific programming and features of activated microglia. Nat. Commun. 11, 2552 (2020).
Yim, A. K. Y. et al. Disentangling glial diversity in peripheral nerves at single-nuclei resolution. Nat. Neurosci. 25, 238–251 (2022).
Hu, J. et al. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat. Methods 18, 1342–1351 (2021).
Pham, D. et al. Robust mapping of spatiotemporal trajectories and cell–cell interactions in healthy and diseased tissues. Nat. Commun. 14, 7739 (2023).
Long, Y. et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat. Commun. 14, 1155 (2023).
Russell, A. J. C. et al. Slide-tags enables single-nucleus barcoding for multimodal spatial genomics. Nature 625, 101–109 (2024).
Marshall, J. L. et al. High-resolution Slide-seqV2 spatial transcriptomics enables discovery of disease-specific cell neighborhoods and pathways. iScience 25, 104097 (2022).
Hu, Y. et al. Unsupervised and supervised discovery of tissue cellular neighborhoods from cell phenotypes. Nat. Methods 21, 267–278 (2024).
Si, Y. et al. FICTURE: scalable segmentation-free analysis of submicron-resolution spatial transcriptomics. Nat. Methods 21, 1843–1854 (2024).
Wu, L., Beechem, J. M. & Danaher, P. FastReseg: using transcript locations to refine image-based cell segmentation results in spatial transcriptomics. Preprint at https://doi.org/10.1101/2024.12.05.627051 (2024).
Chitra, U. et al. Mapping the topography of spatial gene expression with interpretable deep learning. in 368–371 https://doi.org/10.1007/978-1-0716-3989-4_33. (2024).
Singhal, V. et al. BANKSY unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis. Nat. Genet 56, 431–441 (2024).
Agrawal, A. Decoding cell–cell communication using spatial transcriptomics. Nat. Rev. Genet 26, 295–295 (2025).
Petukhov, V. et al. Cell segmentation in imaging-based spatial transcriptomics. Nat. Biotechnol. 40, 345–354 (2022).
Long, Y. et al. Deciphering spatial domains from spatial multi-omics with SpatialGlue. Nat. Methods https://doi.org/10.1038/s41592-024-02316-4 (2024).
Zhang, B., Zhang, S. & Zhang, S. Whole brain alignment of spatial transcriptomics between humans and mice with BrainAlign. Nat. Commun. 15, 6302 (2024).
Bankhead, P. et al. QuPath: Open source software for digital pathology image analysis. Sci. Rep. 7, (2017).
Takko, H. et al. ShapeMetrics: a userfriendly pipeline for 3D cell segmentation and spatial tissue analysis. Dev. Biol. 462, 7–19 (2020).
Klein, S., Staring, M., Murphy, K., Viergever, M. A. & Pluim, J. elastix: a Toolbox for intensity-based medical image registration. IEEE Trans. Med. Imaging 29, 196–205 (2010).
Modat, M. et al. Fast free-form deformation using graphics processing units. Comput. Methods Prog. Biomed. 98, 278–284 (2010).
Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. 13, 1739 (2022).
Carey, H. et al. DeepSlice: rapid fully automatic registration of mouse brain imaging to a volumetric atlas. Nat. Commun. 14, 5884 (2023).
Pachitariu, M. & Stringer, C. Cellpose 2.0: how to train your own model. Nat. Methods 19, 1634–1641 (2022).
Zhang, D. et al. Inferring super-resolution tissue architecture by integrating spatial transcriptomics with histology. Nat Biotechnol https://doi.org/10.1038/s41587-023-02019-9 (2024)
Wang, Y. et al. FTUNet: A feature-enhanced network for medical image segmentation based on the combination of u-shaped network and vision transformer. Neural Process Lett. 56, (2024).
Pang, M., Roy, T. K., Wu, X. & Tan, K. CelloType: a unified model for segmentation and classification of tissue images. Nat. Methods https://doi.org/10.1038/s41592-024-02513-1 (2024).
Attarpour, A. et al. A deep learning pipeline for three-dimensional brain-wide mapping of local neuronal ensembles in teravoxel light-sheet microscopy. Nat. Methods 22, 600–611 (2025).
Zhong, C., Ang, K. S. & Chen, J. Interpretable spatially aware dimension reduction of spatial transcriptomics with STAMP. Nat. Methods 21, 2072–2083 (2024).
Cisternino, F. et al. Self-supervised learning for characterising histomorphological diversity and spatial RNA expression prediction across 23 human tissue types. Nat. Commun. 15, 5906 (2024).
Wang, X. et al. Prediction of cellular morphology changes under perturbations with a transcriptome-guided diffusion model. Nat. Commun. 16, 8210 (2025).
Lam, V. K. et al. A self-supervised learning approach for high throughput and high content cell segmentation. Commun. Biol. 8, 780 (2025).
Dunn, K. W. et al. DeepSynth: Three-dimensional nuclear segmentation of biological images using neural networks trained with synthetic data. Sci. Rep. 9, 18295 (2019).
Tudosiu, P.-D. et al. Realistic morphology-preserving generative modelling of the brain. Nat. Mach. Intell. 6, 811–819 (2024).
Lang, O. et al. Using generative AI to investigate medical imagery models and datasets. EBioMedicine 102, (2024).
Sekh, A. A. et al. Physics-based machine learning for subcellular segmentation in living cells. Nat. Mach. Intell. 3, 1071–1080 (2021).
Naseer, M., Bhatti, I. T., Hasan, O. & Shafique, M. Considering the Impact of Noise on Machine Learning Accuracy. in Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing 377–394. https://doi.org/10.1007/978-3-031-40677-5_15. (Springer Nature Switzerland, 2024).
Bussi, Y. & Keren, L. Multiplexed image analysis: what have we achieved and where are we headed?. Nat. Methods 21, 2212–2215 (2024).
Ma, C., Tan, W., He, R. & Yan, B. Pretraining a foundation model for generalizable fluorescence microscopy-based image restoration. Nat Methods https://doi.org/10.1038/s41592-024-02244-3. (2024).
Tyson, A. L. & Margrie, T. W. Mesoscale microscopy and image analysis tools for understanding the brain. Prog. Biophys. Mol. Biol. 168, 81–93 (2022).
Goubran, M. et al. Multimodal image registration and connectivity analysis for integration of connectomic data from microscopy to MRI. Nat. Commun. 10, 5504 (2019).
Oh, S. W. et al. A mesoscale connectome of the mouse brain. Nature 508, 207–214 (2014).
Harris, J. A. et al. Hierarchical organization of cortical and thalamic connectivity. Nature 575, 195–202 (2019).
Daigle, T. L. et al. A Suite of Transgenic Driver and Reporter Mouse Lines with Enhanced Brain-Cell-Type Targeting and Functionality. Cell 174, 465–480.e22 (2018).
Lein, E. S., Hawrylycz, M. J., Ao, N. et al. Genome-wide atlas of gene expression in the adult mouse brain. Nature 445, 168–176 (2007).
Allen Institute for Brain Science. Allen Mouse Brain Atlas. https://mouse.brain-map.org/ (2004).
Wang, Q. et al. The allen mouse brain common coordinate framework: a 3D reference atlas. Cell 181, 936–953.e20 (2020).
Fiederling, F., Hammond, L. A., Ng, D., Mason, C. & Dodd, J. Tools for efficient analysis of neurons in a 3D reference atlas of whole mouse spinal cord. Cell Rep. Methods 1, 100074 (2021).
Lein, E. S. et al. Genome-wide atlas of gene expression in the adult mouse brain. Nature 445, 168–176 (2007).
Xiong, J., Ren, J., Luo, L. & Horowitz, M. Mapping histological slice sequences to the Allen Mouse Brain Atlas without 3D reconstruction. Front. Neuroinform. 12, (2018).
Gonçalves, J. T., Schafer, S. T. & Gage, F. H. Adult neurogenesis in the hippocampus: from stem cells to behavior. Cell 167, 897–914 (2016).
Piluso, S. et al. giRAff: an automated atlas segmentation tool adapted to single histological slices. Front. Neurosci. 17 (2024).
Kaur, H. et al. Consensus tissue domain detection in spatial omics data using multiplex image labeling with regional morphology (MILWRM). Commun. Biol. 7, 1295 (2024).
Blum, J. A. et al. Single-cell transcriptomic analysis of the adult mouse spinal cord reveals molecular diversity of autonomic and skeletal motor neurons. Nat. Neurosci. 24, 572–583 (2021).
Oliveira, J. T., Yanick, C., Wein, N. & Gomez Limia, C. E. Neuron-Schwann cell interactions in peripheral nervous system homeostasis, disease, and preclinical treatment. Front. Cell Neurosci. 17, (2023).
Wolbert, J. et al. Redefining the heterogeneity of peripheral nerve cells in health and autoimmunity. Proc. Natl. Acad. Sci. 117, 9466–9476 (2020).
Sato-Yamada, Y. et al. A SARM1-mitochondrial feedback loop drives neuropathogenesis in a Charcot-Marie-Tooth disease type 2A rat model. J. Clin. Investig. 132, e161566 (2022).
Seretny, M. et al. Incidence, prevalence, and predictors of chemotherapy-induced peripheral neuropathy: A systematic review and meta-analysis. Pain 155, 2461–2470 (2014).
Piwecka, M., Rajewsky, N. & Rybak-Wolf, A. Single-cell and spatial transcriptomics: deciphering brain complexity in health and disease. Nat. Rev. Neurol. 19, 346–362 (2023).
Hu, J.-M., Chen, C.-H., Chen, S.-Q. & Ding, S.-L. Afferent projections to area prostriata of the mouse. Front. Neuroanat. 14, (2020).
Sjöstedt, E. et al. An atlas of the protein-coding genes in the human, pig, and mouse brain. Science 367, (2020).
Neurod6 information available from Human Protein Atlas. v24.proteinatlas.org (2024).
Barber, R. P. et al. The morphology and distribution of neurons containing choline acetyltransferase in the adult rat spinal cord: An immunocytochemical study. J. Comp. Neurol. 229, 329–346 (1984).
Huang, A., Noga, B. R., Carr, P. A., Fedirchuk, B. & Jordan, L. M. Spinal Cholinergic Neurons Activated During Locomotion: Localization and Electrophysiological Characterization. J. Neurophysiol. 83, 3537–3547 (2000).
Spag16 sperm associated antigen 16 [Mus musculus (house mouse)]. NCBI (2024).
Acta2 actin alpha 2, smooth muscle, aorta [Mus musculus (house mouse)]. NCBI (2024).
Zhang, M.-M. et al. Neurotensin attenuates nociception by facilitating inhibitory synaptic transmission in the mouse spinal cord. Front. Neural Circuits 15, (2021).
Zayia, L. C. & Tadi, P. Neuroanatomy, Motor Neuron. (2025).
Krantic, S. et al. Hippocampal GABAergic neurons are susceptible to amyloid-β toxicity in vitro and are decreased in number in the Alzheimer’s disease TgCRND8 mouse model. J. Alzheimer’s. Dis. 29, 293–308 (2012).
Brautigam, H. et al. The isotropic fractionator provides evidence for differential loss of hippocampal neurons in two mouse models of Alzheimer’s disease. Mol. Neurodegener. 7, 58 (2012).
Dickerson, B. C. et al. The cortical signature of Alzheimer’s disease: regionally specific cortical thinning relates to symptom severity in very mild to mild AD dementia and is detectable in asymptomatic amyloid-positive individuals. Cereb. Cortex 19, 497–510 (2009).
Zhang, Y. & Imoto, S. Genome analysis through image processing with deep learning models. J. Hum. Genet. https://doi.org/10.1038/s10038-024-01275-0 (2024).
Coleman, K., Schroeder, A. & Li, M. Unlocking the power of spatial omics with AI. Nat. Methods 21, 1378–1381 (2024).
Okano, H., Hirano, T. & Balaban, E. Learning and memory. Proc. Natl. Acad. Sci. USA 97, 12403–12404 (2000).
Roh, E., Song, D. K. & Kim, M.-S. Emerging role of the brain in the homeostatic regulation of energy and glucose metabolism. Exp. Mol. Med. 48, e216 (2016).
Eugene, A. R. & Masiak, J. The neuroprotective aspects of sleep. MEDtube Sci. 3, 35–40 (2015).
Ganapathy M. K., Reddy V. & Tadi P. Spinal Cord Morphology. in Neuroanatomy (StatPearls Publishing, Treasure Island, 2022).
Kovacs, G. G. Cellular reactions of the central nervous system. in 13–23 https://doi.org/10.1016/B978-0-12-802395-2.00003-1 (2018).
Jaume, G. et al. Deep learning-based modeling for preclinical drug safety assessment. (2024).
Schmauch, B. et al. A deep learning-based multiscale integration of spatial omics with tumor morphology (2024).
Sonawane, A. R. et al. Understanding tissue-specific gene regulation. Cell Rep. 21, 1077–1088 (2017).
Shumailov, I. et al. AI models collapse when trained on recursively generated data. Nature 631, 755–759 (2024).
Hoffmann, J. et al. Training compute-optimal large language models (2022).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Brixi, G. et al. Genome modeling and design across all domains of life with Evo 2. Preprint at https://doi.org/10.1101/2025.02.18.638918 (2025).
Zech, J. R. et al. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLoS Med 15, e1002683 (2018).
Oakden-Rayner, L., Dunnmon, J., Carneiro, G. & Re, C. Hidden stratification causes clinically meaningful failures in machine learning for medical imaging. in Proceedings of the ACM Conference on Health, Inference, and Learning 151–159. https://doi.org/10.1145/3368555.3384468 (ACM, 2020).
Wong, A. et al. External validation of a widely implemented proprietary sepsis prediction model in hospitalized patients. JAMA Intern. Med. 181, 1065 (2021).
Geirhos, R. et al. Shortcut learning in deep neural networks. Nat. Mach. Intell. 2, 665–673 (2020).
D’Amour, A. et al. Underspecification presents challenges for credibility in modern machine learning. (2020).
Sajith, A. & Kathala, K. C. R. Is Training Data Quality or Quantity More Impactful to Small Language Model Performance? (2025).
Desiderio, S. et al. Touch receptor end-organ innervation and function requires sensory neuron expression of the transcription factor Meis2. https://doi.org/10.7554/eLife.89287.3 (2024).
Espinosa, F., Pop, I. V. & Lai, H. C. Electrophysiological properties of proprioception-related neurons in the intermediate thoracolumbar spinal cord. eNeuro 11, ENEURO.0331-23.2024 (2024).
Tullo, S. et al. MR-based age-related effects on the striatum, globus pallidus, and thalamus in healthy individuals across the adult lifespan. Hum. Brain Mapp. 40, 5269–5288 (2019).
Planche, V. et al. Structural progression of Alzheimer’s disease over decades: the MRI staging scheme. Brain Commun. 4, (2022).
Cho, H. et al. Shape changes of the Basal Ganglia and thalamus in Alzheimer’s disease: a three-year longitudinal study. J. Alzheimer’s. Dis. 40, 285–295 (2014).
Chen, H., Li, D. & Bar-Joseph, Z. SCS: cell segmentation for high-resolution spatial transcriptomics. Nat. Methods 20, 1237–1243 (2023).
Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, (2015).
Polański, K. et al. Bin2cell Reconstructs Cells from High Resolution Visium HD Data. (2024).
Ravi, N. et al. SAM 2: Segment Anything in Images and Videos. ArXiv (2024).
Liang, Q. et al. STAR: Empowering semi-supervised medical image segmentation with SAM-based teacher-student architecture and contrastive consistency regularization. Expert Syst. Appl 293, 128650 (2025).
Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 779–788. https://doi.org/10.1109/CVPR.2016.91 (IEEE, 2016).
He, K., Gkioxari, G., Dollar, P. & Girshick, R. Mask R-CNN. in 2017 IEEE International Conference on Computer Vision (ICCV) 2980–2988. https://doi.org/10.1109/ICCV.2017.322 (IEEE, 2017).
Lin, T.-Y. et al. Feature Pyramid Networks for Object Detection. in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 936–944. https://doi.org/10.1109/CVPR.2017.106. (IEEE, 2017).
Bock, R., Lang, I. & Schmitt, J. Virtual Microscopy. University of Saarland.
Acknowledgements
Thanks to Jane Dodd at Columbia and Felix Fiederling for their paper providing the spinal cord images. Thank you to Rudolf Bock at Saarland University for their virtual microscope histology site that provided the porcine sciatic nerve cross section. Thanks to Ken Lau and Harsimran Kaur at Vanderbilt University School of Medicine for helping with their MILWRM code. We would also like to thank Serena Elia, Graham Bachman, Purva Patel, Jimin Lee, Josh Milbrandt, and Saul Weiss for their help with aspects of the project. Additionally, we would like to thank the Milbrandt, DiAntonio and Mitra labs for their continued support. We would like to thank the McDonnell Genome Institute and the rest of the FIVE@MGI lab members. Finally, we would like to thank the Genetics and MGI administration and our maintenance and cleaning staff.
Author information
Authors and Affiliations
Contributions
L.K.MA., A.K.Y.Y., E.B.G., T.N., N.K.T., F.P.I., R.S., D.G., J.E.W., C.L.K., and W.J.B. wrote the manuscript. W.J.B., L.K.MA. designed the experiments. Initial planning and ideas brought by W.J.B., J.D.M., R.D.M., A.K.Y.Y., D.G., and L.K.MA. L.K.MA, A.K.Y.Y., T.N., N.K.T., W.D., and L.Y. performed the experiments. L.K.MA., A.K.Y.Y., E.B.G., T.N., N.K.T., F.P.I., R.S., and W.J.B. analyzed the results. Key Software was developed by W.J.B., L.K.MA., and N.K.T. All reviewed the paper and gave suggestions.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ali, L.M., Yim, A.K.Y., Gerbi, E. et al. Biophysical simulation enables segmentation and nervous system atlas mapping for image first spatial omics. npj Syst Biol Appl 12, 3 (2026). https://doi.org/10.1038/s41540-025-00627-6
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41540-025-00627-6