Introduction

Recent innovations in highly multiplexed immunofluorescence imaging1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 have substantially increased the range of antigens that can be spatially profiled in a tissue sample, from 3–5 targets to ~60 (see ref. 16). Segmentation is a required step for quantitatively associating their spatial expression with individual cells. Since 2012, when AlexNet17, a deep convolutional neural network (CNN), outperformed other methods in the ImageNet classification challenge, there has been a paradigm shift towards using CNN-based deep learning (DL) frameworks18 trained on curated datasets for cell and nucleus segmentation tasks19,20,21,22,23,24,25,26,27,28. Among them, Cellpose25 (a DL method based on a U-Net architecture that uses a gradient flow representation of cells) and Mesmer26 (a DL method based on the ResNet50 architecture) have demonstrated human-level performance in the highly multiplexed imaging context. However, because these models are optimized by stochastic gradient descent and back-propagation during training, it remains difficult to identify the contribution of each neuron to the eventual segmentation outcome and, as a consequence, to explain the source of segmentation errors when they occur29. As a result, improving the performance of these black-box DL models requires rewiring the input–output mapping via training on additional datasets30. However, in complex tissue samples with considerable heterogeneity and ambiguity in cellular organization, it is unclear whether retraining alone will consistently improve results across all samples, or whether multiple DL models need to be constructed and used through a trial-and-error approach, in the hope that their performance will generalize. Curation of accurately annotated datasets of sufficient quality that capture the diversity of the tissue microenvironment also remains a critical challenge.

In contrast to DL approaches, most unsupervised cell segmentation methods31,32,33,34,35,36,37,38,39,40,41,42,43 do not require training data, are explainable, and can therefore be optimized for individual images where needed. However, to the best of our knowledge, no unsupervised segmentation method capable of approaching DL method performance has been reported in the literature to date. Here, we present a new unsupervised segmentation algorithm (UNSEG) capable of performing sub-cellular segmentation of tissue sample images with accuracy on par with state-of-the-art DL segmentation approaches such as Cellpose and Mesmer. UNSEG achieves this performance in two stages. In the first stage, UNSEG quantifies the intrinsic contrast provided by any nucleus- and cell membrane-specific markers at the local and global scales, and jointly exploits it to assign each pixel to the nucleus, cell membrane, or background class. This pixel assignment is implemented with the help of a Bayesian-like framework that computes a priori distributions and an image contrast-based likelihood function to estimate the posterior probabilities of each pixel belonging to the nucleus, cell membrane, or background class. UNSEG uses the posterior probabilities to assign the pixel to the correct compartment. In the second stage, it parses the semantic pixel assignments into topologically consistent nuclei and cells. Towards this goal, UNSEG introduces a perturbed watershed algorithm to correctly partition a nucleus cluster into individual nuclei. The final output of UNSEG is the set of nucleus and cell segmentations corresponding to the input image.

We have curated a labeled gastrointestinal tissue (GIT) dataset comprising diverse images of gastrointestinal tissue to benchmark UNSEG performance. We anticipate that this dataset will also be useful to DL researchers and the broader research community and help ameliorate the shortage of annotated imaging datasets30. We have also tested UNSEG performance on public datasets, with images drawn from diverse tissue types and diseases beyond the gastrointestinal system, that have been labeled with different nucleus and cell membrane markers and acquired at different magnifications and resolutions. In addition, we demonstrate the applicability of UNSEG in a variety of real-world cases that include weakly expressing markers, non-specific markers, different nucleus markers, and multiplexed ion beam imaging (MIBI). In the context of these diverse scenarios, we also discuss how quantification of segmentation accuracy can be biased depending on how the segmentation mask deviates from the ground truth. Finally, we note that since UNSEG does not require any training data to segment tissue images, it can be used to generate high-quality segmentations of unlabeled tissue images, which make up the majority of the data in real-world settings, as optimized initial estimates for improving DL models within unsupervised and semi-supervised settings. UNSEG, therefore, is an easy-to-use method for unsupervised sub-cellular segmentation of images of complex tissue samples that does not require extensive setup and performs on par with state-of-the-art DL methods. It also has the potential to improve the state of the art in deep learning.

Results

UNSEG principle and design

Segmenting cells and nuclei in 2D images of tissue samples is challenging because of their complex morphology, ambiguous overlaps, and heterogeneity in the spatial distribution of nucleus and cell membrane markers within each cell. In the morphological context, although cells and their nuclei exhibit an overall convex topology, they locally deviate from it to varying degrees depending on cell type, particularly in tumors with irregularly shaped cancer cells. In addition, depending on the tissue, many cells are clumped in clusters where their shapes and overlaps are difficult to parse. Cells in tissues also exhibit uneven intra-cellular distribution of marker expression. Together, these degrees of complexity make it difficult to consistently segment cells and nuclei using unsupervised segmentation approaches such as classical watershed31,32,38, shape and intensity priors36,37,39,40,41, and tracking of diffused gradient flow33,34, which have primarily been developed for segmenting cells in culture that lack the tissue-associated heterogeneity in cellular morphology, expression, and overlap. The UNSEG framework overcomes these limitations by jointly exploiting the expression-based topology and distribution of markers specific to nuclei and cell membranes (Fig. 1). Such markers are also used in the supervised context of DL methods, such as Cellpose and Mesmer.

Fig. 1: UNSEG framework.

Input is a two-channel image comprising nucleus (channel 1) and cell membrane (channel 2) marker expressions. a A priori spatial probability distributions of nucleus and cell membrane marker expressions. b Likelihood map of a pixel belonging to the nucleus or cell membrane, quantified through the visual contrast function, which mimics human perception. c A posteriori local and global semantic segmentation masks, respectively capturing local morphological heterogeneity and global nucleus and cell membrane topology. d Instance segmentation of nuclei from semantic segmentation masks. e Instance segmentation of cells based on individual segmented nuclei and semantic masks. Nucleus and cell segmentation results of (d, e) form the UNSEG output. See “Methods” for more details.

UNSEG combines a priori probability of each image pixel belonging to a nucleus or cell membrane (Fig. 1a) with a contrast-based likelihood function (Fig. 1b), to compute a posteriori semantic segmentation of image pixels into nucleus and cell membrane (Fig. 1c). UNSEG performs this segmentation both at the global level of the entire image, and at the local level in a neighborhood around each pixel (Fig. 1c). The local segmentation captures the local heterogeneity in nucleus and cellular morphology, while the global segmentation ensures that the overall topological structure of the nuclei and cell membranes is preserved across the entire image. The final step of UNSEG utilizes these local and global nucleus and cell semantic masks to obtain instance segmentation of individual nuclei (Fig. 1d) and cells (Fig. 1e). This step includes partitioning nucleus clusters into individual nuclei based on convexity analysis, perturbed watershed and its ancillary function we refer to as virtual cuts. The latter two are briefly described below. The details of each step are described in “Methods”.
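
The posterior computation described above can be sketched as follows. This is a minimal illustration of a generic Bayes update followed by a per-pixel arg-max, not the UNSEG implementation: the function name, the class ordering, and the toy prior and likelihood arrays are assumptions made here, and the actual priors and contrast-based likelihood are those defined in “Methods”. The sketch performs a single pass, whereas UNSEG applies the idea at both the global and the local scale.

```python
import numpy as np

CLASSES = ("background", "nucleus", "membrane")  # illustrative ordering (assumption)

def posterior_labels(prior, likelihood, eps=1e-12):
    """Return (labels, posterior) for a per-pixel class assignment.

    prior, likelihood: non-negative arrays of shape (3, H, W), one slice
    per class in CLASSES. Both are placeholders for the quantities that
    the paper defines in "Methods".
    """
    unnormalized = prior * likelihood                    # Bayes numerator per class
    evidence = unnormalized.sum(axis=0, keepdims=True)   # per-pixel normalizer
    posterior = unnormalized / np.maximum(evidence, eps) # guard against 0/0
    return posterior.argmax(axis=0), posterior

# Toy 1 x 2 image: the first pixel looks nuclear, the second membranous.
prior = np.full((3, 1, 2), 1.0 / 3.0)
likelihood = np.array([[[0.1, 0.2]], [[0.8, 0.1]], [[0.1, 0.7]]])
labels, posterior = posterior_labels(prior, likelihood)
```

With a flat prior, as in the toy input, the assignment is driven entirely by the likelihood; a spatially varying prior would shift the decision for pixels whose contrast alone is ambiguous.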

Perturbed watershed

Classical watershed-based segmentation44,45 identifies individual nuclei in a cluster as watersheds, with each watershed basin representing a nucleus in the cluster. However, heterogeneity in the spatial distribution of the nucleus marker can make it difficult to uniquely identify the individual basins. Cellpose overcomes this problem in the supervised context by developing a gradient flow field representation of each nucleus whose ground truth is annotated by a human user25. This representation provides a stable and unique description of nucleus basins. In the unsupervised context, we have developed a perturbed watershed approach (Fig. 2 and “Methods”), where the initial watershed-based segmentation (Fig. 2i) of the nucleus cluster into individual nuclei is perturbed (Fig. 2j–m) based on an adaptive distance-transform estimate (Fig. 2h) computed from the global nucleus cluster (Fig. 2d) and the local topology of the cell membrane network (Fig. 2e). Nuclei that are correctly segmented remain stable under the perturbations, while spuriously segmented nuclei collapse to a point-like object with an area not exceeding a few pixels. When applied recursively, perturbed watershed partitions the nucleus cluster into individual nuclei. An example of a two-nuclei cluster is shown in Fig. 2. The initial watershed partitions the cluster into three nuclei (Fig. 2i), one of which shrinks to a point object on perturbation of the watershed seed point. The perturbation is performed in four directions: up, down, left, and right. In this example, the unstable nucleus collapsed for three of those perturbations (up, down, and left), indicating that the seed point is unstable and the corresponding segmentation is a spurious nucleus. Therefore, it is removed and the correct watershed-based segmentation (Fig. 2n) is obtained using the two remaining stable seed points and the original distance transform (Fig. 2g). We note that the perturbed watershed algorithm does not make any assumptions specific to the fluorescence-based imaging modality. It is, in fact, agnostic to the imaging modality and can be used to improve classical watershed results wherever the latter method is applicable.

Fig. 2: Perturbed watershed method.

a Input image fragment with two abutting nuclei. b–e The posterior global and local masks of the input image from which the global nucleus cluster mask and local cell membrane mask are extracted for downstream perturbed watershed analysis. f Global nucleus cluster mask with cuts corresponding to the local cell membrane mask. g Distance transform of this modified global nucleus cluster mask. h Adaptive distance-transform estimate obtained by thresholding the distance transform by d_avr (see “Methods”). i Initial (unperturbed) watershed segmentation. j–m Perturbed watershed segmentation computed after shifting all markers from their unperturbed positions to the left (Δx = −d_avr), right (Δx = d_avr), up (Δy = d_avr), and down (Δy = −d_avr) by d_avr = 5 pixels, respectively. n Output segmentation of the two-nuclei cluster based on the perturbed watershed.

Virtual cuts

In some cases, mostly when a cell membrane marker is not present, the initial watershed segmentation step might undersegment the cluster. For such cases, we have developed the virtual cuts method, which utilizes the non-convex topology of the cluster to identify nucleus centroids that act as seed points for the watershed algorithm. See “Methods” for implementation details.

New dataset for benchmarking segmentation performance

As part of our UNSEG development, we have curated 75 TIFF images of tissue sections from eight organs of the extended human gastrointestinal system: appendix, colon, esophagus, gallbladder, liver, pancreas, small intestine, and stomach. The immunofluorescence images were acquired via imaging of formalin-fixed paraffin-embedded (FFPE) tissue sections labeled using Hoechst and fluorescent-dye-conjugated Na+K+ATPase as the respective markers for cell nuclei and membranes (see “Methods”). The image dimensions are 1000 × 1000 pixels. The images were acquired using a 0.95 numerical-aperture objective with 40× magnification, and have a pixel pitch of 0.16 μm/pixel. Our gastrointestinal tissue (GIT) dataset includes images of normal tissues as well as tissues related to chronic inflammation, cancer precursor lesions, and cancer. These images capture a wide range of tissue organization, from samples with sparsely located cells to those with very high cell density. Figure 3 shows 12 representative images from the GIT dataset.

Fig. 3: Gastrointestinal tissue (GIT) dataset.

Twelve representative tissue images from the GIT dataset drawn from different organs of the human gastrointestinal system with different pathobiology. Blue and red colors, respectively, indicate nucleus (Hoechst) and cell membrane (Na+K+ATPase) marker expressions. The dimensions of each image are 1000 × 1000 pixels. The images were acquired using a microscope with a 0.95 NA, 40× objective and an imaging sensor with a pixel pitch of 0.16 μm/pixel.

Expert pathologists independently annotated the 75 images, resulting in ground truth with 16,201 nuclei and 16,217 cells. These annotations were performed manually, without any algorithmic aid, to truly reflect human performance. The detailed description of the dataset is presented in Supplementary Table 1 and Supplementary Fig. 1, while the nucleus and cell annotations of 12 representative images are shown in Supplementary Fig. 2. To annotate nuclei and cells in the 75 images, we developed Cellthon, a Python-based graphical user interface for annotating cells and their nuclei in tissue images.

We used the GIT dataset to benchmark UNSEG performance. Moreover, we anticipate that this dataset will also serve as a resource for researchers requiring annotated datasets for future algorithm development and testing30.

UNSEG benchmarking using GIT and publicly available datasets

We used the GIT and publicly available datasets to benchmark the segmentation performance of UNSEG with respect to Cellpose and Mesmer, the two state-of-the-art DL methods that have consistently demonstrated good performance in segmenting immunofluorescence imaging data, particularly in the context of highly multiplexed imaging25,26. To perform the comparison with Cellpose, we used Cellpose version 2.1.0. In this version, we chose the nuclei and TN2 models from the Cellpose “model zoo” to segment nuclei and cells, respectively. Our choice was based on these models giving the best segmentation results for the GIT dataset in comparison to all other Cellpose models. We used the Cellpose size calibration procedure to estimate the cell diameter for each of the 75 images in our dataset. For Mesmer, we used the model provided with DeepCell 0.12.6 and set the model parameter image_mpp to the pixel pitch in microns per pixel for our imaging dataset. Benchmarking was performed by computing the F1 score (Eq. (7)) as a function of the intersection over union (IoU) threshold46. The IoU metric quantifies the degree of overlap between the algorithm prediction and the annotated ground truth. It is bounded between 0 and 1, with 1 indicating perfect overlap. By computing the F1 score over the IoU range, we obtain the F1 accuracy curve for each method (see “Methods” for more details).
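
The F1-versus-IoU curve can be illustrated with a short sketch. It assumes the standard object-level definition F1 = 2TP / (2TP + FP + FN), with a prediction counted as a true positive when it is matched one-to-one to an annotated object at or above the IoU threshold; Eq. (7) in “Methods” is the authoritative definition and may differ in detail. The function names, the greedy matching, and the toy boxes are illustrative assumptions, with objects represented as sets of pixel coordinates.

```python
def iou(a, b):
    """Intersection over union of two sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def f1_at_threshold(truth, pred, threshold):
    """Object-level F1 with greedy one-to-one matching at an IoU threshold.

    truth, pred: lists of pixel sets, one set per annotated/predicted object.
    """
    # All candidate pairs, best overlaps first.
    pairs = sorted(
        ((iou(t, p), i, j) for i, t in enumerate(truth) for j, p in enumerate(pred)),
        reverse=True,
    )
    used_truth, used_pred, tp = set(), set(), 0
    for score, i, j in pairs:
        if score < threshold:
            break                      # remaining pairs overlap even less
        if i not in used_truth and j not in used_pred:
            used_truth.add(i)
            used_pred.add(j)
            tp += 1
    fp = len(pred) - tp                # predictions without a matching annotation
    fn = len(truth) - tp               # annotations that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0

def box(r0, c0, r1, c1):
    """Pixel set of a half-open rectangle, used only to build toy objects."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}

truth = [box(0, 0, 4, 4), box(10, 10, 14, 14)]
pred = [box(0, 0, 4, 3), box(10, 12, 14, 16), box(20, 20, 22, 22)]
curve = [(t / 10, f1_at_threshold(truth, pred, t / 10)) for t in range(5, 10)]
```

In the toy input the first prediction overlaps its annotation with IoU 0.75 and the second with IoU 1/3, so the curve drops in steps as the threshold passes each of these values.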

Figure 4a shows UNSEG, Cellpose, and Mesmer segmentation results for four representative examples from our 75-image GIT dataset. Visual comparison shows similar performance between the different methods. One difference between UNSEG and the other two methods is that, although UNSEG does implement boundary smoothing, it does not enforce strict shape constraints. As a consequence, the shapes of UNSEG-based nucleus and cell segmentations are more irregular, but also more realistic and less synthetic in appearance, than those of Cellpose and Mesmer.

Fig. 4: Comparison of UNSEG, Cellpose, and Mesmer on four example images from the GIT dataset.

a Columns respectively correspond to appendix, esophagus, gallbladder, and small intestine tissue images. Rows show nucleus (white boundary) and cell (green boundary) segmentation results for the four examples using UNSEG, Cellpose, and Mesmer, respectively. b The two rows, respectively, show nucleus and cell segmentation accuracy of UNSEG, Cellpose, and Mesmer. Accuracy is measured using the number of segmented objects (see insets) and F1 score curves plotted as a function of the IoU threshold between the segmented and annotated labels for nuclei and cells, respectively.

The F1 curves for the four examples (Fig. 4b) demonstrate that UNSEG performance is similar to that of the DL methods trained on about a million cells. The ground truth annotations for these four examples are shown in Supplementary Fig. 2.

The similarity in their performance on the four example images generalizes to the entire GIT dataset. The results are shown in Fig. 5. The first row depicts the median F1 curves corresponding to nucleus and cell segmentation by the three methods. The curves indicate that the three methods have similar segmentation performance. For cell segmentation, the median UNSEG performance is slightly below that of the other two methods, which is partly due to the conservative nature of UNSEG cell segmentation in resolving cell boundary ambiguity in cases where the tissue section captures partial cell membranes without their respective nuclei. In these cases, UNSEG does not always include their segmentation masks in the final results (see also the “F1 score and accuracy” section below). Nevertheless, the pairwise comparisons of the 95% F1 confidence intervals of UNSEG with those of Cellpose and Mesmer (second and third rows of Fig. 5, respectively) show almost complete overlap, indicating similar overall performance. A more detailed version of Fig. 5 is presented in Supplementary Fig. 3. We note that we used the same UNSEG parameters to segment all 75 images in the GIT dataset and did not optimize them for each image, even though this ability is a strength of UNSEG and would have boosted its performance. The rationale for eschewing this adjustment was to demonstrate that our probabilistic reinterpretation of the two-channel image through a Bayesian lens provides UNSEG with robustness and performance stability, and prevents it from being brittle and requiring continuous adjustment. We additionally note that this is unlike our characterization of Cellpose performance, where we adjusted its size parameter for every image. Therefore, our performance curves are biased in favor of Cellpose. The UNSEG parameter values we used for the GIT dataset are listed in Supplementary Table 2 and discussed in “Methods”.

Fig. 5: Performance comparison of UNSEG, Cellpose, and Mesmer for the entire GIT dataset.

The first row compares median F1 score performance curves for the three methods as a function of the IoU threshold for nucleus and cell segmentation of images in the GIT dataset. The inset contains median F1 score values at the IoU threshold of 0.5 for the three algorithms. The second and third rows, respectively, show pairwise comparisons between UNSEG and Cellpose, and between UNSEG and Mesmer. The comparison includes median F1 score curves along with their 95% confidence intervals. Their almost complete overlap indicates similar performance of all three methods.

Furthermore, we benchmarked the segmentation performance of UNSEG with respect to Cellpose and Mesmer using publicly available multiplexed imaging tissue datasets acquired using the CODEX, Vectra, and Zeiss imaging platforms47,48. Supplementary Figs. 4–6, respectively, show the cell segmentation performance of UNSEG, Cellpose, and Mesmer on the CODEX, Vectra, and Zeiss datasets. The CODEX dataset comprises ten 400 × 400 images of lymph nodes and tonsils. For our benchmarking, we chose CD20 and CD45RO as cell membrane markers to demonstrate the ability of UNSEG to work with different cell membrane markers. These images were acquired using an objective with 20× magnification and an imaging sensor with a pixel pitch of 0.3774 μm/pixel47,48. Supplementary Fig. 4a depicts an example image of a lymph node from the CODEX dataset, along with its ground truth cell annotation, the cell segmentations predicted by UNSEG, Cellpose, and Mesmer, and their corresponding F1 score-based performance curves. Due to the high cell density, lymph node samples are typically difficult to segment. This example provides a clear visual and quantitative demonstration of UNSEG performing segmentation on par with Cellpose and Mesmer. Supplementary Fig. 4b further shows that the quality of UNSEG performance extends to the entire CODEX dataset.

Similarly, Supplementary Figs. 5 and 6 compare the performance of UNSEG cell segmentation with that of Cellpose and Mesmer for the Vectra and Zeiss datasets47,48, respectively. The Vectra dataset includes 131 tissue images of size 400 × 400 from a range of pathologies that include lung adenocarcinoma, extramammary Paget disease, pancreatic ductal adenocarcinoma, lung small cell carcinoma, colon adenocarcinoma, Hodgkin lymphoma, breast ductal carcinoma, serous ovarian carcinoma, squamous cell carcinoma, Merkel cell carcinoma, and squamous mucosa. The Zeiss dataset consists of nineteen tissue images of size 800 × 800, acquired from tissue sections of cutaneous T-cell lymphoma, pancreatic adenocarcinoma, basal cell carcinoma, and melanoma. Both the Vectra and Zeiss datasets were acquired using 20× magnification objectives; however, the pixel pitches of the imaging sensors were 0.5 μm/pixel and 0.325 μm/pixel, respectively47,48. Although UNSEG performs stable and high-quality segmentation, faithfully capturing cell shapes, its F1 score-based performance is upper bounded by that of Cellpose and Mesmer. This is partly due to the tendency of the annotated ground truth to have, on average, a smaller cell size than the Cellpose and Mesmer estimates, which tends to favor their F1 scores (see also the “F1 score and accuracy” section below). We found this to be particularly true for the Vectra dataset. For this dataset, it was also difficult to find cell membrane markers that were appropriately imaged across the different images. We therefore utilized pan-cytokeratin, a cytoplasmic marker, for cell segmentation. Since UNSEG has been developed to utilize nucleus and cell membrane markers for unsupervised segmentation, and not nucleus and cytoplasm markers, we did expect reduced performance. However, the quality of UNSEG segmentation remained remarkably robust, despite the expected reduction in UNSEG F1 score values.

Applicability of UNSEG to different practical scenarios

We also tested UNSEG performance in multiple different practical scenarios.

  1. Weakly expressing cell membrane marker: We identified a tissue image of human skin with dermatofibrosarcoma from a publicly available CODEX dataset13, which is a different dataset from the one discussed above. This image has weakly expressing Na+K+ATPase as the cell membrane marker. Hoechst is the nucleus marker. The image size is 1440 × 1440 pixels. It was acquired using an objective with 20× magnification and a sensor with a pixel pitch of 0.377 μm/pixel. As shown in Supplementary Fig. 7, UNSEG demonstrates stable and robust segmentation performance with a weakly expressing membrane marker. As this dataset lacked annotations, we did not compute the F1 curve; however, as the figure demonstrates, UNSEG segmentation compares favorably with Cellpose and Mesmer on visual, qualitative assessment.

  2. Using a non-specific cell membrane marker to segment cells: In Supplementary Fig. 5, using the Vectra dataset, we demonstrated that UNSEG is robust to using cytoplasmic markers for cell segmentation. To further test the wide applicability of UNSEG, we replaced weakly expressing Na+K+ATPase with Hyaluronan, which can localize not only to the cell membrane but also to the cytoplasm and the extracellular matrix. We used Hoechst as the nucleus marker. Supplementary Fig. 8 shows that UNSEG performs high-quality nucleus and cell segmentation, which also compares favorably with generalist methods like Cellpose and Mesmer.

  3. DRAQ5 as the nucleus marker: We next replaced Hoechst with DRAQ5 as the marker for the nucleus, while keeping Hyaluronan as the cell membrane marker. Supplementary Fig. 9 shows that UNSEG continues to provide high-quality segmentation.

  4. Applying UNSEG to multiplexed ion beam imaging (MIBI): We also tested UNSEG sub-cellular segmentation performance on nuclei and cells in a placental tissue image acquired using MIBI, an alternative multiplexed imaging technology6,8. The image was downloaded from the Human BioMolecular Atlas Program (HuBMAP) database49. The image size is 2048 × 2048, with a pixel pitch of 0.391 μm/pixel. Due to the lack of clearly identified annotations, Supplementary Fig. 10 does not show the F1 curves, but it does provide a visual comparison of UNSEG, Cellpose, and Mesmer performance. As before, UNSEG performance continues to be on par with that of deep learning methods.

F1 score and accuracy

F1 is a well-established score for assessing segmentation accuracy. It simultaneously accounts for the proportion of correctly segmented objects and their pixel-wise matching with ground truth object profiles46. However, as we show in Supplementary Fig. 11, F1 score is biased depending on how the estimated segmentation mask deviates from the ground truth. Specifically, F1 value is higher if the size of the estimated segmentation mask is larger than the ground truth, as compared to when it is smaller. In fact, as shown in Supplementary Fig. 11, the former upper bounds the latter. Both Cellpose and Mesmer, on average, have larger cell segmentation mask estimates when compared to UNSEG. This is a contributory factor towards the higher median F1 scores for Cellpose and Mesmer, even when segmentation results from all three methods are reasonable. Supplementary Fig. 4 exemplifies this point. There, even though cell segmentation results from all three methods are reasonable, UNSEG has a slightly lower F1 curve, due to it being conservative in estimating cell size, as is discussed above in the subsection on UNSEG benchmarking.
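
One way to see how a bias of this kind can arise is the following simplified case. The assumptions are ours and are not necessarily the construction used in Supplementary Fig. 11: a ground-truth object has area A, and the estimated mask is either a superset of it with extra area Δ (IoU₊) or a subset of it with missing area Δ (IoU₋), where 0 < Δ < A.

```latex
\mathrm{IoU}_{+} = \frac{A}{A+\Delta}, \qquad
\mathrm{IoU}_{-} = \frac{A-\Delta}{A}, \qquad
\mathrm{IoU}_{+} - \mathrm{IoU}_{-}
  = \frac{A^{2} - (A-\Delta)(A+\Delta)}{A\,(A+\Delta)}
  = \frac{\Delta^{2}}{A\,(A+\Delta)} > 0 .
```

For the same area error, the oversized mask therefore always has the higher IoU. For example, with A = 100 pixels and Δ = 30 pixels, IoU₊ ≈ 0.77 and IoU₋ = 0.70, so at an IoU threshold of 0.75 only the oversized mask counts as a true positive and contributes to the F1 score.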

UNSEG characteristics and use case

UNSEG employs an integrated approach to segmenting nuclei and cells that, by design, emphasizes internal consistency between each cell nucleus and its membrane. As a consequence, UNSEG guarantees that no segmented nucleus can be located beyond the boundaries of its cell. Such inconsistencies are often found with both Cellpose and Mesmer, where nucleus and cell segmentations are performed independently. Figure 6a depicts a small intestine tissue section illustrating the internal inconsistency in nucleus and cell boundaries estimated by Cellpose and Mesmer for a pair of examples highlighted with dashed boxes. In the region marked 1, Cellpose places the larger nucleus across two cells, while Mesmer assigns the same nucleus to two cells. In the region marked 2, the nucleus estimated by Cellpose extends beyond the boundary of its cell. UNSEG avoids such discrepancies due to its joint segmentation of nuclei and cells. This joint processing ensures that UNSEG can unambiguously identify the cytoplasmic compartment of cells. The internal consistency among sub-cellular compartments is of particular importance in biological studies where correct sub-cellular localization of signaling pathway components is essential to study intra-cellular signaling. For example, tumor protein P53 can be sequestered in the cytoplasm or localized in the nucleus, depending on DNA damage and other exogenous and endogenous stresses. However, in unstressed cells, it is expressed at low levels and localizes in both the cytoplasm and the nucleus50. As another example, histone methyltransferase EZH2 localizes in the nuclei, where it regulates gene expression through its canonical histone lysine methyltransferase activity51. Supplementary Fig. 12 depicts an example of such a real use case, where UNSEG is used in a multiplexed imaging context to segment cells and their nuclei based on Hoechst and Na+K+ATPase. The UNSEG-based segmentation is used to localize intra-cellular P53 and EZH2 expression in a region of healthy colon tissue with densely located cells (see “Methods”). The internal consistency of UNSEG segmentation ensures that the user is able to correctly evaluate P53 expression in the nucleus and the cytoplasm, while ensuring that the canonical activity of EZH2 in the healthy tissue is not associated with the cytoplasm.
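
As an illustration of how internally consistent masks support this kind of analysis, the sketch below computes the mean intensity of a marker in the nucleus and in the rest of each cell. It is not the analysis pipeline used for Supplementary Fig. 12. The function name and the input convention are assumptions: both masks are integer label images in which a nucleus carries the same ID as its cell and lies inside it. Everything in the cell outside the nucleus, membrane pixels included, is treated here as cytoplasm.

```python
import numpy as np

def compartment_means(cell_labels, nucleus_labels, marker):
    """Mean marker intensity in the nucleus and cytoplasm of each cell.

    cell_labels, nucleus_labels: integer label images (0 = background) in
    which a nucleus shares the ID of its cell (assumed convention).
    marker: intensity image of the same shape, e.g. a P53 channel.
    """
    results = {}
    for cell_id in np.unique(cell_labels):
        if cell_id == 0:
            continue                                   # skip background
        cell = cell_labels == cell_id
        nucleus = cell & (nucleus_labels == cell_id)   # nucleus restricted to its cell
        cytoplasm = cell & ~nucleus                    # remainder of the cell
        results[int(cell_id)] = {
            "nucleus": float(marker[nucleus].mean()) if nucleus.any() else float("nan"),
            "cytoplasm": float(marker[cytoplasm].mean()) if cytoplasm.any() else float("nan"),
        }
    return results

# Toy example: one 3 x 3 cell with a single-pixel nucleus at its center.
cell_labels = np.array([[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0]])
nucleus_labels = np.zeros_like(cell_labels)
nucleus_labels[1, 1] = 1
marker = np.array([[1, 1, 1, 0], [1, 9, 1, 0], [1, 1, 1, 0]], dtype=float)
means = compartment_means(cell_labels, nucleus_labels, marker)
```

If a nucleus were allowed to extend beyond its cell, the pixels outside the cell would be attributed to neither compartment of that cell, or to a neighboring cell, which is the kind of ambiguity that joint segmentation avoids.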

Fig. 6: Characteristics of UNSEG method.

a UNSEG demonstrates internal consistency between nucleus and cell boundaries. Dashed box 1 contains two nuclei, where both Cellpose and Mesmer show a mismatch between the boundaries of two cells and their nuclei. Dashed box 2 contains another cell, where Cellpose shows a mismatch between the nucleus and its cell boundaries. b UNSEG is better at capturing complex shapes of nuclei in comparison to Cellpose and Mesmer, as exemplified by the arrows indicating examples of nuclei with complex shapes. c Runtime complexity of UNSEG as a function of the number of cells and the image area, assuming uniform cell distribution for the latter.

As briefly mentioned earlier, UNSEG does not impose a strict shape constraint on the segmented nuclei and allows them to be locally non-convex. Consequently, in complex tissue sections it is, on average, better at preserving the true nucleus shape than Cellpose and Mesmer, whose segmentations are usually more rounded and, in regions of the tissue with high cell density, resemble Voronoi partitions of the tissue region. Figure 6b shows an example of pancreas tissue with elongated cells that deviate from round shapes. As can be seen, the ability of UNSEG to combine knowledge of global tissue architecture and local topology with a relaxed shape constraint allows it to better capture elongated nucleus morphology when compared to Cellpose and Mesmer. This ability is highly relevant in the context of the use case mentioned above, where users such as cancer biologists are studying the tumor microenvironment, which might include a diversity of cell shapes associated with cancer, immune, and stromal cell populations.

The runtime complexity of UNSEG is a function of the number of cells rather than the image size. Specifically, UNSEG runtime scales approximately linearly with the number of segmented cells in the image. This translates to linear scaling with image area if the spatial distribution of cells is approximately uniform. However, for sparsely populated images, UNSEG runtime will be significantly sub-linear in image area. Figure 6c shows the linear dependence on the number of segmented cells and on the image area, under the assumption of uniform cell distribution. The results were generated using an acquired colon tissue microarray (TMA) spot with approximately uniform cell distribution. The segmentation results for the whole TMA spot are presented in Supplementary Fig. 13.

Discussion

The importance of segmenting cells and their nuclei has gained renewed prominence due to the advent of multiplexed imaging technologies that have significantly enhanced the depth of information that can potentially be extracted from samples in a cell-specific manner. However, tissue sections have complex cell organizations and unlike computer vision tasks, segmenting individual cells even by human experts is a difficult challenge, resulting in inter-observer discordance. Such discordance usually grows as the number of cells requiring annotation grows. This, in turn, affects ground truth quality used to train supervised learning models, and is a bottleneck for generating high-quality training data. The unsupervised approach provides a complementary paradigm to segmenting complex tissue images without requiring training data. Unsupervised methods are also more adaptable to individual images of varying complexity. However, to the best of our knowledge, until now no method within the unsupervised paradigm had demonstrated performance approaching supervised learning methods, particularly those based on deep learning. As a consequence, none of its advantages were relevant. UNSEG, for the first time, to the best of our knowledge, demonstrates that unsupervised cell and nuclei segmentation can achieve accuracy at par with the current state-of-the-art methods in deep learning. It also introduces the perturbed watershed algorithm, a standalone algorithm that extends the ability of classical watershed algorithm to correctly segment nucleus clusters. Perturbed watershed is applicable in all cases where the classical version can be used. Finally, like the generalist DL methods, UNSEG is not brittle, and is applicable to a range of tissue types, disease pathologies, nucleus and cell membrane markers, and multiplexed imaging modalities. 
It achieves accuracy on par with these methods, along with the added benefit of guaranteeing segmentation consistency between a cell and its nucleus, and being faithful to their morphology. These latter benefits can potentially be helpful in accurate sub-cellular localization of mRNA transcripts in microscopy images generated using well-established protocols for fluorescence in-situ hybridization52 and its multiplexed counterparts53,54,55,56,57 when combined with nucleus and membrane fluorescence markers.

Segmentation fundamentally involves learning features and image representations that help the algorithm identify individual cells and their nuclei. Deep learning models extract these features and representations in a supervised manner. Interestingly, UNSEG performance reveals that there is intrinsic information latent in the topology of cells and nuclei within the tissue context of an individual image that is equivalent to training on one million cells26. Importantly, this information can be acquired adaptively for every tissue image. Therefore, it is conceivable to develop adaptive DL methods that perform sub-cellular segmentation of individual unlabeled tissue images adaptively, by leveraging UNSEG as a label generator to initialize internally consistent cell and nucleus labels that a DL method can optimize and improve using self- and semi-supervised learning paradigms. For example, in a self-supervised learning framework UNSEG could be used to optimally initialize joint learning of neural network parameters and k-means-based segmentation of cells and nuclei58. Another application could be in a semi-supervised setting, where a small portion of the image is annotated, while the remaining is unlabeled. Here, UNSEG could be used to provide pseudo-labeling estimate of cell and nucleus segmentation for the unlabeled data, which can then be used to refine the DL model trained on labeled data59,60. Finally, UNSEG could be used in the setting of learning with noisy labels, where the UNSEG generated segmentation masks are noisy labels on which robust DL models can be trained61.

UNSEG performs sub-cellular segmentation based on nucleus and cell membrane compartment markers. However, its framework does not impose any constraint on the number of markers that can be used. For example, in multi-nucleated cells, UNSEG can be modified to incorporate an additional marker specific to the nuclear membrane to coherently segment multiple overlapping nuclei belonging to the same cell. Supplementary Fig. 14 depicts an example of a multi-nucleated cell, with Lamin A/C (shown in green) marking the nucleus membranes. As depicted in this figure, the modification of UNSEG utilizes the specificity of the extra marker to segment the nuclei and associate them with the same cell.

UNSEG is an easy-to-use method for sub-cellular segmentation of complex tissue images using multiplexed imaging technologies. It only uses well-known and robust Python libraries that require minimal setup and is accessible to researchers with varying computational backgrounds. In total, UNSEG has thirteen parameters (see Methods, Supplementary Table 2, and the code implementation), all with clear meaning and interpretation, and assigned default values for images having a pixel pitch of 0.16 μm/pixel. Among them, minimal area and convexity threshold are the two primary parameters (see “Methods” and Supplementary Table 2) that have the strongest effect on UNSEG execution. They can be adjusted by the user to optimize segmentation performance for individual images including relatively large images as shown in Supplementary Fig. 13. However, as we demonstrated using the GIT, CODEX, Vectra and Zeiss datasets, a single setting of these two parameters can also be used across an entire cohort of images with the same pixel pitch, without noticeably compromising segmentation quality. The user can also define the expected cell size via the dilation radius (u0) parameter. UNSEG uses this parameter only for cells without cell membrane marker expression. This parameter does not affect execution of the core UNSEG method. Two other parameters, disk radius (r0) and the kernel-size list (n0) can also be customized to more accurately account for local background noise and pixel pitch. Examples based on such customization are shown in Supplementary Figs. 4–10. The remaining parameters only marginally affect UNSEG segmentation quality, but if needed, can be used to further fine-tune UNSEG performance. The default values and reasonable adjustment ranges for all parameters are listed in Supplementary Table 2. 
UNSEG, therefore, is a flexible framework that can also be extended to include additional markers to enhance cell segmentation and to extract localized expression of individual markers across the tissue sample. Finally, we re-emphasize that unlike segmentation of objects in computer vision-based situational awareness tasks, segmenting cells and their nuclei, particularly in the context of tissue samples, often results in subjective ground truth. By being able to capture intrinsic, marker-specific topological structure of cell compartments, UNSEG offers opportunities to further improve current state-of-the-art deep learning methods. To aid in this task, we have also generated a GIT dataset of 75 tissue images from eight organs of the human gastrointestinal system, along with their corresponding nucleus and cell annotations independently generated by expert pathologists.

Methods

Generation of GIT dataset and other images

For the GIT dataset, formalin-fixed paraffin-embedded (FFPE) tissue microarray (TMA) slides were obtained from Pantomics (Pantomics, DID381). Tissue TMA samples for Supplementary Figs. 12–14 were obtained from the Department of Pathology at University of Pittsburgh Medical Center Presbyterian Hospital. The slides went through a cyclic immunofluorescence antigen retrieval protocol10. The corresponding figure slides were stained in cycles with 1:200 dilution of Anti-Sodium Potassium ATPase antibody (Abcam ab198367, clone EP1845Y), 1:100 dilution of P53 antibody (Abcam ab270192, clone SP5), 1:50 dilution of EZH2 antibody (CST 45638, clone D2C9), and 1:100 dilution of Lamin A/C antibody (CST 8617, clone 4C11) overnight at 4 °C in the dark, followed by staining with Hoechst 33342 (CST 4082S) for 10 min at room temperature in the dark. TMA images were acquired using a 40×, 0.95 NA objective on a Nikon Ti2E microscope.

Seventy-five high-quality regions of 1000 × 1000 pixels were identified and extracted from the TMA images and saved as TIFF images. Expert pathologists independently annotated these images. The annotations were done using Cellthon, a Python-based cell annotation graphical user interface (GUI) that we created using the Tkinter toolkit62. Together, these 75 images and their cell and nucleus annotations comprise the GIT dataset.

UNSEG algorithm

Input image

The input to our algorithm is a two-channel image. An example is illustrated in the “input” panel of Fig. 1 and Supplementary Fig. 15, as well as in Fig. 3 and Supplementary Fig. 13. Channel one, depicted in blue, and channel two, shown in red, are associated with nucleus and cell membrane marker expression, respectively. Each channel of the image is independently scaled to the range [0, 1], such that Ii: Ω → [0, 1]. Here, Ii is the normalized image intensity of the ith channel, Ω is the image domain, and i = 1, 2 indexes the two channels.
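
The per-channel scaling can be sketched as follows. This is a minimal illustration that assumes simple min-max scaling and an (H, W, C) array layout; the function name `normalize_channels` is hypothetical and not taken from the UNSEG code.

```python
import numpy as np

def normalize_channels(image):
    """Scale each channel of an (H, W, C) image independently to [0, 1].

    Assumption: min-max scaling; the text only states that each channel
    is scaled to [0, 1].
    """
    image = np.asarray(image, dtype=np.float64)
    lo = image.min(axis=(0, 1), keepdims=True)   # per-channel minimum
    hi = image.max(axis=(0, 1), keepdims=True)   # per-channel maximum
    span = np.where(hi > lo, hi - lo, 1.0)       # guard against flat channels
    return (image - lo) / span
```

After this step, channel 0 plays the role of I1 (nucleus marker) and channel 1 the role of I2 (cell membrane marker).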

The algorithm performs nucleus and cell segmentation utilizing a Bayesian framework: the posterior probability estimates of nucleus and cell masks are obtained from their a priori and likelihood estimates that UNSEG computes from the normalized two-channel image. These posterior estimates are then used to obtain the final nucleus and cell segmentations. UNSEG implements this framework through four processing stages detailed below and illustrated in Fig. 1 and Supplementary Fig. 15.

Processing stage 1: computing a priori nucleus and cell membrane masks

In Stage 1, we compute a priori estimates of the image foreground for each channel. The estimates are computed at the global and local scale as described below.

A priori probability: Each channel, Ii(x, y), i = 1, 2, is first pre-processed using a combination of a Gaussian filter63 and multi-level Otsu thresholding63,64,65. The standard deviation of the Gaussian filter kernel, σ, is a parameter of the algorithm that allows the user to control the degree of smoothing. This and other algorithm parameters are summarized in Supplementary Table 2. Our default setting is σ = 3. A three-level Otsu is next applied to the smoothed image, and the lowest level is selected as the threshold to obtain the initial estimate of the channel foreground.

We use the initial, per-channel foreground estimate to compute the cumulative distribution function (CDF), \({{{{\mathcal{F}}}}}_{i}\), of Ii using the intensity values, Ii(x, y), of pixels (x, y) within this estimate. Two examples of CDFs are presented in Supplementary Fig. 15. Using the monotonically non-decreasing property of the CDF, we map Ii to its cumulative probabilistic representation \({P}_{i}^{e}\), where \({P}_{i}^{e}(x,y)={{{{\mathcal{F}}}}}_{i}\left({I}_{i}(x,y)\right)\). We define \({P}_{i}^{e}(x,y)\) to be the a priori probability of the pixel being in the nucleus (i = 1) or cell membrane (i = 2). We note that this definition quantifies the intuition that the stronger the marker intensity at a particular pixel, the higher its a priori probability. Examples of a priori probabilities for nuclei (\({P}_{1}^{e}\)) and cell membranes (\({P}_{2}^{e}\)) are presented in Fig. 1 and Supplementary Fig. 15.
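
The CDF mapping can be sketched with an empirical CDF. The snippet below is an illustration only: it assumes the initial foreground estimate is available as a boolean array, and the function name `a_priori_probability` is hypothetical.

```python
import numpy as np

def a_priori_probability(channel, foreground):
    """Map each intensity to the empirical CDF of the foreground intensities.

    channel    : 2D float array, normalized channel I_i
    foreground : 2D bool array, initial foreground estimate for that channel
    """
    values = np.sort(channel[foreground])
    if values.size == 0:                 # no foreground: probability is zero
        return np.zeros(channel.shape, dtype=float)
    # side="right" counts how many foreground values are <= each intensity
    ranks = np.searchsorted(values, channel, side="right")
    return ranks / values.size
```

Because the CDF is non-decreasing, brighter pixels never receive a lower a priori probability than dimmer ones.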

A priori global mask: We compute the a priori global mask \({M}_{i}^{g}(x,y)\), i = 1, 2, using \({P}_{i}^{e}\) and a simple filter called the local mean suppression filter (LMSF) that we have developed. The foreground pixels (x, y) where \({M}_{i}^{g}(x,y)=1\) are designed to be a superset of the pixels belonging to the true nucleus (i = 1) and cell membrane (i = 2) compartments of cells in Ii(x, y), i = 1, 2. \({M}_{i}^{g}\), therefore, ensures that no pixels belonging to the cells are missed.

LMSF is designed to identify the valleys (or space) that exist between nuclei (or cell membranes) of closely located cells that nevertheless have some spillover marker expression, and are therefore, difficult to identify as background. We define LMSF as,

$${\hat{I}}_{i}(x,y)= \left\{\begin{array}{ll}0,\quad &{{{\bf{if}}}}\,\,\frac{{I}_{i}(x,y)}{{\bar{I}}_{i}(x,y)} < \, {t}_{0}\\ {I}_{i}(x,y),\quad &{{{\bf{otherwise}}}}\end{array}\right.,\,\, \\ {{{\rm{where}}}}\,\,{\bar{I}}_{i}(x,y)= \frac{1}{{\left(2{n}_{0}+1\right)}^{2}}{\sum}_{\xi =x-{n}_{0}}^{x+{n}_{0}}{\sum}_{\eta =y-{n}_{0}}^{y+{n}_{0}}{I}_{i}(\xi ,\eta ).$$
(1)

The above definition states that for a given pixel (x, y) ∈ Ω, LMSF replaces the original intensity value with 0 only if the ratio of the pixel intensity to the average intensity, computed locally over the pixel neighborhood, is below the threshold parameter t0. The size of the kernel defining the neighborhood over which the local mean intensity is computed is parameterized by n0. We set t0 = 0.5. Consequently, all pixels with an intensity value less than half the mean intensity in their respective neighborhoods are replaced with zeros, allowing us to identify valleys between cells. By varying n0 we can identify valleys and gaps of different widths. UNSEG performs LMSF filtering for n0 = 5, 10, 20, 40. If \({\hat{I}}_{i}(x,y)=0\) for any value of n0, then the final pixel value is set to 0 and the pixel is assigned to the background in the global mask, \({M}_{i}^{g}(x,y)\). The values of n0 are user-defined and can be optimized according to the complexity of individual images.
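
A minimal sketch of the multi-scale LMSF of Eq. (1) is given below. The border handling (edge replication) and the function names `local_mean` and `lmsf_mask` are assumptions made for illustration; the text does not specify how image borders are treated.

```python
import numpy as np

def local_mean(img, n0):
    """Mean over a (2*n0+1) x (2*n0+1) window (borders: edge replication, assumed)."""
    img = np.asarray(img, dtype=float)
    k = 2 * n0 + 1
    padded = np.pad(img, n0, mode="edge")
    # Integral image with a leading row and column of zeros
    s = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    window_sum = (s[k:k + h, k:k + w] - s[:h, k:k + w]
                  - s[k:k + h, :w] + s[:h, :w])
    return window_sum / (k * k)

def lmsf_mask(img, kernel_sizes=(5, 10, 20, 40), t0=0.5):
    """True where the pixel survives LMSF at every scale n0 in kernel_sizes."""
    img = np.asarray(img, dtype=float)
    keep = np.ones(img.shape, dtype=bool)
    for n0 in kernel_sizes:
        # I / mean < t0 is written as I < t0 * mean to avoid division by zero
        keep &= img >= t0 * local_mean(img, n0)
    return keep
```

A pixel suppressed at any one scale is marked as background, which mirrors the rule that a zero at any n0 sets the final value to 0.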

We refine the global mask \({M}_{i}^{g}(x,y)\) by reassigning those pixels currently in the foreground that have a priori probability \({P}_{i}^{e}(x,y) \, < \, {p}_{i}\), i = 1, 2 to the background. This refinement is particularly useful for images with highly heterogeneous tissue with varying marker expression. The threshold value pi should be small and by default is set to 0.01.

An example of a priori global mask is presented in Supplementary Fig. 15.

A priori local mask: Complementing \({M}_{i}^{g}(x,y)\), we next compute \({M}_{i}^{l}(x,y)\), the a priori local mask corresponding to image Ii(x, y). \({M}_{i}^{l}(x,y)\) captures the local peculiarities of the compartments (nuclei or cell membranes) associated with their local structure and morphology.

First, Ii(x, y) is filtered by applying a single iteration of gradient adaptive smoothing (GAS)45,66,

$${\tilde{I}}_{i}(x,y) = \frac{1}{{N}_{i}(x,y)}{\sum}_{\xi =-1}^{1}{\sum }_{\eta =-1}^{1}{I}_{i}(x+\xi ,y+\eta ){w}_{i}(x+\xi ,y+\eta ),\,\, \\ {{{\rm{where}}}}\quad {N}_{i}(x,y) = {\sum}_{\xi =-1}^{1}{\sum}_{\eta =-1}^{1}{w}_{i}(x+\xi ,y+\eta ),\\ {w}_{i}(x,y) = \exp \left[-\frac{{d}_{i}^{2}(x,y)}{2{k}_{0}^{2}}\right],\quad {d}_{i}(x,y)=\sqrt{{\left[\frac{\partial {I}_{i}(x,y)}{\partial x}\right]}^{2}+{\left[\frac{\partial {I}_{i}(x,y)}{\partial y}\right]}^{2}}.$$
(2)

The GAS-filtered image, \({\tilde{I}}_{i}(x,y)\), is a smoothed version of the original image, Ii(x, y), that preserves the local variations within and around cell nuclei and membranes. The local neighborhood is defined via a 3 × 3 kernel of weights, wi, which makes the smoothing variation-preserving. Here, variation is quantified via the local gradient magnitude, di, and the degree of smoothing is controlled by k0, which is an algorithmic parameter. Its default setting is 1.
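
One iteration of Eq. (2) can be sketched as below. The finite-difference scheme (`np.gradient`, central differences) and the edge-replication border handling are assumptions; the function name `gas_filter` is hypothetical.

```python
import numpy as np

def gas_filter(img, k0=1.0):
    """Single iteration of gradient adaptive smoothing over a 3 x 3 neighborhood."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)                        # partial derivatives (assumed scheme)
    w = np.exp(-(gx ** 2 + gy ** 2) / (2.0 * k0 ** 2))  # weights w_i from d_i^2
    wp = np.pad(w, 1, mode="edge")                   # border handling is an assumption
    ip = np.pad(img, 1, mode="edge")
    h, wd = img.shape
    num = np.zeros((h, wd))
    den = np.zeros((h, wd))
    for dy in range(3):                              # offsets -1, 0, +1 in y
        for dx in range(3):                          # offsets -1, 0, +1 in x
            w_shift = wp[dy:dy + h, dx:dx + wd]
            num += ip[dy:dy + h, dx:dx + wd] * w_shift
            den += w_shift                           # accumulates N_i(x, y)
    return num / den
```

Pixels with a large gradient receive small weights, so they contribute less to their neighbors' averages and edges are smoothed less than flat regions.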

To obtain \({M}_{i}^{l}(x,y)\), a two-level, local Otsu is applied to \({\tilde{I}}_{i}(x,y)\) based on disk kernel whose radius r0 is an algorithmic parameter. Its default setting is 5 pixels. The Otsu output faithfully captures the local structure but is also noisy, particularly in image regions where no tissue samples are present and the gradients are being computed on the background noise. As \({M}_{i}^{g}(x,y)\) can accurately identify such background, the output of the local Otsu is restricted to where \({M}_{i}^{g}(x,y)=1\), resulting in local foreground mask \({M}_{i}^{l}(x,y)\).

An example of a priori local mask is presented in Supplementary Fig. 15.

Processing stage 2: computing a posteriori nucleus and cell membrane masks

The a priori global and local binary masks are computed independently for both channels. As a result, there is a non-negligible probability that a pixel is classified as belonging to both the nucleus and the cell membrane. This is particularly true in tissue regions with crowded cells, or when the nature of the tissue section is such that the cell membrane is lying over the nucleus. This processing stage reconciles these overlaps and generates a posteriori global and local nucleus and cell membrane masks.

Contrast-based likelihood function: Human visual perception of cell membranes and nuclei is based on the inherent contrast between the two channels. Usually this contrast is visualized by imbuing the individual intensity-based channels with colors. Here, we adapt this notion to compute a visual contrast function based on nucleus and cell membrane marker-specific expression to quantify the likelihood of a pixel belonging to either the nucleus or the cell membrane. The first step computes the contrast function for each pixel in the a priori local mask as follows,

$${L}_{0}(x,y)=\left\{\begin{array}{ll}\frac{{I}_{2}(x,y)-{I}_{1}(x,y)}{{I}_{2}(x,y)+{I}_{1}(x,y)},\quad &{{{\bf{if}}}}\,\,{I}_{1}(x,y) \, > \, {i}_{1}\,\,{{{\bf{or}}}}\,\,{I}_{2}(x,y) \, > \, {i}_{2}\\ 0, \hfill \quad &{{{\bf{otherwise}}}}\end{array}\right.,$$

where \({i}_{i}={\min }_{(x,y)\in {{{\Omega }}}_{i}}{I}_{i}(x,y),\,{{{\Omega }}}_{i}=\left\{(x,y)\in {{\Omega }}\,| \,{M}_{i}^{l}(x,y)=1\right\},\,i=1,2\). The second step ensures that this function is consistent with the a priori global mask for each channel, resulting in the contrast-based likelihood function,

$${{{\bf{L}}}}(x,y)=\left\{\begin{array}{ll}{L}_{0}(x,y),\quad &{{{\bf{if}}}}\,\,{L}_{0}(x,y) \, < \, 0\,{{{\bf{and}}}}\,{M}_{1}^{g}(x,y)=1\,\,{{{\bf{or}}}}\,\,{L}_{0}(x,y) \, > \, 0\,{{{\bf{and}}}}\,{M}_{2}^{g}(x,y)=1\\ 0, \hfill \quad &{{{\bf{otherwise}}}} \hfill \end{array}\right..$$
(3)

L(x, y) is bounded within [−1, 1], with a contrast of −1 indicating a strong likelihood that the pixel (x, y) belongs to the nucleus, and a contrast of 1 indicating that the pixel most likely belongs to the cell membrane. Two examples of the likelihood function are presented in Fig. 1 and Supplementary Fig. 15.
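
The two-step construction of the likelihood (the contrast function L0 followed by Eq. (3)) can be sketched as follows. The function name `contrast_likelihood` and the boolean-mask argument layout are hypothetical, and the handling of empty local masks and of pixels where both channels are zero is an assumption.

```python
import numpy as np

def contrast_likelihood(I1, I2, M1_local, M2_local, M1_global, M2_global):
    """Contrast-based likelihood L in [-1, 1]; negative favors nucleus, positive membrane."""
    I1, I2 = np.asarray(I1, dtype=float), np.asarray(I2, dtype=float)
    M1_local, M2_local = np.asarray(M1_local, bool), np.asarray(M2_local, bool)
    M1_global, M2_global = np.asarray(M1_global, bool), np.asarray(M2_global, bool)
    # i_1, i_2: minimum intensity inside each a priori local mask
    i1 = I1[M1_local].min() if M1_local.any() else np.inf
    i2 = I2[M2_local].min() if M2_local.any() else np.inf
    active = (I1 > i1) | (I2 > i2)
    total = I1 + I2
    safe = np.where(total > 0, total, 1.0)           # avoid 0/0 (assumed convention)
    L0 = np.where(active & (total > 0), (I2 - I1) / safe, 0.0)
    # Eq. (3): keep L0 only where its sign agrees with the matching global mask
    consistent = ((L0 < 0) & M1_global) | ((L0 > 0) & M2_global)
    return np.where(consistent, L0, 0.0)
```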

A posteriori global mask: We combine the a priori probability with the contrast-based likelihood function to compute the a posteriori global mask Mg(x, y), such that Mg : Ω → {0, 1, 2}, where the labels 0, 1, and 2 correspond to the background, nuclei, and cell membranes, respectively. However, before performing this combination, we enhance \({P}_{i}^{e}(x,y)\) as follows,

$$\begin{array}{r}{P}_{i}^{s}(x,y)=\left\{\begin{array}{ll}1,\quad &{{{\bf{if}}}}\,\,{M}_{i}^{l}(x,y)=1\\ {P}_{i}^{e}(x,y),\quad &{{{\bf{otherwise}}}}\end{array}\right.,\end{array}$$
(4)

where i = 1, 2. This enhancement saturates \({P}_{i}^{e}(x,y)\), that is, sets \({P}_{i}^{s}(x,y)=1\), wherever the a priori local mask is 1. It ensures graceful performance of our algorithm in the global context when computing the a posteriori global mask Mg(x, y). We then compute the a posteriori global probability \({P}_{i}^{g}(x,y)\) via a \({P}_{i}^{s}(x,y)\)-weighted convex combination of the likelihood and a priori belief,

$$\begin{array}{r}{P}_{1}^{g}(x,y)=\left\{\begin{array}{ll}{P}_{1}^{s}(x,y)+\left(1-{P}_{1}^{s}(x,y)\right)\,| {{{\bf{L}}}}(x,y)| ,\quad &{{{\bf{if}}}}\,\,{{{\bf{L}}}}(x,y) \, < \, 0\\ 0, \hfill \quad &{{{\bf{otherwise}}}}\end{array}\right.,\\ {P}_{2}^{g}(x,y)=\left\{\begin{array}{ll}{P}_{2}^{s}(x,y)+\left(1-{P}_{2}^{s}(x,y)\right)\,| {{{\bf{L}}}}(x,y)| ,\quad &{{{\bf{if}}}}\,\,{{{\bf{L}}}}(x,y) \, > \, 0\\ 0, \hfill \quad &{{{\bf{otherwise}}}}\end{array}\right..\end{array}$$
(5)

The final a posteriori global mask Mg(x, y) is obtained by applying either k-means clustering, with k = 3, or the argmax operation45 to \({P}_{i}^{g}(x,y)\), i = 1, 2 (Eq. (5)). The default setting is argmax. We note that k-means (or argmax) is performed under the constraint that a pixel (x, y) ∈ Ω is assigned to the common background if both global probabilities are zero, i.e., \({P}_{i}^{g}(x,y)=0\), i = 1, 2. Examples of the a posteriori global mask are presented in Fig. 1 and Supplementary Fig. 15.
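
Equations (4) and (5), followed by the default argmax assignment with the common-background constraint, can be sketched as below. The function name `posterior_global_mask` is hypothetical and the k-means alternative is not shown.

```python
import numpy as np

def posterior_global_mask(P1e, P2e, M1_local, M2_local, L):
    """Return (labels, P1g, P2g) with labels 0 = background, 1 = nucleus, 2 = membrane."""
    L = np.asarray(L, dtype=float)
    # Eq. (4): saturate the a priori probability inside the a priori local mask
    P1s = np.where(M1_local, 1.0, P1e)
    P2s = np.where(M2_local, 1.0, P2e)
    absL = np.abs(L)
    # Eq. (5): convex combination of 1 and |L| weighted by P_s
    P1g = np.where(L < 0, P1s + (1.0 - P1s) * absL, 0.0)
    P2g = np.where(L > 0, P2s + (1.0 - P2s) * absL, 0.0)
    # Argmax; under Eq. (5) at most one of P1g, P2g is nonzero per pixel
    labels = np.where(P1g >= P2g, 1, 2)
    labels[(P1g == 0) & (P2g == 0)] = 0          # common background constraint
    return labels, P1g, P2g
```

Since L < 0 and L > 0 are mutually exclusive, the sign of the likelihood decides the class and the a priori terms set the confidence of that decision.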

A posteriori local mask: We define the a posteriori local mask, Ml: Ω → {0, 1, 2}, simply by restricting the a priori probability \({P}_{i}^{e}(x,y)\) to the local mask \({M}_{i}^{l}(x,y)\),

$${P}_{i}^{l}(x,y)=\left\{\begin{array}{ll}{P}_{i}^{e}(x,y),\quad &{{{\bf{if}}}}\,\,{M}_{i}^{l}(x,y)=1\\ 0,\quad &{{{\bf{otherwise}}}}\end{array}\right.,$$
(6)

where i = 1, 2. This restriction allows us to optimally capture the local a posteriori structure of the nuclei and cell membranes in a self-consistent manner.

Similar to computing the a posteriori global mask, we apply either k-means clustering or the argmax (default setting) operation to \({P}_{i}^{l}(x,y)\), i = 1, 2 (Eq. (6)) to obtain the a posteriori local mask Ml(x, y). The same constraint for the common background described above for the a posteriori global mask is also applied here. Examples are presented in Fig. 1 and Supplementary Fig. 15.

Processing stage 3: nucleus segmentation

The a posteriori global and local masks provide a semantic segmentation of image pixels comprising the tissue into nuclei and cell membranes. This, and the following processing stages are designed to obtain every instance of individual nucleus and its cell from the semantic segmentation of the tissue. Specifically, in this stage, we first segment all nuclei, and use them as a basis to identify their cells in the next stage. These steps ensure that the nucleus and cell segmentations are internally consistent with the latter always bounding the former.

To segment nuclei we process the a posteriori global mask for the nuclei, \({{{{\bf{M}}}}}_{nuc}^{g}(x,y):= {{{{\bf{M}}}}}^{g}(x,y){| }_{{{{\rm{label}}}} = 1}\) with help from the a posteriori local mask for the cell membrane, \({{{{\bf{M}}}}}_{cell}^{l}(x,y):= {{{{\bf{M}}}}}^{l}(x,y){| }_{{{{\rm{label}}}} = 2}\). Particular examples of these two masks are presented in Supplementary Fig. 15.

Convexity analysis: Nucleus segmentation begins with convexity analysis of every connected component of \({{{{\bf{M}}}}}_{nuc}^{g}(x,y)\). As a part of this analysis, we compute the area and the steepest concave point (SCP)37 of every component. The SCP is the boundary point of the component with the largest deviation from its convex hull. The area allows us to filter out exceedingly small objects that are not nuclei, while the SCP helps us determine whether the component is a nucleus cluster (NC) or not. A component is kept for further analysis only if its area exceeds a0; otherwise it is removed. Each component that passes the area threshold is classified as either an NC or a non-NC depending on whether the SCP deviation is above or below the threshold d0. Both a0 (default value 20 pixels) and d0 (default value 4 pixels) are the primary algorithm parameters (Supplementary Table 2). The non-NC components are statistically analyzed to obtain the initial segmentation of all individual nuclei, along with a small component (SC) list comprising small convex objects that we are less confident about being nuclei.

Convexity analysis of \({{{{\bf{M}}}}}_{nuc}^{g}(x,y)\), is illustrated in Supplementary Fig. 15.

Perturbed watershed and virtual cuts: We process the NC components using perturbed watershed (PW) and virtual cut (VC) algorithms that we have developed. Their goal is to partition the NC into individual nuclei.

PW steps are illustrated in Fig. 2. Briefly, the NC component mask (Fig. 2d) is first modified by \({{{{\bf{M}}}}}_{cell}^{l}\) (Fig. 2e). Specifically, cuts are introduced in the NC component mask where the local cell membrane is indicated in the part of \({{{{\bf{M}}}}}_{cell}^{l}\) spatially corresponding to the NC component (Fig. 2f). We next apply the distance transform (DT) to the modified NC component and use the resulting DT image (Fig. 2g) to compute davr, the average of all non-zero DT values in the DT image. davr is used to threshold the DT image to identify n sub-regions with large DT values, indicative of the interiors of the putative nuclei making up the NC (Fig. 2h). Within every sub-region we identify a pixel with the maximal DT value as the watershed seed point (marker) for that sub-region. We perform watershed segmentation of the NC based on these n seed points to obtain our initial estimate of the nuclei comprising the NC (Fig. 2i). If these estimates are correct, then perturbing the markers does not affect the segmentation of the NC. However, if the estimates are incorrect, then the sub-region estimates are not stable under perturbation. We exploit this perturbation-based stability to identify the correct segmentation of the NC. Specifically, we perturb the marker locations and recompute the watershed-based segmentation. The perturbations are implemented by shifting each watershed marker location sequentially in the horizontal and vertical directions by ±⌊davr⌋, resulting in four perturbations: (xj ± ⌊davr⌋, yj) and (xj, yj ± ⌊davr⌋) with j = 1, …, n (Fig. 2j–m). Here, ⌊·⌋ stands for the floor function. If during any of the four scenarios the size of any of the n putative nuclei collapses to a point object with an area of only a few pixels (Fig. 2j, l, m), we deem it unstable, remove its seed point from the list of n seed points, and recompute the watershed-based segmentation with the remaining seed points (Fig. 2n). If the segmentation results remain stable for all four shifts, then the estimate is considered correct. To ensure that each of the segmented sub-regions is indeed a nucleus and not a smaller NC, we recursively perform convexity analysis and PW on each sub-region. An example of this recursion is illustrated in Supplementary Fig. 16.
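
The marker perturbation and the stability test can be sketched as below. This is an illustration only: it assumes that all n markers are shifted together in each of the four scenarios, that the watershed itself is run elsewhere, and that "a few pixels" is expressed through a hypothetical `min_area` parameter. The function names are likewise hypothetical.

```python
import math

def perturb_markers(markers, d_avr):
    """Return the four perturbed marker sets: shifts of +/- floor(d_avr) in x, then in y.

    markers : list of (x, y) integer seed locations
    d_avr   : average of the non-zero distance-transform values
    """
    step = math.floor(d_avr)
    shifts = [(step, 0), (-step, 0), (0, step), (0, -step)]
    return [[(x + dx, y + dy) for (x, y) in markers] for (dx, dy) in shifts]

def unstable_markers(areas_per_perturbation, min_area=5):
    """Indices of markers whose region collapses in at least one perturbation.

    areas_per_perturbation : one list of region areas (in pixels) per perturbation,
                             ordered like the marker list
    min_area               : hypothetical stand-in for "a few pixels"
    """
    unstable = set()
    for areas in areas_per_perturbation:
        for j, area in enumerate(areas):
            if area <= min_area:
                unstable.add(j)
    return sorted(unstable)
```

In use, each perturbed marker set would be passed to a watershed routine, the resulting region areas collected, and the seeds returned by `unstable_markers` dropped before the final watershed run.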

The above recursive segmentation of an NC can sometimes result in a specific pathological situation, where the convex analysis identifies a sub-region as an NC, but PW does not segment it into sub-regions. For this specific scenario, we have developed the virtual cuts (VC) approach, where a virtual cut is defined through the SCP of the NC component mask to identify virtual sub-regions. We use “virtual” to emphasize that this cut and the resulting sub-regions are only used to identify their respective watershed seed points based on which we perform the actual segmentation. The hypothesis driving the VC method is based on the idea of PW method: although the locations of the respective watershed markers identified using virtual cuts might not exactly coincide with their true locations, they do represent a perturbed version of the true location. Thus, they yield stable and accurate segmentation into the two sub-regions. These sub-regions follow the same recursive logic of the PW method detailed above. VC method is illustrated in Supplementary Fig. 15.

Finally, we process the small components in the SC list in a context-dependent manner, with small isolated SCs included in the final nucleus segmentation result. Multiple examples of nucleus segmentation are presented in Figs. 1, 4, and 6 as well as in Supplementary Figs. 7–10, 12, 13, and 15, where the contours of nuclei are outlined in white.

Processing stage 4: cell segmentation

We segment cells via the joint use of a posteriori global mask for the cell membranes \({{{{\bf{M}}}}}_{cell}^{g}(x,y):= {{{{\bf{M}}}}}^{g}(x,y){| }_{{{{\rm{label}}}} = 2}\) and the segmented nuclei.

We begin by initializing the segmented cell mask as the segmented nucleus mask. The cell mask is then expanded until its boundary coincides with that of the closest cell membrane around it. It is possible that the cell membrane marker used for cell segmentation is not expressed by all cells. Therefore, for cells without any cell membrane marker expression, the nucleus mask is morphologically dilated by a small amount u0 (1–10 pixels) to obtain an estimate of the cell membrane. u0, with a default value of 9 pixels, is another algorithm parameter (Supplementary Table 2). In the opposite scenario, where due to the nature of the tissue section a cell is present with a membrane but without a nucleus, we utilize \({{{{\bf{M}}}}}_{cell}^{g}\). Specifically, the skeleton of \({{{{\bf{M}}}}}_{cell}^{g}\) is computed and subtracted from \({{{{\bf{M}}}}}_{cell}^{g}\) itself. This operation naturally reveals the cell membrane contour within \({{{{\bf{M}}}}}_{cell}^{g}\), which we identify by computing the Euler number of its connected component. When the Euler number is zero and the area of the connected component exceeds half of the average area of nuclei, the connected component is identified as a segmented cell. Examples of cell segmentation are presented in Figs. 1, 4, and 6 as well as in Supplementary Figs. 4–10, 12, 13, and 15, where the contours of the segmented cells are outlined in green.

Performance evaluation

To evaluate UNSEG performance and compare it with Cellpose25 and Mesmer26 results, we used the F1 score (or Dice coefficient) as the accuracy metric46. To compute the F1 score, we first estimated the true positive (TP), false positive (FP) and false negative (FN) values by comparing the predicted segmentation with the expert annotated ground truth and using intersection over union (IoU) as the threshold value46. The IoU threshold, ranging from 0 to 1, indicates how much of an overlap between the predicted segmentation and ground truth is considered a match, which is then used to estimate the number of TP, FP, and FN segmented objects. The F1 score is then given by

$${F}_{1}=\frac{2\,TP}{2\,TP+FP+FN}.$$
(7)

Varying the IoU threshold from 0 to 1 gives the corresponding F1 curve as a function of the IoU threshold.
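
Equation (7) at a single IoU threshold can be sketched as follows. The one-to-one greedy matching strategy, the use of IoU ≥ threshold as the match criterion, and the representation of each object as a set of pixel coordinates are assumptions made for illustration; the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two objects given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def f1_at_threshold(predicted, truth, threshold):
    """F1 score (Eq. 7) with greedy one-to-one matching at the given IoU threshold."""
    pairs = sorted(
        ((iou(p, t), i, j) for i, p in enumerate(predicted) for j, t in enumerate(truth)),
        reverse=True,
    )
    used_pred, used_truth = set(), set()
    tp = 0
    for score, i, j in pairs:
        if score == 0 or score < threshold:
            break                          # remaining pairs cannot match
        if i in used_pred or j in used_truth:
            continue                       # each object is matched at most once
        used_pred.add(i)
        used_truth.add(j)
        tp += 1
    fp = len(predicted) - tp               # predictions without a match
    fn = len(truth) - tp                   # ground-truth objects without a match
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # both empty: perfect by convention (assumed)
```

Evaluating `f1_at_threshold` over a grid of thresholds between 0 and 1 yields the F1 curve.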

Statistics and reproducibility

The statistical robustness of UNSEG and its reproducibility have been exhaustively tested on the GIT dataset and three publicly available datasets47,48.