Background & Summary

Mycelium semantic segmentation represents a transformative approach in fungal research, offering unprecedented capabilities for large-scale analysis of hyphal network architectures. By accurately delineating mycelium boundaries, this technology enables the quantification of growth patterns that conventional methods previously could not assess1,2. The ability to capture subtle morphological variations empowers researchers to investigate strain-specific characteristics, monitor environmental adaptation processes, and evaluate physiological responses to various stimuli3,4,5. These advancements are driving innovation across disciplines, from ecological studies of fungal communities to the development of fungal-based biotechnological applications in medicine and agriculture. Moreover, the precision of semantic segmentation is particularly valuable for establishing correlations between morphological features and functional traits6, thereby deepening our understanding of fungal biology and its practical applications.

However, the field currently faces two major bottlenecks. First, the lack of public benchmark datasets has created a critical resource gap, slowing the development of segmentation algorithms and making it extremely difficult to reproduce methods or compare performance across research groups7,8,9,10,11. Second, the low contrast and high complexity of mycelium edges present dual challenges for both annotation and segmentation. Despite advances in deep learning for general image segmentation12,13,14, the delicate, intertwined nature of hyphal edges and their blurred boundaries with the culture medium often cause traditional models to either under-segment, missing new hyphae, or over-segment, including unwanted noise. Manual annotation, which requires expert knowledge to distinguish hyphae from the background pixel by pixel, takes three to five times longer than labeling conventional biological images. This cycle of scarce data and edge-processing difficulties has limited the translation of algorithms from laboratory settings to real-world applications.

To address these challenges, we developed and released the first large-scale benchmark dataset for mycelium semantic segmentation, referred to as MyceliumSeg. It comprises 20,176 RGB mycelium images from four fungal species: Ganoderma lucidum, Ganoderma sinense, Trametes spp., and Pleurotus ostreatus. Images of mycelium were acquired under diverse culture conditions and span the full growth cycle from inoculation to full petri-dish coverage, capturing varied textures, colors, and morphological patterns. These images were captured using a self-developed and commercialized FPheno2000 imaging device15, which employs a dual-light system: a 360° shadowless top light eliminates optical interference, while a bottom light enhances mycelial edge contrast and three-dimensional structure. This configuration overcomes the common issue of low edge contrast in traditional imaging, generating high-resolution images with clearer boundaries. The images accurately capture subtle morphological differences, such as the faint edges of newly grown hyphae, providing a solid foundation for pixel-level annotation and deep learning model training.

For data annotation, pixel-level annotations with fine edge labeling were provided for 567 representative samples covering the four fungal species. A multi-dimensional precise annotation framework was introduced to enhance annotation quality, featuring cross-expert labeling guidelines, conflict-detecting algorithms, and expert quality control teams to ensure a high-quality, reproducible dataset. We tested three mainstream semantic segmentation algorithms, U-Net16, DeepLabv317, and SegFormer18, on this dataset. The results systematically revealed technical bottlenecks in hyphal edge segmentation: classic metrics such as mIoU, together with boundary-aware metrics such as Boundary IoU19, the 95th percentile of Hausdorff distance (HD95)20,21, and Average Symmetric Surface Distance (ASSD)22, highlight the unique challenges of edge processing in fungal image analysis. This benchmark offers quantifiable ways to compare algorithm performance and identifies edge segmentation as a core challenge in fungal semantic analysis.

The dataset and benchmark system established in this study offer the first end-to-end solution for mycelium semantic segmentation, spanning data acquisition, fine-grained annotation, and algorithm evaluation. Their value lies not only in the scale of 20,176 images but also in the precise edge labeling that supports various algorithmic paradigms (fully supervised, semi-supervised, and self-supervised), particularly for edge-refined segmentation. In the future, this resource will facilitate applications such as automatic analysis of fungal phenotypes and real-time monitoring of mycelial states during fermentation. It will also accelerate the integration of deep learning into fungal research across disciplines.

Methods

In this section, we detail the dataset construction, covering the specific methods for data collection and mycelium annotation. These methods aim to build a large-scale, high-quality, and diverse mycelium dataset with pixel-level annotations that meets the needs of precise segmentation research.

Data collection

We collected 20,176 mycelium images with distinctive edge morphology. The samples, spanning four fungal species, were stored at 4 °C in sawdust tubes and incubated in the dark on 90-mm Petri dishes containing malt yeast glucose (MYG) or potato dextrose agar (PDA) medium at different temperatures (see Table 1). Images of these samples span diverse morphological characteristics, including growth stages, sclerotium colors, and hyphal features (Fig. 1). The mycelium images were acquired using a mature, commercial data acquisition system named FPheno2000, developed by BORUIYUAN TECHNICAL (https://www.brytech.cn/). Following the data acquisition process in Li et al.15, we periodically placed mycelial Petri dishes at a fixed position for image acquisition. Images with a resolution of 4,608 × 3,456 pixels were collected and saved in JPG format.

Table 1 Summary of mycelium culture conditions.
Fig. 1
figure 1

Visualization of morphological diversity within MyceliumSeg. Column 1 shows variations in sclerotium color during the activation and germination stage. Column 2 presents representative morphologies from the three subsequent growth stages (hyphal expansion, network building, and maturation). Columns 3–5 highlight diverse visual characteristics observed in the network building and maturation stages.

Data annotation

Because mycelium edges are inherently semi-transparent and their morphological features exhibit low contrast, precise pixel-level annotation is difficult and inter-annotator disagreement is a significant concern. To address this, we proposed a mycelium annotation process comprising three steps: (a) multi-blind refined annotation for manual-error alleviation and pixel-level accuracy; (b) a disagreement disposal protocol containing a disagreement quantification method and a disagreement solution; (c) an expert review process ensuring the quality of the final annotation results (Fig. 2). Following this procedure, we produced 567 annotations, requiring a total of 37 person-days of manual effort. Representative annotation results are illustrated in Fig. 3.

Fig. 2
figure 2

Annotation workflow. (a) Multi-blind refined annotation. Each image is first labelled independently by multiple annotators who cannot see one another’s work. They draw a coarse contour of each mycelium sample and then refine the boundary pixel by pixel. (b) The disagreement disposal protocol consists of a disagreement quantification method and a solution strategy. Pixel-level mismatches among the multiple refined annotations are quantified, and different disagreement-handling solutions are applied to each sample based on the quantified results. (c) An expert review process ensures the annotation quality.

Fig. 3
figure 3

Overview of the raw image and its annotation at global and local scales. (a) Original image. (b) Edge map of the full annotation on the original image, with the magnified region indicated. (c) Magnified view of the local region of the original image. (d) Corresponding magnified view of the annotation edge.

Multi-blind refined annotation

In the annotation process, multiple annotators independently label the same image without seeing other annotators’ work. Specifically, only the mycelium growing around the sclerotium is considered foreground, and the outermost fine edge of the mycelium is defined as the boundary of the ground truth mask. Internal structural details and void regions of the mycelium are ignored. Other regions, including the Petri dish and culture medium, are uniformly treated as background. Multi-blind annotation is employed to alleviate the impact of potential visual confusion and blind spots caused by the mycelium’s weak features in single-annotator settings. In addition, a refinement operation, dedicated to labeling edge details after outlining the entire hyphal contour, is integrated into the annotation process.

Disagreement disposal protocol

Disagreement is inevitable in mycelium annotation with multiple annotators. We adopted a protocol combining a disagreement quantification method with a disagreement solution strategy. The quantification method comprises two parts. The first is the Mutual Average Symmetric Surface Distance (mASSD), which quantifies the disagreement between a sample’s designated annotation and all other annotations of that sample. The second is sample-level disagreement, defined as the sum of the mASSD values across all annotations of the same sample; this total serves as an indicator of that sample’s annotation difficulty.

The metric mASSD is based on ASSD, which measures the average bidirectional distance between two contours. ASSD is calculated by sampling points along one contour, finding the nearest Euclidean distance from each point to the other contour, and averaging all distances22. A larger ASSD value between two contours signifies a correspondingly greater spatial divergence between them. As shown in Eq. 1, \({{ASSD}}_{\left(i,j\right),k}\) denotes the ASSD between annotators i and j on sample k:

$$ASS{D}_{(i,j),k}=ASSD({S}_{i,k},{S}_{j,k})=\frac{1}{|{S}_{i,k}|+|{S}_{j,k}|}({\sum }_{y\in {S}_{i,k}}{{\min }}_{x\in {S}_{j,k}}\Vert x-y\Vert +{\sum }_{y\in {S}_{j,k}}{{\min }}_{x\in {S}_{i,k}}\Vert x-y\Vert ),$$
(1)

where \({S}_{i,k}\) denotes the point set of the k-th sample’s contour from annotator \(i\), and points in the sets are denoted by \(x\) and \(y\). After all pairwise ASSDs have been obtained, the mASSD for a designated annotation is defined as the mean of its ASSD values to every other annotation of the same sample. \({{mASSD}}_{j,k}\) quantifies the average disagreement between the annotation of sample k produced by annotator \(j\) and the annotations of the same sample produced by all other annotators, i.e.

$${{mASSD}}_{j,k}=\frac{1}{N-1}{\sum }_{i\ne j}{{ASSD}}_{\left(i,j\right),k},(i=1,2,\ldots ,N),$$
(2)

where \(i\) indexes the annotators and \(N\) is the total number of annotators. Sample-level disagreement, indicating the annotation difficulty of a sample, is calculated as the sum of that sample’s mASSD values across all annotators (Eq. 3).

$${{Sample\; level\; Disagreement}}_{k}={\sum }_{j}{{mASSD}}_{j,k},(j=1,2,\ldots ,N)$$
(3)
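
To make Eqs. 1–3 concrete, the following sketch (our illustrative paraphrase, not the released annotation pipeline) computes pairwise ASSD, per-annotator mASSD, and the sample-level disagreement for one sample, with each contour given as an array of pixel coordinates:

```python
# Illustrative sketch of Eqs. 1-3 (our paraphrase, not the released pipeline).
# Each contour is an (M, 2) array of pixel coordinates, one per annotator.
import numpy as np
from scipy.spatial.distance import cdist

def assd(contour_a, contour_b):
    """Average Symmetric Surface Distance between two contours (Eq. 1)."""
    d = cdist(contour_a, contour_b)  # all pairwise Euclidean distances
    return (d.min(axis=1).sum() + d.min(axis=0).sum()) / (len(contour_a) + len(contour_b))

def disagreement_scores(contours):
    """Per-annotator mASSD (Eq. 2) and sample-level disagreement (Eq. 3)."""
    n = len(contours)
    pairwise = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            pairwise[i, j] = pairwise[j, i] = assd(contours[i], contours[j])
    massd = pairwise.sum(axis=1) / (n - 1)  # mean ASSD to every other annotation
    return massd, float(massd.sum())        # the sum is the sample-level disagreement
```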

After quantifying disagreement, we assembled a collaborative panel whose review was combined with statistical analyses to resolve the disagreements. Interquartile range (IQR)-based outlier detection was applied to the distribution of sample-level disagreement values to identify samples exhibiting elevated annotation discrepancies23. The panel reviewed the annotations of these samples to determine whether re-annotation was necessary and, where it was, re-annotated them together to ensure objective and accurate results. For samples with disagreement scores in the normal range, the annotation with the lowest mASSD was chosen as the final annotation. This approach ensures that the final annotation diverges minimally from all other annotations.
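
The selection logic can be summarized as follows; note that the specific outlier fence (the common Q3 + 1.5 × IQR rule) is our assumption, since the text specifies only IQR-based detection:

```python
# Sketch of the disagreement solution. The paper specifies IQR-based outlier
# detection; the exact fence (here the common Q3 + 1.5*IQR rule) is our assumption.
import numpy as np

def resolve_sample(massd, all_sample_disagreements, sample_disagreement):
    q1, q3 = np.percentile(all_sample_disagreements, [25, 75])
    if sample_disagreement > q3 + 1.5 * (q3 - q1):
        return "panel review / re-annotation"  # high-disagreement outlier
    return int(np.argmin(massd))               # index of the final (lowest-mASSD) annotation
```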

Expert review process

An expert team comprising mycologists and computer scientists reviewed and approved the annotations. If discrepancies or ambiguities remained, the team collaboratively re-annotated the data to ensure that these valuable cases were annotated with high precision and fully utilized.

Data Records

The dataset is accessible for download at Zenodo24. MyceliumSeg comprises five parts: ‘labeled-GL’, ‘labeled-GS_PO_TS’, ‘labeled-MYG_PDA_TEMP’, ‘unlabeled-GL’, and ‘unlabeled-GS_PO_TS’. The ‘labeled-GL’ folder comprises 507 labeled Ganoderma lucidum images, divided into two subfolders: 457 images in ‘trainset’ and 50 images in ‘testset’. The ‘labeled-GS_PO_TS’ folder comprises 30 labeled images of Ganoderma sinense, Trametes spp., and Pleurotus ostreatus, divided equally (10 each) into three subfolders: ‘GS’, ‘TS’, and ‘PO’. The ‘labeled-MYG_PDA_TEMP’ folder comprises 30 labeled images, equally split (10 each) among MYG-based medium (MYG), PDA-based medium (PDA), and 15 °C incubation (TEMP15), organized into the ‘MYG’, ‘PDA’, and ‘TEMP15’ subfolders. Each of these labeled subfolders further comprises an ‘image’ and a ‘mask’ folder: the ‘image’ folder stores raw images in ‘.jpg’ format, whereas the ‘mask’ folder holds the pixel-wise annotations as binary ‘.png’ files (0 for background, 1 for mycelium). Filenames are identical across paired image and mask files. The ‘trainset’ contains files numbered from ‘00000001’ to ‘00000457’, and the ‘testset’ contains files numbered from ‘00000458’ to ‘00000507’. The ‘GS’, ‘PO’, ‘TS’, ‘MYG’, ‘PDA’, and ‘TEMP15’ subfolders contain files numbered from ‘00018428’ to ‘00018437’, ‘00018438’ to ‘00018447’, ‘00018448’ to ‘00018457’, ‘00018458’ to ‘00018467’, ‘00018468’ to ‘00018477’, and ‘00018478’ to ‘00018487’, respectively. The unlabeled data consist of ‘unlabeled-GL’ and ‘unlabeled-GS_PO_TS’. The former consists of seven subfolders, ‘unlabeled-GL1’ through ‘unlabeled-GL7’, which hold 17,920 original unlabeled Ganoderma lucidum ‘.jpg’ images with sequential filenames ranging from ‘00000508’ to ‘00018427’. The latter contains 1,689 unlabeled images of Ganoderma sinense, Trametes spp., and Pleurotus ostreatus, with filenames consecutively numbered from ‘00018488’ to ‘00020176’.
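
For orientation, a labeled image–mask pair can be loaded as follows (a minimal sketch; the ‘MyceliumSeg/’ root is a placeholder for the local unpacking location):

```python
# Minimal loading sketch, assuming the archives are unpacked under a local
# 'MyceliumSeg/' root (the root path is a placeholder, not part of the release).
from pathlib import Path
import numpy as np
from PIL import Image

root = Path("MyceliumSeg/labeled-GL/trainset")
img = np.array(Image.open(root / "image" / "00000001.jpg"))
mask = np.array(Image.open(root / "mask" / "00000001.png"))
assert set(np.unique(mask)) <= {0, 1}  # binary labels: 0 background, 1 mycelium
print(img.shape, mask.shape)           # raw images are 4,608 x 3,456 RGB
```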

Technical Validation

This section presents a statistical analysis of the collected data in terms of lifecycle, sclerotium, and hyphal visual features. The disagreement distributions are shown with box plots. In addition, the dataset is benchmarked across several seminal deep learning-based segmentation architectures.

Data statistical analysis

MyceliumSeg provides image data that comprehensively span all stages of mycelial growth, showcasing the unique morphological diversity characteristic of each phase (Fig. 1).

Lifecycle analysis

Table 2 provides statistics on the mycelial growth stages. Since the images were acquired throughout the mycelial cultivation process, the relative frequency of data in each growth stage proportionally reflects the temporal duration of those phases. The majority of the data (10,894 images, 53.99%) were acquired during the hyphal network construction stage, followed by the next largest share (4,426, 21.94%) collected in the mycelial maturation transition stage. Data from these two stages exhibit pronounced structural and color visual features. In contrast, the smallest subset of images (2,204, 10.92%) was obtained during the sclerotium activation and germination stage, characterized primarily by color-based visual attributes. The remaining images (2,652, 13.14%) correspond to the primary hyphal expansion stage, whose visual characteristics lack distinctive analytical significance25,26,27.

Table 2 Distribution of mycelial growth stage frequencies in the dataset.

Sclerotium analysis

In the sclerotium activation and germination stage (2,204 images, 10.92%), the visual features are reflected in sclerotial color. Overall, 66.43% of sclerotia appear yellow (see Table 3): 50.91% are pure yellow and 15.52% are a yellow-and-black blend. The remaining sclerotia are 14.38% gray, 13.75% brown, and 5.44% black. These images illustrate the diversity of sclerotium color patterns prior to hyphal growth.

Table 3 Distribution of sclerotium color frequencies in the dataset.

Hyphal feature analysis

In the hyphal network construction (10,894 images, 53.99%) and mycelial maturation transition (4,426, 21.94%) stages, mycelia display distinctive structural or color signatures. Table 4 lists and describes the visual features present in the dataset, while Table 5 summarizes their distribution across the 15,320 images. Of these, 7,295 images (47.62%) exhibit a uniform density distribution, whereas 3,248 images (21.20%) show concentric density zonation. Centripetal densification is evident in 1,283 images (8.37%), and peripheral densification in 557 images (3.64%). Edge morphology statistics reveal 1,169 mycelia (7.63%) with irregular edges. Less frequent yet informative traits include hyphal pigmentation (441), heterogeneous density distribution (400), wrinkling (297), rhizomorph (275), spiral stratification (238), and internal concavity (117), each accounting for under 3% of the dataset.

Table 4 Mycelial visual features and descriptions.
Table 5 Distribution of hyphal characteristic frequency in the dataset.

Evaluation metric

The evaluation system was designed with two considerations: first, the methodological significance of edge segmentation precision in mycelium segmentation research; second, the limitations of classical segmentation metrics, which are highly sensitive to mask interior regions but insufficiently sensitive to edge segmentation accuracy.

The classical segmentation metrics used to benchmark the models are the F1-score (Eq. 5) and Intersection-over-Union (IoU) (Eq. 8). Because the dataset can be foreground-sparse, we report these metrics for the foreground class, i.e., the mycelium, by default. The F1-score of mycelium is defined as follows:

$${Mycelium}\,{F}_{1}=2\times \frac{{{Precision}}_{f}\times {{Recall}}_{f}}{{{Precision}}_{f}+{{Recall}}_{f}},$$
(5)

where Precisionf is the proportion of truly foreground pixels among all pixels predicted as foreground, and Recallf is the proportion of ground-truth foreground pixels that are correctly identified by the model. Precisionf and Recallf are defined as follows:

$${{Precision}}_{f}=\frac{{TP}}{{TP}+{FP}},$$
(6)
$${{Recall}}_{f}=\frac{{TP}}{{TP}+{FN}},$$
(7)

where true positives (TP), false positives (FP), and false negatives (FN) represent the numbers of foreground pixels predicted as foreground, background pixels predicted as foreground, and foreground pixels predicted as background, respectively. With these quantities, the IoU of mycelium is expressed in Eq. (8):

$${Mycelium\; IoU}=\frac{{TP}}{{TP}+{FP}+{FN}}$$
(8)
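
These definitions translate directly into code; the following sketch restates Eqs. 5–8 for binary prediction and ground-truth masks:

```python
# Foreground (mycelium) F1-score and IoU (Eqs. 5-8) from binary masks;
# a straightforward restatement of the definitions above.
import numpy as np

def mycelium_f1_iou(pred, gt):
    tp = np.sum((pred == 1) & (gt == 1))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    precision = tp / (tp + fp)                          # Eq. 6
    recall = tp / (tp + fn)                             # Eq. 7
    f1 = 2 * precision * recall / (precision + recall)  # Eq. 5
    iou = tp / (tp + fp + fn)                           # Eq. 8
    return f1, iou
```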

To address the limitations of the classic metrics, edge accuracy quantification metrics including Boundary IoU19, HD9520,21, and ASSD22 (Eqs. 9–13) were systematically integrated to enable precise evaluation of edge segmentation performance from different aspects. Boundary IoU calculates the intersection-over-union for mask pixels within a certain distance from the corresponding ground truth or prediction boundary contours, i.e.

$${Boundary\; IoU}(G,P)=\frac{\left|\left({G}_{d}\cap G\right)\cap \left({P}_{d}\cap P\right)\right|}{\left|\left({G}_{d}\cap G\right)\cup \left({P}_{d}\cap P\right)\right|},$$
(9)

where G is the ground-truth binary mask, P is the prediction binary mask, and the boundary regions Gd and Pd are the sets of all pixels within d pixels of the ground-truth and prediction contours, respectively. The boundary dilation ratio is the hyper-parameter that specifies d as a proportion of the image diagonal; a smaller ratio imposes a stricter criterion on boundary segmentation. HD95 and ASSD provide a comprehensive evaluation of edge segmentation results from the view of the similarity between two masks. HD95 is used to limit the impact of outliers or noise. It is defined as:

$${HD}95\left(G,P\right)={\max }\left({{HD}95}_{{GP}},{{HD}95}_{{PG}}\right),$$
(10)
$${{HD}95}_{{GP}}={{percentile}}_{95}(\mathop{{\min }}\limits_{b\in S(P)}\parallel a-b\parallel ),\forall \,a\in S(G),$$
(11)
$${{HD}95}_{{PG}}={{percentile}}_{95}\left(\mathop{{\min }}\limits_{a\in S(G)}\parallel b-a\parallel \right),\forall \,b\in S(P),$$
(12)

where S(\(\cdot \)) represents the set of points on the surface of a mask, ||·|| denotes the Euclidean distance between two points, and percentile95 is the function returning the 95th percentile of the distances. ASSD measures the average distance between the surfaces of the ground-truth and prediction masks, and it is mathematically formulated as:

$${ASSD}\left(G,P\right)=\frac{1}{|{S}(G)|+|{S}(P)|}\left({\sum }_{a\in S(G)}\mathop{{\min }}\limits_{b\in S(P)}\parallel a-b\parallel +{\sum }_{b\in S(P)}\mathop{{\min }}\limits_{a\in S(G)}\parallel b-a\parallel \right)$$
(13)
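
The boundary-aware metrics can be approximated with distance transforms, as in the sketch below; note that the official Boundary IoU implementation differs in detail (it erodes masks with OpenCV), so this is an illustration rather than the evaluation code used in this study:

```python
# Approximate sketch of the boundary-aware metrics (Eqs. 9-13) for binary
# masks gt and pred; an illustration only, not the evaluation code of this study.
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def boundary_iou(gt, pred, dilation_ratio=0.001):
    d = max(1, round(dilation_ratio * np.hypot(*gt.shape)))  # d from the image diagonal
    g_b = gt.astype(bool) & ~binary_erosion(gt, iterations=d)    # pixels of G within d of its contour
    p_b = pred.astype(bool) & ~binary_erosion(pred, iterations=d)
    return (g_b & p_b).sum() / (g_b | p_b).sum()                 # Eq. 9

def hd95_assd(gt, pred):
    g_s = gt.astype(bool) & ~binary_erosion(gt)    # contour pixels of each mask
    p_s = pred.astype(bool) & ~binary_erosion(pred)
    d_gp = distance_transform_edt(~p_s)[g_s]       # distances from S(G) to S(P)
    d_pg = distance_transform_edt(~g_s)[p_s]       # distances from S(P) to S(G)
    hd95 = max(np.percentile(d_gp, 95), np.percentile(d_pg, 95))  # Eq. 10
    assd = (d_gp.sum() + d_pg.sum()) / (len(d_gp) + len(d_pg))    # Eq. 13
    return hd95, assd
```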

Implementation details

The experimental design on the MyceliumSeg dataset has two main goals. First, we aim to evaluate the performance of representative segmentation baselines on the dataset with boundary-aware segmentation measurement. Second, we aim to evaluate the robustness of models in mycelium boundary-aware segmentation across different fungal species and culture conditions. By achieving these, we hope to establish a benchmark for future work and promote further research in this field.

To cover both CNN- and Transformer-based architectures, we benchmarked three representative segmentation baselines: U-Net16, DeepLabv317, and SegFormer18. For a fair comparison, we used AdamW28 (β1 = 0.9, β2 = 0.999) as the base optimizer with a batch size of 4 per GPU for all models, while allowing architecture-specific settings. We largely retained the default hyper-parameter settings in MMSegmentation29. For the CNN-based architectures, vanilla U-Net and DeepLabv3 with a ResNet-50 backbone were initialized with a learning rate of 2e-4 and a weight decay of 1e-5, and a poly learning-rate schedule with power 0.9 was adopted. For the Transformer-based architecture, SegFormer with a MiT-B0 backbone adopted a lower initial learning rate of 6e-5, a higher weight decay of 1e-2, and a 3,000-iteration linear warm-up (warmup ratio = 1e-6) before switching to a polynomial schedule with power 1.0. We trained all models for 50,000 iterations and report the final performance measured in mycelium IoU, mycelium F1-score, HD95, ASSD, and Boundary IoU. The boundary dilation ratio of Boundary IoU was fixed at 0.001 to impose a more stringent criterion on edge segmentation. All experiments were implemented in PyTorch30 on top of MMSegmentation, using four NVIDIA 4090 GPUs with 24 GB of memory each.
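
For reference, the SegFormer optimizer and schedule described above correspond to a configuration along the following lines in MMSegmentation 0.x syntax; the released files under ‘mycelium_model/model’ are authoritative, and this sketch merely restates the stated hyper-parameters:

```python
# Sketch of the SegFormer optimizer/schedule in MMSegmentation 0.x config
# syntax; field values restate the text, the released configs are authoritative.
optimizer = dict(type='AdamW', lr=6e-5, betas=(0.9, 0.999), weight_decay=0.01)
lr_config = dict(policy='poly', power=1.0, min_lr=0.0, by_epoch=False,
                 warmup='linear', warmup_iters=3000, warmup_ratio=1e-6)
runner = dict(type='IterBasedRunner', max_iters=50000)
data = dict(samples_per_gpu=4)  # batch size of 4 per GPU
```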

The baseline models were trained via fully supervised learning using the 507 annotated images of Ganoderma lucidum (457 for training and 50 for testing). The model with the best performance was selected for multi-dimensional robustness evaluation. For the cross-species dimension, the model was directly applied to images of Ganoderma sinense, Pleurotus ostreatus, and Trametes spp. (10 images per species) to assess its robustness. For the temperature dimension, inference tests were conducted on 10 images of Ganoderma lucidum cultured at 15 °C, with the baseline performance serving as the 25 °C reference. For the culture medium dimension, inference tests were conducted on 10 images of Ganoderma lucidum grown on MYG plates and 10 images of Trametes spp. grown on PDA plates.

Disagreement solution

We analyzed the distributions of the disagreement-related metrics and presented them in box plots to assess the consistency of different annotators’ results; any instances with significant disagreement were addressed to ensure annotation quality. The annotation disagreements among the annotators, two computer science researchers and an externally contracted annotator, were quantified by analyzing the distributions of mASSD and sample-level disagreement values.

The distributions of the disagreement-related metrics are presented in Fig. 4. In Fig. 4(a), the ASSD values between annotator 1 and annotator 2 are the lowest among all annotator pairs, indicating the highest level of agreement. In Fig. 4(b), annotator 1 achieves the lowest mASSD value, reflecting minimal relative disagreement with all other annotators. Additionally, guided by the sample-level disagreement metric, a subset of high-disagreement, challenging samples was identified for collaborative annotation adjustment or re-annotation.

Fig. 4
figure 4

Distribution chart of disagreement-related metrics. 1, 2, and 3 denote annotator indices corresponding, respectively, to a computer researcher experienced in mycelium cultivation, a computer researcher without cultivation experience, and an outsourced annotator. (a) Distribution of ASSDs. (b) Distribution of mASSD and sample-level disagreements.

Benchmark evaluation

Table 6 presents the test results of the three segmentation models, U-Net, DeepLabv3, and SegFormer, after supervised training on the trainset of 457 annotated images. While all three algorithms demonstrate respectable performance on global segmentation metrics such as Mycelium F1-score and Mycelium IoU (all scores exceeding 84%), their performance on critical boundary-focused metrics, including Boundary IoU, HD95, and ASSD, is notably insufficient. Specifically, SegFormer achieves the highest Boundary IoU of 28.60%, whereas U-Net and DeepLabv3 achieve 27.74% and 27.31%, respectively; the differences among the three models are minimal. The highest score indicates that SegFormer delivers finer edge segmentation than the other models, whereas the small margin suggests that existing mainstream architectures remain inadequate for stringent fine-edge segmentation tasks. In contrast, DeepLabv3 outperforms U-Net and SegFormer on HD95, achieving 63.53 compared with 139.34 and 75.95, respectively. The lowest HD95 score indicates that DeepLabv3 is far less affected by complex boundary outliers and local extreme noise, whereas the much higher score for U-Net reflects its limited ability to delineate fine boundaries under complex edge features or noisy conditions. On the ASSD metric, SegFormer records 15.28, while U-Net and DeepLabv3 obtain 45.44 and 18.50, respectively. The lowest ASSD score indicates that SegFormer’s predicted masks are most similar to the ground truth and that SegFormer best captures the geometric characteristics of the mycelium.

Table 6 The performance of various mainstream models.

The visualization in Fig. 5 qualitatively illustrates the quantitative trends reported in Table 6. In rows 1 and 2, where the mycelium boundary is clear, all three models achieve high Mycelium F1-scores and IoU, and SegFormer achieves the lowest ASSD. Nevertheless, the Boundary IoU values of the three remain tightly clustered near 28% and do not follow the trend in ASSD. This demonstrates that although the ability to capture geometric characteristics has improved with successive architectural updates, precise edge alignment has not yet benefited correspondingly. Row 3 shows a sample with jagged, low-contrast borders; the visible drift in the predictions reflects their elevated HD95 scores. SegFormer lowers the score relative to U-Net, yet the value remains too high for fine-grained tasks, underscoring the challenge posed by complex edges. Row 4 shows condensation in the Petri dish that creates mist-like noise, raising HD95 for all model predictions and revealing their shared weakness under noise.

Fig. 5
figure 5

Visualization of baseline model predictions. Blue, red, and white indicate the predicted mask, the ground truth mask, and their overlap, respectively. (a) Original image. (b) Predicted mask overlaid on the original image. (c) Ground truth and prediction overlap overlaid on the original image. (d) Magnified crop of the original image. (e) Predicted mask overlaid on the magnified crop. (f) Ground truth and prediction overlap overlaid on the magnified crop. All panels except (a) and (d) are produced by blending the corresponding masks with the underlying image using partial transparency.

Robustness evaluation

Based on the benchmark results, we selected SegFormer, the best performer, to evaluate model robustness against species-related variations, different types of mycelium culture media, and varying culture temperatures. The model maintains consistently stable global segmentation performance on Ganoderma sinense, Pleurotus ostreatus, and Trametes spp. As shown in Table 7, the classic metrics (F1-score and IoU) exceed 92% for all three species. The Boundary IoU results (ranging from 24.50% down to 11.50%) and ASSD results (ranging from 21.04 up to 118.29) reveal the challenge of boundary-aware segmentation across fungal species. Figure 6 illustrates model robustness under different types of mycelium culture medium and varying culture temperatures. In Fig. 6(a), the model maintains robust performance, with Mycelium F1-score and Mycelium IoU both above 93% at the 25 °C and 15 °C settings. For the boundary-aware metrics, performance is highly consistent, with Boundary IoU around 29% and ASSD values of about 16 pixels across the two temperature settings. In Fig. 6(b), Mycelium F1-score and Mycelium IoU remain stably above 92% under the MYG and PDA culture media settings. The ranges, in Boundary IoU from 30.50% to 11.50% and in ASSD from 14.10 to 43.77, indicate that boundary-aware segmentation under different culture conditions still has substantial room for improvement.

Table 7 The results of cross species robustness.
Fig. 6
figure 6

Culture condition robustness evaluation results. (a) Culture temperature. (b) Culture medium.

The accuracy of boundary segmentation is crucial for mycelium segmentation research, as it directly affects the quality of studies on core scientific questions, such as quantifying growth patterns, monitoring environmental adaptation, and evaluating physiological responses to different stimuli. In this field, small errors in segmentation boundaries can cause significant inaccuracies in downstream quantitative analyses. This underscores that mycelium boundary segmentation remains a highly challenging task.

Usage Notes

The public release comprises two components: a dataset hosted on Zenodo24 and a code repository available on GitHub. For the dataset, researchers can unzip the downloaded archives to obtain two parts, labeled data and unlabeled data. The labeled data consist of ‘labeled-GL.zip’, ‘labeled-GS_PO_TS.zip’, and ‘labeled-MYG_PDA_TEMP.zip’. ‘labeled-GL.zip’ contains the ‘trainset’ and ‘testset’ subfolders, which can be used to reproduce the benchmark results or to train, run inference with, and test custom models. ‘labeled-GS_PO_TS.zip’ contains the ‘GS’, ‘PO’, and ‘TS’ subfolders, which hold the labeled Ganoderma sinense, Pleurotus ostreatus, and Trametes spp. samples, respectively; these can be used for cross-species robustness testing. ‘labeled-MYG_PDA_TEMP.zip’ contains the ‘MYG’, ‘PDA’, and ‘TEMP15’ subfolders, which hold mycelium samples cultured under three conditions: on MYG agar plates, on PDA agar plates, and at 15 °C, respectively; these can be used for environment robustness testing. The unlabeled data provide 19,609 additional images without annotations across eight subfolders. Seven of these, named ‘unlabeled-GL1’ through ‘unlabeled-GL7’, provide 17,920 Ganoderma lucidum images; the remaining subfolder, ‘unlabeled-GS_PO_TS’, provides 1,689 images of Ganoderma sinense, Pleurotus ostreatus, and Trametes spp. These enable the evaluation of semi-supervised or self-supervised methods for boundary segmentation and support various segmentation tasks for further mycelium research. As for the code repository, it includes: (a) the ‘requirements.txt’ file listing all Python dependencies in the form ‘package==version’; (b) the ‘local_configs’ folder with default MMSegmentation model, dataset, and schedule configurations; (c) the ‘mmseg’ folder, which extends default MMSegmentation with the customized evaluation metric function used in this study, ‘mmseg/core/evaluation/extra_metrics.py’; (d) the ‘mycelium_model’ folder containing the dataset configuration file at ‘mycelium_model/dataset/EPA_mycelium.py’, along with the model configuration files, holding hyper-parameter and module settings for the mainstream deep-learning models, stored under ‘mycelium_model/model’; (e) two shell scripts, ‘script_train.sh’ and ‘script_inference.sh’, for training and inference, respectively. Before running the code, the ‘data_root’ variable in the dataset configuration and the paths in both the training and inference scripts should be updated to match the local environment.
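
As a hedged illustration of that final step, an edited dataset configuration might look like the following; only the placeholder path is meant literally, and the field layout follows generic MMSegmentation conventions rather than the released file:

```python
# Hypothetical excerpt of 'mycelium_model/dataset/EPA_mycelium.py' after editing.
# The path is a placeholder; the surrounding fields follow MMSegmentation
# dataset-config conventions and may differ in the released file.
data_root = '/path/to/MyceliumSeg/labeled-GL'  # update to the local dataset path
data = dict(
    train=dict(data_root=data_root, img_dir='trainset/image', ann_dir='trainset/mask'),
    test=dict(data_root=data_root, img_dir='testset/image', ann_dir='testset/mask'))
```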