Introduction

Fungi are ubiquitous in nature, but a weakened immune system increases vulnerability to fungal infections1. Superficial fungal infections affect the skin, hair, and nails, whereas invasive fungal diseases (IFDs) involve systemic infections affecting various organs2. Over the past 20 years, the incidence and mortality rates of IFDs have risen steadily worldwide3,4. IFDs have an insidious onset and non-specific clinical manifestations, and they are often masked by the symptoms of underlying diseases5,6. The most pressing problem clinicians currently face in diagnosing IFDs is the lack of rapid and efficient diagnostic techniques, which deprives patients of the optimal window for treatment7.

IFD diagnosis relies mainly on microscopy, culture, and histopathology. However, clinicians skilled in fungal microscopy are in short supply, which results in heavy workloads and the risk of misinterpreted results. This problem is even more pronounced in rural hospitals, where inexperienced technicians may miss infections or report false positives, further affecting the choice of subsequent treatment.

Over the past decade, deep learning (DL) has made remarkable progress across multiple disciplines, leading to its widespread application in both research and clinical practice. The medical imaging field has particularly benefited from recent DL advances, with significant improvements in image segmentation, object detection, and classification tasks8,9. Although substantial research has focused on pathological image analysis, fungal image analysis has remained understudied despite its clinical relevance. Several groups have applied deep learning to fungal identification. Zieliński et al.10 classified bright-field microscopic images of fungi, an approach that minimizes the need for biochemical tests, speeds up identification, and reduces diagnostic costs. Koo et al.11 performed automated detection of superficial fungal infections from bright-field microscopic images at 40 × and 100 × magnification using the YOLO v4 object detection model, demonstrating that such a model can accurately detect hyphae in microscopic images. Rahman et al.12 presented a pioneering study that employed deep convolutional neural networks (CNNs) to classify 89 pathogenic fungal genera from bright-field microscopic images, training and comparing different CNN architectures and enhancing the potential for rapid and precise identification of fungal species. Naama et al.13 employed a decision support system for pathologists diagnosing cutaneous fungal infections from PAS and GMS stains, which improved both accuracy and diagnostic speed while reducing the pathologists’ workload. Shubhankar et al.14,15 introduced a meta-learning-based deep learning architecture named MeFunX, combining CNN and XGBoost models; MeFunX achieved 92.49% accuracy in the early detection of fungal infections from microscopic images, outperforming state-of-the-art models including VGG16, ResNet, and EfficientNet. Yilmaz et al.16 demonstrated that VGG16/InceptionV3 models, with approximately 96% accuracy, could outperform clinicians in automated fungal detection from KOH microscopy, providing faster and more accurate diagnoses.

However, most of the aforementioned studies focused on non-superficial fungi and used bright-field rather than fluorescence images for fungal detection. In recent years, a few studies have addressed AI-based identification of superficial fungi from bright-field images, but they have not comprehensively analyzed the integrated recognition of fungal spores, hyphae, and mycelium in superficial fungi, which is crucial for reflecting real-world clinical application scenarios. In addition, these studies have not investigated the consistency of interpretation among clinicians.

This study introduces a deep-learning framework that operates on fluorescence fungal images and integrates the YOLOX and MobileNet V2 models to identify fungal spores, hyphae, and mycelium in clinical samples. The results indicate that the proposed dual-model framework has substantial value in clinical practice, improving diagnostic accuracy while alleviating clinicians’ workload.

Materials and method

Materials

This study was conducted under the supervision of the Ethics Committee (KY2023-057) of Huashan Hospital, affiliated with Fudan University, and adhered to the guidelines and principles of the Declaration of Helsinki. Written informed consent was obtained from the patients for the examination of their samples and the use of their clinical data. Clinical samples of superficial fungi prepared for fluorescence imaging were mainly collected from February 2023 to February 2024; no personal information was required in this study. The fluorescence staining solution for fungal samples was provided by Jiangsu Life Time Biological Co., Ltd., and image scanning was performed at 10 × magnification using an intelligent fluorescence microscope from Shanghai TuLi Technology Co., Ltd.

Annotation and dataset

All images had a resolution of 1,920 × 1,080 pixels and were annotated by two experienced clinicians using in-house software named APTime, developed by SODA Data Technology Co., Ltd. A total of 942 images were labeled for constructing the object detection model, of which 813 were divided into training and validation sets and 129 formed an independent test set. For constructing the mycelium detection model, 689 images containing mycelium were collected and cropped into 600 × 600-pixel tiles with a 160-pixel overlap, resulting in 1,351 mycelium-positive patches and 3,072 mycelium-negative patches. These images were divided into training, validation, and testing sets, as shown in Table 1. Finally, 70 negative samples were collected to evaluate the performance of the proposed framework.

Table 1 Datasets used for training and validation of the YOLOX and MobileNet V2 models.

Dual-model framework

Each fluorescence fungal image is analyzed by two deep learning models, and the final result for the whole image is obtained by integrating their outputs. First, the YOLOX model is used to identify scattered hyphae and spores17. YOLOX-L is selected as the object detection model to classify, locate, and count spores and hyphae, with CSPNet used as the backbone.
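As a minimal sketch of how the detector’s output can be summarized per image (illustrative, not the authors’ code), the snippet below assumes detections are provided as rows of (x1, y1, x2, y2, score, class_id), with hypothetical class ids 0 for spore and 1 for hypha:

```python
import numpy as np

def summarize_detections(dets: np.ndarray, score_thr: float = 0.3) -> dict:
    """Count spores and hyphae above a confidence threshold.

    dets: array of shape (N, 6) with rows (x1, y1, x2, y2, score, class_id);
    class ids 0 = spore and 1 = hypha are illustrative assumptions.
    """
    kept = dets[dets[:, 4] >= score_thr]
    return {
        "n_spores": int(np.sum(kept[:, 5] == 0)),
        "n_hyphae": int(np.sum(kept[:, 5] == 1)),
        "boxes": kept[:, :4],
    }
```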

For mycelium detection, the MobileNet V2 architecture is used18. The images, with an initial size of 1,920 × 1,080 pixels, are segmented into 600 × 600-pixel tiles with a 160-pixel overlap, and each tile is then resized to 512 × 512 pixels for processing by the MobileNet V2 model. To analyze an entire image, a classification result is first obtained for each tile, and the results of all tiles are then aggregated: if any tile is positive, the entire image is considered mycelium-positive, as illustrated in Fig. 1.
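The snippet below is a minimal sketch of this tiling and aggregation logic, assuming a two-class MobileNet V2 classifier built with torchvision; the tile size, overlap, and resize values follow the text, while the classifier head and decision details are illustrative assumptions rather than the authors’ exact implementation:

```python
import torch
import torch.nn.functional as F
from torchvision.models import mobilenet_v2

# Two-class MobileNet V2 head (0 = negative, 1 = mycelium); weights and head are illustrative.
model = mobilenet_v2(weights=None)
model.classifier[1] = torch.nn.Linear(model.last_channel, 2)
model.eval()

def tile_starts(length: int, tile: int = 600, overlap: int = 160) -> list:
    """Start offsets covering `length` with the given tile size and overlap."""
    stride = tile - overlap
    starts = list(range(0, max(length - tile, 0) + 1, stride))
    if starts[-1] + tile < length:  # ensure the image border is covered
        starts.append(length - tile)
    return starts

@torch.no_grad()
def image_is_mycelium_positive(image: torch.Tensor) -> bool:
    """image: float tensor of shape (3, 1080, 1920) with values scaled to [0, 1]."""
    _, h, w = image.shape
    for y in tile_starts(h):
        for x in tile_starts(w):
            tile = image[:, y:y + 600, x:x + 600].unsqueeze(0)            # 600 x 600 crop
            tile = F.interpolate(tile, size=(512, 512), mode="bilinear",
                                 align_corners=False)                     # resize to 512 x 512
            if model(tile).argmax(dim=1).item() == 1:                     # any positive tile
                return True                                               # -> positive image
    return False
```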

Fig. 1 The proposed framework for fungal spore and hypha identification, where each image is subjected to two analytical processes to analyze scattered hyphae, spores, and mycelium.

Model training

Image preprocessing included three key steps: (1) normalization by dividing pixel values by 255; (2) preservation of the standard RGB channel ordering; and (3) random brightness adjustment within ± 20% of the original intensity to ensure robustness across varying illumination conditions.
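A minimal sketch of these preprocessing steps is shown below (illustrative, not the authors’ code); the function name and the uint8 RGB input format are assumptions:

```python
import numpy as np

def preprocess(image_rgb: np.ndarray, train: bool = True, rng=np.random) -> np.ndarray:
    """image_rgb: uint8 array of shape (H, W, 3) already in RGB channel order."""
    img = image_rgb.astype(np.float32) / 255.0            # (1) scale pixel values to [0, 1]
    # (2) RGB channel order is preserved as-is; no channel swap is applied.
    if train:
        factor = 1.0 + rng.uniform(-0.2, 0.2)             # (3) random brightness within +/- 20%
        img = np.clip(img * factor, 0.0, 1.0)
    return img
```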

In the YOLOX-L model’s training process, MixUp and Mosaic were used for data augmentation. Data augmentation for MobileNet training was performed by random flipping, rotation, and blurring; in addition, the hue, saturation, and value of the input images were randomly perturbed. During MobileNet training, batch-level balancing was used to ensure that each training batch contained an equal number of positive and negative samples, effectively mitigating the impact of class imbalance during optimization. To mitigate overfitting in both the YOLOX and MobileNet V2 models, this study implemented dropout (rate = 0.5), label smoothing (ε = 0.001), and L2 weight regularization (λ = 0.001). These measures collectively improved the models’ generalization ability on unseen data while maintaining diagnostic accuracy.
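The snippet below sketches batch-level balancing together with the stated regularization settings, assuming a PyTorch training setup; the sampler design, optimizer choice, and learning rate are illustrative assumptions, while the dropout rate, label-smoothing ε, and weight-decay λ follow the values quoted above:

```python
import random
import torch
from torch import nn, optim
from torchvision.models import mobilenet_v2

# Classifier with the stated dropout rate of 0.5 (the two-class head is an assumption).
model = mobilenet_v2(weights=None)
model.classifier = nn.Sequential(nn.Dropout(p=0.5),
                                 nn.Linear(model.last_channel, 2))

class BalancedBatchSampler(torch.utils.data.Sampler):
    """Yields index batches holding equal numbers of positive and negative tiles."""
    def __init__(self, labels, batch_size: int = 32):
        assert batch_size % 2 == 0
        self.pos = [i for i, y in enumerate(labels) if y == 1]
        self.neg = [i for i, y in enumerate(labels) if y == 0]
        self.batch_size = batch_size
        self.num_batches = (len(self.pos) + len(self.neg)) // batch_size

    def __iter__(self):
        half = self.batch_size // 2
        for _ in range(self.num_batches):
            yield random.sample(self.pos, half) + random.sample(self.neg, half)

    def __len__(self):
        return self.num_batches

# Usage: DataLoader(dataset, batch_sampler=BalancedBatchSampler(labels))

criterion = nn.CrossEntropyLoss(label_smoothing=0.001)   # label smoothing, epsilon = 0.001
optimizer = optim.SGD(model.parameters(), lr=0.01,       # SGD and lr are assumptions
                      momentum=0.9, weight_decay=0.001)  # L2 regularization, lambda = 0.001
```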

Evaluation

This study conducted a comprehensive evaluation of the proposed models’ performance, examining four aspects: (1) the YOLOX model’s accuracy in detecting spores and hyphae; (2) the consistency of spore and hypha interpretation among different clinicians; (3) the MobileNet V2 model’s accuracy in classifying mycelium; and (4) the proposed dual-model framework’s accuracy in determining whether a sample image is positive or negative for any fungal form. To evaluate the YOLOX model’s performance (aspect (1)), the Intersection over Union (IoU) metric was used to quantify the overlap between predicted and ground-truth bounding boxes, and detection performance was evaluated using the precision, recall, PR-curve, AP, mAP, and F1-score metrics. The F1-score was also used to evaluate consistency among clinicians (aspect (2)). For aspects (3) and (4), the precision, recall, Kappa, and F1-score indices were used as quantitative metrics.
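As a minimal sketch of the box-level evaluation (illustrative, not the authors’ code), the snippet below computes IoU and derives precision, recall, and F1 by greedily matching predicted boxes to ground-truth boxes at a chosen IoU threshold:

```python
import numpy as np

def iou(a, b) -> float:
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def precision_recall_f1(pred_boxes, gt_boxes, iou_thr: float = 0.5):
    """Greedy one-to-one matching of predictions to ground truth at a given IoU threshold."""
    matched_gt, tp = set(), 0
    for p in pred_boxes:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gt_boxes):
            if j in matched_gt:
                continue
            v = iou(p, g)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_iou >= iou_thr:
            matched_gt.add(best_j)
            tp += 1
    fp = len(pred_boxes) - tp
    fn = len(gt_boxes) - tp
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    f1 = 2 * precision * recall / (precision + recall + 1e-9)
    return precision, recall, f1
```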

Results

Evaluation of scattered spore and hypha detection ability of YOLOX model

This evaluation involved 129 images containing scattered spores or hyphae, which were labeled and double-checked by two doctors to ensure data quality. The dataset contained various spore and hypha morphologies commonly seen in clinical practice: the spores included round, oval, and bowling-ball-shaped budding spores, and the hyphae included septate, non-septate, and branched hyphae, as shown in Fig. 2(a). The trained YOLOX model could accurately identify all of these morphologies. The PR-curves and AP performance were evaluated separately for hyphae and spores at IoU thresholds of 0.1, 0.3, 0.5, and 0.75, as presented in Fig. 2(b).

Fig. 2 (a) Different forms of spores and hyphae; (b) the PR curves of the YOLOX model in identifying spores and hyphae at different IoU values, where the left diagram shows the PR curves for spores and the right diagram those for hyphae; in both diagrams, the horizontal axis indicates recall and the vertical axis precision; (c) precision, recall, and F1-score results of the YOLOX model.

The YOLOX model localizes hyphae with bounding boxes, but a hypha’s curved, linear structure does not conform well to a box shape. Therefore, to reduce the false-negative rate, the model should detect as many suspected hyphal regions as possible; accordingly, the lowest IoU threshold was used in this study. When the IoU threshold was set to 0.1, the recall of spore and hypha detection was highest, as displayed in Fig. 2(c); the F1-score and AP values were 0.81 and 0.89 for spores and 0.88 and 0.92 for hyphae, respectively, and the mAP was 0.9. These results indicate that the proposed model could accurately identify different morphologies of hyphae and spores. However, because spores were more susceptible to background noise, the model performed better at hypha detection than at spore detection.

Evaluation of consistency among clinicians

Different clinicians might reach different conclusions when reading images under a microscope. To assess the consistency among them, three experienced clinicians annotated the same dataset, and their labeled results were compared using the F1-score as the evaluation indicator, as shown in Table 2. The F1-score agreement between any two of the three clinicians was 80%–90%, and the agreement between the proposed AI model and any one clinician was also 80%–90%. Several factors contributed to the inconsistency between clinicians, as well as between clinicians and the proposed AI model, as illustrated in Fig. 3. First, bounding box sizes varied among clinicians and did not strictly follow the edges of the target objects. Second, when multiple spore or hypha targets were intertwined, clinicians drew different numbers of label boxes and classified them differently; for instance, one clinician might label all the targets as a whole, while others labeled each target separately. Finally, subjectivity was introduced by images containing targets with very low fluorescence intensity or strong background noise.
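The pairwise comparison can be sketched as follows (illustrative, not the authors’ code): one reader’s boxes serve as the reference and another’s as the comparison, scored with the IoU-matched precision_recall_f1 helper from the evaluation sketch above; the annotation dictionary layout and reader names are hypothetical:

```python
import numpy as np

def mean_pairwise_f1(annotations: dict, reader_a: str, reader_b: str,
                     iou_thr: float = 0.1) -> float:
    """annotations[name][image_id] is a list of (x1, y1, x2, y2) boxes for one reader."""
    scores = []
    for image_id, ref_boxes in annotations[reader_a].items():
        cmp_boxes = annotations[reader_b].get(image_id, [])
        _, _, f1 = precision_recall_f1(cmp_boxes, ref_boxes, iou_thr)  # helper defined earlier
        scores.append(f1)
    return float(np.mean(scores))

# Example: agreement between two clinicians, and between a clinician and the model.
# mean_pairwise_f1(annotations, "clinician_1", "clinician_2")
# mean_pairwise_f1(annotations, "clinician_1", "model")
```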

Table 2 Consistency of interpretation between different clinicians and between the three clinicians and the proposed dual-model framework in terms of the F1-score metric.
Fig. 3 Illustration of conditions under which clinicians might have different interpretations.

Evaluation of fungal mycelium detecting ability of MobileNet V2 model

In a preliminary analysis, this study attempted to use the YOLOX model to detect all fungal forms, but the results indicated that it did not perform well on large targets with complex internal structures, such as fungal mycelium. Therefore, the MobileNet V2 model was introduced for this task. Because mycelium comprises multiple entangled hyphae, as depicted in Fig. 4(a), it is difficult to identify and count individual hyphae; however, a qualitative positive or negative result for mycelium is easier to obtain with a classification model. The precision, recall, and F1-score values of the classification model were 83%, 100%, and 93%, respectively, as presented in Figs. 4(b) and (c), showing that the MobileNet V2 model performed well in the mycelium classification task. In particular, the high recall rate indicates that the model rarely missed mycelium.

Fig. 4 (a) Different forms of mycelium; (b) the confusion matrix of the MobileNet V2 model; (c) the MobileNet V2 model’s performance evaluated using the precision, recall, and F1-score values.

Evaluation of fungal image positivity detection effect of proposed dual-model framework

To evaluate detection of all fungal forms considered in this study, an ensemble workflow integrated the results of the spore and hypha detection model and the fungal mycelium classification model to obtain a final fungal-positive or fungal-negative result for the entire image; a fungal-negative result meant that no hyphae or spores were identified. A total of 219 images were used in this evaluation. The precision, recall, F1-score, and Kappa values of the proposed framework were 92.5%, 99.3%, 95.7%, and 0.857, respectively, as shown in Figs. 5(a) and (b). These results demonstrate that the proposed fluorescence fungal image analysis framework is highly consistent with clinicians’ evaluations and has significant clinical reference value.
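The image-level ensemble decision and its evaluation can be sketched as follows (illustrative, not the authors’ code); the detection format, score threshold, and the use of scikit-learn metric functions are assumptions:

```python
from sklearn.metrics import cohen_kappa_score, f1_score, precision_score, recall_score

def image_is_fungus_positive(detections, mycelium_positive: bool,
                             score_thr: float = 0.3) -> bool:
    """Positive if the detector finds any spore/hypha above threshold OR any tile holds mycelium.

    detections: iterable of (x1, y1, x2, y2, score, class_id) rows from the detection model.
    mycelium_positive: output of the tile-level MobileNet V2 aggregation for the same image.
    """
    has_spore_or_hypha = any(row[4] >= score_thr for row in detections)
    return has_spore_or_hypha or mycelium_positive

# y_true: clinician labels, y_pred: framework outputs, both 0/1 per image (hypothetical arrays).
# precision = precision_score(y_true, y_pred)
# recall    = recall_score(y_true, y_pred)
# f1        = f1_score(y_true, y_pred)
# kappa     = cohen_kappa_score(y_true, y_pred)
```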

Fig. 5 (a) The confusion matrix of the proposed dual-model framework; (b) the proposed dual-model framework’s performance evaluated using the precision, recall, and F1-score values.

Discussion

In clinical translational research, immunofluorescence imaging has been widely applied to pathological tissue analysis, for example in studies of tumor microenvironments19, because it can detect multiple biomarkers simultaneously on a single tissue section; this has driven the development of numerous AI-based fluorescence image analysis methods20. Fluorescence imaging is also increasingly applied in the clinical diagnosis of fungal infections owing to its sensitivity and specificity. Notably, fluorescence images obtained with dyes that specifically stain components of the fungal cell wall distinguish fungi from the background better than bright-field images, allowing clearer observation of fungal morphology and improving diagnostic efficiency. In this study, two deep learning models, YOLOX and MobileNet V2, were combined to analyze fungal fluorescence images. In model performance validation, both the individual models and the ensemble demonstrated high efficacy in identifying fungal spores, hyphae, and mycelium, achieving results comparable to those of clinicians. To the best of our knowledge, this is the first time fungal mycelium has been detected separately from spores and single hyphae using a dedicated model. Fungal mycelium is a common fungal form in clinical samples, and its accurate detection is important for clinical diagnosis. With the widespread use of digital scanners in hospitals, a fully automated analysis workflow powered by deep learning would be a promising way to alleviate the burden on hospitals.

Previous studies have primarily focused on non-superficial fungi, including non-cutaneous fungal species, which exhibit significant morphological differences from superficial fungi; consequently, the AI algorithms required for their detection may also differ15,16. In addition, in certain clinical fungal diagnostic scenarios the challenge lies in fungal classification, a task addressed through image classification algorithms12. Although some studies have performed AI-based identification of superficial fungi from bright-field images in recent years, they have not comprehensively analyzed the integrated recognition of fungal spores, hyphae, and mycelium in superficial fungi, which is crucial for reflecting real-world clinical application scenarios. In contrast, this study employs a combination of different AI algorithms to identify diverse morphological variations of superficial fungi, aiming to determine the presence of fungal infection in clinical samples.

Although this study achieved promising results, it has some limitations. First, the annotation process for fungal images is time-consuming and complex. Second, the generalization ability of the proposed hybrid framework is limited by the training data, while fungal species are diverse and exhibit seasonal and regional distribution characteristics. Future research could therefore explore the following directions. First, the proposed models could be further optimized to improve classification of small targets and complex morphologies; for instance, alternative architectures such as YOLOv5, Faster R-CNN, and EfficientNet could be evaluated to improve detection performance. Second, the training dataset could be expanded and diversified to enhance generalization. Third, more advanced augmentation methods, such as domain-specific augmentations (e.g., synthetic lesion generation via StyleGAN for medical images), could be used to address class imbalance and rare cases.

In summary, the combination of the YOLOX and MobileNet V2 models provides a more comprehensive and effective method for analyzing fluorescence fungal images. Future work could further explore the optimization and fusion of the two algorithms to expand the range of recognizable fungal species and further improve accuracy. In addition, the proposed AI-driven framework could be combined with fluorescence scanning technology to provide a comprehensive clinical solution for automated skin fungus identification, supporting diagnostic decision-making.