Abstract
Chinese medicinal herbs (CMH) comprise a vast number of species, and their microscopic images are difficult to collect, which naturally leads to small sample sizes. In addition, CMH exhibit feature scarcity: the proportion of certain characteristic cells can be as low as 0.5%. These conditions cause deep learning models to fail, and even few-shot learning methods struggle to cope. Expanding the data available for rare features is one effective strategy. To address this challenge, we propose an effective microscopic image augmentation approach for few-shot learning (MIAA-FSL). The approach consists of two parts. First, we design the condition-guided microscopic image generation model (CGMIGM), which applies conditional guidance to a denoising diffusion probabilistic model (DDPM) to efficiently generate rare features and thus alleviate class imbalance. Second, we introduce the semi-supervised learning data augmentation model (SSLDAM), which integrates semi-supervised image processing and pseudo-label generation to overcome damage, blurriness, and poor discernibility in microscopic images, making otherwise unusable images usable. Experimental results show that MIAA-FSL improves identification accuracy by 24% on average over the Microscope Image Recognition + DDPM (MIR+DDPM) approach; for rare features in particular, accuracy rises from 45.5% to 87.0%, effectively mitigating the problem of object detection with few samples.
Introduction
Identification of Chinese Medicinal Herbs (CMH) is crucial for ensuring their quality, as it helps verify their purity, authenticity, and effectiveness. Currently, identification relies mainly on several technical approaches, including morphological identification[1], origin identification[2], and microscopic identification[3]. With the growing diversity of CMH, traditional microscopic identification methods face significant challenges, including limited data availability and class imbalance. These issues hinder accurate and reliable identification of CMH and highlight the need for more advanced approaches. Morphological identification[1] involves preliminary assessment of the appearance, shape, and texture of medicinal materials through sensory means such as touch, smell, taste, and sight. Origin identification determines the source of medicinal materials by analyzing their appearance or classification, considering factors such as growth environment and geographical location. Microscopic identification uses microscopy to observe the microstructure, cellular morphology, and texture features of CMH, aiding the distinction between different types. These identification approaches complement each other and improve identification accuracy, thus ensuring the quality and effectiveness of CMH. Among them, Microscopic Identification of Chinese Medicinal Herbs (MICMH) is crucial. According to the Chinese Pharmacopoeia[4], MICMH is the most common, convenient, and cost-effective approach for preliminary screening of CMH. It uses an optical microscope to observe the microstructure, cell morphology, and texture characteristics of CMH, locate their representative cell structures, distinguish different CMH, and thereby verify their authenticity. MICMH is a low-cost, rapid approach and serves as an effective quality control tool.
However, identification costs rise significantly in large-scale testing, especially for CMH, which span numerous types of medicinal products. The main challenge for automatic identification is the extensive diversity of CMH, which results not only in scarce microscopic image data but also in an imbalanced category distribution. Researchers have developed strategies for few-shot object detection, including model fine-tuning, data augmentation, and transfer learning. Nevertheless, with extremely limited data, both fine-tuning and transfer learning prove less effective[5]. The most effective route to resolving class imbalance is to increase the number of available samples, and data augmentation shows great potential here. Mainstream augmentation approaches include traditional transformations[6], generative models[7], and optimization strategies[8]. Traditional augmentation has the advantage of simulating variations that images encounter in the real world, helping models learn comprehensive features[6,7]; however, it offers limited transformation options and inconsistent sample quality. In contrast, the latest generative models can directly produce additional training samples by synthesizing highly realistic images, thereby enhancing the learning efficiency and generalization capability of the model; this is the current research focus and future direction of the field. To tackle the challenges of small sample sizes and class imbalance in few-shot object detection, we propose two innovative strategies. First, a condition-guided microscopic image generation model provides precise control over image attributes, significantly improving the quality and diversity of generated samples. Second, a semi-supervised learning-based data augmentation approach effectively utilizes pseudo-labels to exploit unlabeled data, thereby enhancing classification accuracy and model generalization. These advances directly address the core issues of small-sample detection and offer clear advantages over traditional methods.
Inspired by this, we attempt to use generative data augmentation to address the issues of few-shot object detection. We therefore propose an effective microscopic image augmentation approach for few-shot learning (MIAA-FSL). As shown in Fig. 1, rare features suffer from low model accuracy due to insufficient training samples. Using data generation techniques, rare samples can be synthesized to enhance the diversity and quantity of the dataset, allowing the model to undergo more comprehensive training and significantly improving its accuracy. Specifically, our work includes the following innovations:
(1) We propose a condition-guided microscopic image generation model, which ensures precise and personalized control over the image generation process. It can generate images with specific attributes in response to particular conditions. As a result, it addresses the issues of insufficient samples and poor quality often encountered by traditional models when dealing with class imbalance problems.
(2) We propose a data augmentation model based on semi-supervised learning. This approach works with unlabeled or weakly labeled samples by generating comprehensive and effective pseudo-labels, increasing the diversity of our training dataset. By exploiting a large amount of unlabeled data, we significantly improve classification accuracy and generalization, particularly in few-shot object detection, and effectively mitigate category imbalance.
(3) Extensive evaluation experiments demonstrate that MIAA-FSL improves identification accuracy by an average of 24.0% compared to MIR+DDPM. Specifically, in situations where the algorithm was previously constrained by an extreme shortage of samples in certain categories, the identification accuracy increased significantly, from 45.5% to 87.0%, thereby making data that was previously unusable now valuable.
Related work
Microscopic image identification technology
The core of microscopic image identification is recognizing cellular-level images. The latest approaches use deep learning, which researchers have applied to cancer cell identification[9]. Wang et al.[9] proposed a breast cancer pathology image classification approach based on deep learning and transfer learning, using convolutional neural networks (CNNs) to classify breast cancer tissue slides and significantly improve diagnostic accuracy and efficiency. Hameed et al.[10] proposed a novel breast cancer image classification model that uses deep learning trained on a large dataset to automatically classify complex cancer cell images. Gupta et al.[11] conducted a systematic review of deep learning approaches for breast cancer detection, focusing on MRI-based methods and analyzing current research advances and challenges. These successes suggest similar value in microscopic automatic identification[12]. Researchers have proposed approaches such as meta-learning[13] and matching networks[14] that are suitable for microscopic automatic identification.
Meta-learning-based image identification approaches[13] learn meta-knowledge from a large number of prior tasks and use this knowledge to guide the model on new tasks. For example, Finn et al.[15] introduced the Model-Agnostic Meta-Learning (MAML) approach, which identifies task-sensitive parameters in neural networks and achieves rapid convergence of the loss function through parameter fine-tuning. Wang et al.[16] combined meta-learning with data generation to induce changes in specific attributes and features of existing images and transfer them to new samples, generating new sample images with varied alterations. Ravi et al.[17] used meta-learning to update optimizer parameters, which in turn update the classifier network for image classification. However, when meta-learning algorithms are applied to few-shot learning, different support data can cause significant fluctuations in identification accuracy[18].
In addition, some researchers have proposed matching networks[14], which enable fast learning on datasets while using unlabeled data effectively. For instance, Nakamura et al.[19] proposed a fine-tuning approach for image identification, achieved by adjusting learning rates, using adaptive gradient optimization, and improving network structures. Vinyals et al.[14] proposed matching networks, which map labeled few-shot data and unlabeled data to corresponding labels, making effective use of unlabeled data. Nevertheless, these methods encounter difficulties when the target data distribution differs significantly from the training distribution[20].
In summary, these approaches assume a sufficient number of samples for training and validation. However, they may fail due to the very limited data available for some CMH in microscopic identification.
Data augmentation techniques
Data augmentation techniques are key approaches for enhancing model performance. They primarily include traditional transformation approaches[6], generative models[7], and optimization strategy approaches[8]. Their purpose is to augment the dataset and improve the model's robustness to noise and data variations.
Traditional transformation approaches mainly include Hide-and-Seek[6], Cutout[21], and Random Erasing[22]. They increase data diversity through geometric and color transformations: Hide-and-Seek randomly occludes image patches, while Cutout masks out part of the image, both forcing the model to learn more robust features. These approaches are simple and easy to implement but may produce misleading negative samples and lose critical information.
Generative models mainly include Bayesian Data Augmentation (BDA)[7], the Imbalanced Conditional Generative Adversarial Network (ImbCGAN)[23], and the Balancing Generative Adversarial Network (BAGAN)[24]. They rely on specific model structures to generate new data samples, improving both image quality and quantity. Specifically, BDA generates images optimized by a Monte Carlo EM algorithm, ImbCGAN introduces a conditional generator, and BAGAN uses a Generative Adversarial Network (GAN) to rebalance imbalanced datasets. However, these approaches improve the sample pool only to a limited extent.
Optimization strategies mainly include RandAugment[8] and AdaTransform[25]. These approaches use reinforcement learning and adversarial learning strategies to decide how to augment data. Specifically, RandAugment selects augmentation strategies via simplified computation, while AdaTransform learns competitive and cooperative tasks to adapt to a model's specific needs. However, these approaches only enhance a model's adaptability for specific tasks.
In general, traditional transformation approaches enhance data diversity, generative models produce new samples, and optimization strategies refine augmentation using intelligent algorithms. In few-shot object detection, generative models are particularly effective; however, all of these approaches often fall short when data is extremely limited.
Our approach
To address the issues of few-shot object detection, we propose MIAA-FSL, as illustrated in Fig. 2. MIAA-FSL aims to improve the accuracy of image identification algorithms from the perspectives of data generation and algorithmic augmentation. It comprises two key components: (1) the condition-guided microscopic image generation model and (2) the data augmentation model based on semi-supervised learning. The MIAA-FSL training algorithm, described in the subsequent sections, ties these components together to enhance the overall performance of the image classification process.
Condition-guided microscopic image generation model
To generate specific types of microscopic images through conditional guidance, we propose the condition-guided microscopic image generation model (CGMIGM), which comprises two stages: microscopic image exploration and image generation.
Microscopic image exploration phase
We model the exploration process as the incremental, layer-by-layer addition of noise. Briefly, we start from a low-noise state \(x_0\) and gradually transform it into a high-noise state \(x_T\) by adding noise step by step. Each exploratory state depends only on the previous state, with a controlled amount of noise introduced at that step, so the process can be viewed as a microscopic image exploration modeled as a Markov chain. The formula is as follows:

$$q\left(x_{t} \mid x_{t-1}\right)=\mathcal {N}\left(x_{t} ; \sqrt{1-\beta _{t}}\, x_{t-1}, \beta _{t} \textbf{I}\right)$$

where \(\mathcal {N}\) denotes the normal distribution, \(\textbf{I}\) denotes the identity matrix of appropriate dimensions, \(q(x_t \mid x_{t-1})\) is the probability density of \(x_t\) conditioned on \(x_{t-1}\), T is the total number of exploration steps, t is the current step, and \(\beta _t\) is the noise weight introduced at step t, with each step governed by \({\left\{ \beta _{t} \in (0,1)\right\} _{t=1}^{T}}\).
We use the properties of the Gaussian distribution to merge the noise terms, so that the exploration state at any step can be expressed directly in terms of the initial state \({x}_0\) through continued iteration. The formula is as follows:

$$q\left(x_{t} \mid x_{0}\right)=\mathcal {N}\left(x_{t} ; \sqrt{\bar{\alpha }_{t}}\, x_{0}, \left(1-\bar{\alpha }_{t}\right) \textbf{I}\right)$$

where \(\alpha _t\) acts as the complement of the noise weight \(\beta _t\), defined as \(\alpha _t = 1 - \beta _t\); \(\bar{\alpha }_t\) is the cumulative product of noise weights from \(i=1\) to t, calculated as \(\bar{\alpha }_t = \prod _{i=1}^{t} \alpha _i\); and \(\textbf{I}\) denotes the identity matrix of appropriate dimensions, which serves as the covariance matrix of the Gaussian distribution.
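For concreteness, the closed-form sampling of \(x_t\) directly from \(x_0\) can be sketched in a few lines of PyTorch. The linear \(\beta\) schedule and the step count below are illustrative assumptions rather than settings from this paper.

```python
import torch

T = 1000                                       # assumed number of exploration steps
betas = torch.linspace(1e-4, 0.02, T)          # assumed linear schedule, beta_t in (0, 1)
alphas = 1.0 - betas                           # alpha_t = 1 - beta_t
alpha_bars = torch.cumprod(alphas, dim=0)      # bar{alpha}_t = prod_{i<=t} alpha_i

def q_sample(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    noise = torch.randn_like(x0)
    return alpha_bars[t].sqrt() * x0 + (1.0 - alpha_bars[t]).sqrt() * noise
```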
Chinese medicinal herbs microscopic image generation phase
We utilize a Unet neural network and Gaussian noise \(\varepsilon\)[26] to model the Chinese Medicinal Herbs microscopic image generation phase. During this process, we introduce auxiliary information \(c\) about the type of microscopic image, which plays an important role in generation. Specifically, the Unet takes the current state \(x_t\), the time step, and the auxiliary information \(c\) as inputs and predicts the noise to be removed. Starting from the high-noise state \(x_T\), we iteratively update the state \(x_t\), removing Gaussian noise \(\varepsilon\) step by step until a clean image consistent with the requested herb type is produced. The detailed process is as follows:
Firstly, we define a network \(\varvec{\mu }_{\theta }\left( x_{t}, t\right)\), which estimates the mean of the data \(x_{t}\) at a specific time step t during the exploration process. Additionally, we use \(\epsilon _{\theta }\left( x_{t}, t\right)\) to reparameterize the Gaussian noise, making the noise dependent on the input \(x_{t}\) at time step t. The formula for this process is as follows:

$$\varvec{\mu }_{\theta }\left(x_{t}, t\right)=\frac{1}{\sqrt{\alpha _{t}}}\left(x_{t}-\frac{\beta _{t}}{\sqrt{1-\bar{\alpha }_{t}}}\, \epsilon _{\theta }\left(x_{t}, t\right)\right)$$
where \(\alpha _{t}\) and \(\bar{\alpha }_{t}\) play a role in influencing the transition of exploration states by adjusting the amount of noise.
Subsequently, we consider the CMH category c and introduce a conditional probability distribution \(p\left( x_{t-1} \mid x_{t}, c\right)\). This distribution governs the transition from \(x_{t}\) to the previous (less noisy) time step given the auxiliary information c, ensuring that the generated microscopic images satisfy the specified conditions. The process is as follows:

$$p\left(x_{t-1} \mid x_{t}, c\right)=\mathcal {N}\left(x_{t-1} ; \mu _{\theta }\left(x_{t}, c\right), \sigma _{t}^{2} \textbf{I}\right)$$
where \(\sigma _{t}^{2}\) is used to determine the uncertainty and variability of the transformation, \(\mu _{\theta }\left( x_{t}, c\right)\) is the network that calculates the mean based on the auxiliary information c and input \(x_{t}\).
In the CGMIGM model, the classification information \(c\) is incorporated into the diffusion process through a conditional diffusion approach. At each generation step, the auxiliary network \(\mu _{\theta }(x_t, c)\) takes both the current state \(x_t\) and the classification information \(c\) as inputs to compute the conditional mean, which allows the model to ensure that the generated microscopic images meet specific classification requirements. By introducing a conditional distribution \(p(x_{t-1} \mid x_t, c)\), the model effectively integrates the classification information into the generation process, ensuring the generated images accurately represent rare features and meet the intended conditions.
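A minimal sketch of one conditional reverse step follows, assuming a hypothetical network interface `model(x_t, t, c)` that returns the noise estimate \(\epsilon _{\theta }(x_t, t, c)\), and taking \(\sigma _t^2 = \beta _t\), one common choice in DDPM sampling.

```python
import torch

def p_sample_conditional(model, x_t, t, c, betas, alphas, alpha_bars):
    """One reverse step x_t -> x_{t-1} under p(x_{t-1} | x_t, c)."""
    eps = model(x_t, t, c)                                   # eps_theta(x_t, t, c)
    mean = (x_t - betas[t] / (1.0 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
    if t == 0:
        return mean                                          # final step: no noise added
    return mean + betas[t].sqrt() * torch.randn_like(x_t)    # sigma_t^2 = beta_t
```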
Model training
We begin by designing a classifier-free diffusion algorithm. Its core idea is to guide the generative process through an implicit classifier without explicitly defining and training a classifier model. Specifically, we split the complete generative model into two parts, an unconditional generative model \(p_{\theta }(x)\) and a conditional generative model \(p_{\theta }(x \mid c)\), which share the same neural network for parameter learning and are trained on paired data (x, c). By training this condition-guided microscopic image generation model \(p_{\theta }(x \mid c)\), we achieve precise, personalized control over the image generation process to meet specific CMH needs.
We identify three key advantages in this training process. First, it meets exploratory generative needs: in condition-guided microscopic image generation, we must be able to generate specific types of images from detailed requirements. By training both the unconditional model \(p_{\theta }(x)\) and the conditional model \(p_{\theta }(x \mid c)\), we can explore various ways of generating different image types and fine-tune model parameters for precise control. Second, the training process supports personalized generation: with the conditional model \(p_{\theta }(x \mid c)\), we can produce specific types of microscopic images from given category information \(c\), which is important for tailoring images to specific needs. Third, MIAA-FSL benefits from a shared parameterization: the unconditional and conditional models share one neural network, making efficient use of data and parameters and yielding generated images that are both more realistic and more diverse. The detailed training process is as follows:
To train both models at the same time, the conditional information c encoding the CMH cell class is randomly set to null, and all noise estimation for the generated images is performed by the single parameterized function \({\varvec{\epsilon }_{\theta }\left( {x}_{t}, t, c\right) }\).
Afterward, we sample using a linear combination of the conditional and unconditional score estimates. By adjusting the guidance weight, we gain flexible control over the realism and diversity of the generated images. The details are as follows:

$$\tilde{\varvec{\epsilon }}_{\theta }\left(x_{t}, t, c\right)=(1+w)\, \varvec{\epsilon }_{\theta }\left(x_{t}, t, c\right)-w\, \varvec{\epsilon }_{\theta }\left(x_{t}, t\right)$$

where \(w\) is the guidance weight.
Inspired by Ho et al.[26], we train the diffusion model using a simplified objective that ignores the weighting term. The loss function is as follows:

$$\mathcal {L}=\mathbb {E}_{t, x_{0}, \varvec{\epsilon }}\left[\left\| \varvec{\epsilon }-\varvec{\epsilon }_{\theta }\left(x_{t}, t, c\right)\right\| ^{2}\right]$$
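The joint training and the guided noise estimate can be sketched as follows. The `model` interface, the `null_token` placeholder for a dropped condition, and the 4-D image shape are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def cfg_train_step(model, x0, c, p_uncond, alpha_bars, null_token):
    """With probability p_uncond, replace c by a null token so one network
    learns both eps(x_t, t, c) and eps(x_t, t)."""
    b = x0.shape[0]
    t = torch.randint(0, alpha_bars.shape[0], (b,), device=x0.device)
    if torch.rand(()).item() < p_uncond:
        c = null_token                                  # unconditional branch
    noise = torch.randn_like(x0)
    abar = alpha_bars[t].view(b, 1, 1, 1)
    x_t = abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise
    return F.mse_loss(model(x_t, t, c), noise)          # simplified, unweighted objective

def guided_eps(model, x_t, t, c, null_token, w):
    """Guided estimate: (1 + w) * eps(x_t, t, c) - w * eps(x_t, t)."""
    return (1.0 + w) * model(x_t, t, c) - w * model(x_t, t, null_token)
```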
In summary, the microscopic image generation model based on conditional guidance achieves precision and personalized control by incorporating the conditional information c. Furthermore, it eliminates the bottleneck of traditional classifier-guided approaches to achieve efficient image generation.
The condition \(c\) plays a critical role in the training process. By incorporating \(c\) into the noise estimation function \(\varvec{\epsilon }_{\theta }(x_t, t, c)\), we influence the content of the image at each generation step. Specifically, the condition \(c\) determines the details of the generated image; when generating CMH microscopic images, \(c\) represents the different CMH cell classes.
Data augmentation model based on semi-supervised learning
To address issues such as damage, blurriness, and difficulty in discernment present in some microscopic images, we propose a Semi-Supervised Learning Data Augmentation Model (SSLDAM), which involves three stages: incorporating semi-supervised images, generating pseudo-labels, and enhancing images based on semi-supervised learning. The specific process is outlined below.
Unlabeled and weakly labeled semi-supervised data processing and analysis
The model starts by fusing the blurry image \(I_{b}\) observed under the microscope with the image \(I_{d}\) generated by the diffusion model. The fusion function L produces the composite image \(I_{e}\) according to the following formula:

$$I_{e}=L\left(I_{b}, I_{d}\right)=k\, I_{b}+(1-k)\, I_{d}$$
where k represents the fusion coefficient, and \(I_{e}\) is the ultimate composite image. The parameter k adjusts the contribution of \(I_{b}\) and \(I_{d}\) in \(I_{e}\).
Following this, we optimize the image \(I_e\) to meet the input requirements of the deep learning model. After denoising \(S\) and normalization \(N\), we obtain the optimized image \(I'_e\). The optimization process can be expressed as:

$$I'_{e}=N\left(S\left(I_{e}\right)\right)$$

where \(N\) adjusts pixel values to a specified range, ensuring data consistency and comparability, while \(S\) removes noise interference and highlights the valid information in the image. The formula is as follows:

$$S\left(I_{e}\right)=G * I_{e}$$
where \(G\) represents the Gaussian kernel, * denotes the convolution operation.
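Under a linear-blend reading of the fusion function \(L\), the preprocessing chain can be sketched as below; the kernel size, \(\sigma\), and the default \(k\) are illustrative assumptions.

```python
import cv2
import numpy as np

def fuse_and_optimize(I_b: np.ndarray, I_d: np.ndarray, k: float = 0.5) -> np.ndarray:
    """Fuse, denoise (S), and normalize (N) a microscopic image pair."""
    I_e = k * I_b.astype(np.float32) + (1.0 - k) * I_d.astype(np.float32)  # I_e = k*I_b + (1-k)*I_d
    I_s = cv2.GaussianBlur(I_e, (5, 5), sigmaX=1.0)                        # S(I_e) = G * I_e
    lo, hi = float(I_s.min()), float(I_s.max())
    return (I_s - lo) / (hi - lo + 1e-8)                                   # N: rescale to [0, 1]
```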
Then, the optimized image \(I'_e\) is used to train the model, further optimizing the model parameters to improve performance on the specific task. The formula is as follows:

$$\theta ' = \arg \min _{\theta } \mathcal {L}(\theta )$$
where \(\theta\) represents the initial model parameters, \(\theta '\) denotes the model parameters after fine-tuning, \(\mathcal {L}(\theta )\) is the loss function used to evaluate the performance of the model on the fine-tuning dataset.
Training and fine-tuning
In this paper, we adopt YOLOv5 as the base model, denoted \(P\). YOLOv5 is a real-time multi-object detection framework that leverages feature pyramid networks and anchor box techniques to achieve efficient and accurate object detection[27].
The model is initially trained on high-quality microscopic image data to establish robust parameters \(\theta\). It is then fine-tuned using partially damaged, blurred, and challenging-to-discern microscopic images, resulting in updated parameters \(\theta '\). This process ensures the model’s adaptability to diverse real-world microscopic imaging conditions.
Pseudo-label generation process
The fine-tuned model processes the optimized input images \(I_{e}'\), generating outputs that include bounding boxes, category predictions, and confidence scores. These outputs form the pseudo-labels \(\hat{y}\), defined as:

$$\hat{y}=g\left(I'_{e} ; \theta '\right)$$
where \(g(\cdot )\) is the prediction function of the fine-tuned model: it takes the preprocessed input image and the fine-tuned parameters \(\theta '\) and outputs the predicted pseudo-labels \(\hat{y}\).
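Pseudo-label generation can be sketched as below. The detector is assumed to return rows of [x1, y1, x2, y2, confidence, class], mirroring YOLOv5-style output; this interface is an assumption, not the paper's code.

```python
import torch

def generate_pseudo_labels(model, images):
    """hat{y} = g(I'_e; theta'): run the fine-tuned detector and collect
    (class, confidence, box) triples as pseudo-labels."""
    model.eval()
    pseudo = []
    with torch.no_grad():
        for img in images:
            for x1, y1, x2, y2, conf, cls in model(img).tolist():  # assumed (N, 6) output
                pseudo.append((int(cls), float(conf), (x1, y1, x2, y2)))
    return pseudo
```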
Semi-supervised learning-based image augmentation
After many prediction passes, we obtain numerous pseudo-labels \(\hat{y}\), but not every label is reliable. Hence, we use a confidence-weighted filtering approach to select usable data from the pseudo-labels: we calculate the frequency of each target class across all predictions and combine it with the corresponding confidence score.
To ensure pseudo-label quality, we implement a confidence-weighted filtering strategy. The confidence score for each predicted class is calculated as:

$$\text {C}\left(c \mid \hat{y}\right)=P\left(c \mid I_{e}', \hat{y}\right)$$
where \(P(c | I_{e}', \hat{y})\) is the prediction function of the YOLOv5 model, which outputs the probability of the model predicting class \(c\) given \(I_{e}'\) and \(\hat{y}\).
We also assess the frequency of each class by calculating its relative frequency across all predictions: we count how many times each predicted class appears and divide this count by the total number of predictions. The formula is as follows:

$$\text {E}\left(c \mid \hat{y}\right)=\frac{N_{c}(\hat{y})}{N_{\text {total}}}$$
where \(N_c(\hat{y})\) is the number of occurrences of class \(c\) associated with the pseudo-labels \(\hat{y}\) across multiple model predictions, and \(N_{\text {total}}\) is the total number of predictions. \(\text {E}(c | \hat{y})\) represents the relative frequency of that class in the predictions.
Finally, we select the pseudo-labels \(L_{pl}\) using a weighted criterion that prioritizes pseudo-labels with high confidence and frequent occurrence. The formula is as follows:

$$L_{pl}=\left\{ \hat{y} \mid \text {C}(c \mid \hat{y}) \cdot \text {E}(c \mid \hat{y}) > \tau \right\}$$

where \(L_{pl}\) represents the final selected pseudo-labels, \(\text {C}(c \mid \hat{y})\) is the confidence score for class \(c\) calculated from the pseudo-labels \(\hat{y}\), \(\text {E}(c \mid \hat{y})\) is the relative frequency of that class across multiple predictions, and \(\tau\) is the selection threshold.
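A sketch of the selection step is shown below; combining \(\text {C}\) and \(\text {E}\) as a product against the threshold \(\tau\) is one natural reading, and the value of \(\tau\) is an assumed hyperparameter.

```python
from collections import Counter

def filter_pseudo_labels(pseudo, tau=0.25):
    """Keep labels whose confidence, weighted by the class's relative
    frequency E(c) = N_c / N_total, exceeds tau."""
    n_total = max(len(pseudo), 1)
    counts = Counter(cls for cls, _, _ in pseudo)              # N_c(hat{y})
    rel_freq = {c: n / n_total for c, n in counts.items()}     # E(c | hat{y})
    return [(cls, conf, box) for cls, conf, box in pseudo
            if conf * rel_freq[cls] > tau]                     # C(c) * E(c) > tau
```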
Overall, SSLDAM aims to enhance microscopic image quality and improve model robustness through multi-stage processes, including image fusion, optimization and fine-tuning, pseudo-label generation, and filtering. By fusing blurry images with diffusion-generated images and applying denoising and normalization optimization, the input images are made suitable for deep learning models. On this basis, YOLOv5 is used for initial training and fine-tuning, generating pseudo-labels that are further refined using a confidence-weighted filtering strategy to optimize the training dataset, thereby improving the model’s ability to recognize complex microscopic images.
MIAA-FSL training algorithm
Training such an algorithm involves several difficulties: few-shot object detection itself, the need to effectively combine conditional and unconditional training strategies, and the use of Signal-to-Noise Ratio (SNR) sequences for parameter adjustment.
First, data scarcity limits the performance of traditional generative models when generating images for rare categories. Second, we must judiciously combine conditional and unconditional training strategies to ensure the diversity and accuracy of the generated images. Third, we need to carefully adjust parameters using SNR sequences during image updates to maintain the quality and stability of generation. Implementing semi-supervised data augmentation also poses a challenge, as it requires effectively using unlabeled and weakly labeled data while creating accurate pseudo-labels. To address these issues, we developed the MIAA-FSL training algorithm, which integrates conditional and unconditional training strategies, employs SNR sequences for iterative image updates, and applies semi-supervised data augmentation to make better use of available images. Together, these components form an effective solution that improves image discrimination accuracy. For the detailed procedure, see Algorithm 1.
In the initial stage of the algorithm, we initialized basic parameters (as described in line 1). This initialization included setting values for the probability of unconditional training \(p_{\text {uncond}}\), the guidance strength for conditional generation \(w\), the conditional information of the image \(c\), and the SNR sequence \(\lambda _{1}, \ldots , \lambda _{T}\). These parameters provide the necessary basic settings for the subsequent steps of the algorithm, which are used to control the proportion of unconditional and conditional training and guide the entire image generation process.
Afterward, we move into the sampling and training phase (as described in lines 2-8). In this phase, we dynamically decide whether to use conditional information based on the unconditional-training probability \(p_{\text {uncond}}\), effectively combining unconditional and conditional training strategies, and we optimize the model by adding noise to the images and processing them with the denoising network.
In the core iterative process, we progressively enhance the quality of the generated images (as described in lines 9-17). In each iteration, we calculate guidance scores and update the images accordingly, step by step reducing the noise level until a clear final image is produced. Through this iterative optimization, we can generate images that more accurately match the conditional information.
Subsequently, we augment each image in the dataset (as described in lines 18-27) and preprocess them to suit the model input. The model then predicts pseudo-labels, and we select the high-quality ones based on confidence scores and incorporate them into the training set for semi-supervised learning. Finally, we train the model on the combined augmented images and pseudo-labeled data, refining the prediction model until it converges (as described in lines 28-31). The overall loop is sketched below.
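Putting the pieces together, the loop of Algorithm 1 can be summarized as the following pseudocode-style sketch, reusing the sketches above; `sample_with_guidance`, `rare_classes`, and `train_detector` are placeholders for the paper's components, not a published API.

```python
def miaa_fsl_train(gen_model, det_model, labeled, unlabeled,
                   p_uncond, w, snr_seq, n_synth, tau, alpha_bars, null_token):
    """High-level sketch of MIAA-FSL training; placeholder names throughout."""
    # Lines 2-8: joint conditional/unconditional training of the generator.
    for x0, c in labeled:
        loss = cfg_train_step(gen_model, x0, c, p_uncond, alpha_bars, null_token)
        # ... optimizer step on loss ...

    # Lines 9-17: iterative guided sampling along the SNR sequence snr_seq.
    synthetic = [sample_with_guidance(gen_model, c, w, snr_seq)
                 for c in rare_classes(labeled) for _ in range(n_synth)]

    # Lines 18-27: pseudo-label the unlabeled images and keep confident ones.
    pseudo = filter_pseudo_labels(generate_pseudo_labels(det_model, unlabeled), tau)

    # Lines 28-31: retrain the detector on the enlarged dataset until convergence.
    train_detector(det_model, list(labeled) + synthetic + pseudo)
```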
In summary, the algorithm’s core modules encompass the generation of CMH cell images and semi-supervised data augmentation. By iteratively training the image generation model and label synthesis model together, the algorithm generates and synthesizes data, enhances the dataset, and improves identification accuracy.
Dataset
We constructed three datasets, including the Chinese Medicinal Herbs dataset (i.e., CMHD), Enhanced Chinese Medicinal Herbs dataset 1 (i.e., ECMHD1), and Enhanced Chinese Medicinal Herbs dataset 2 (i.e., ECMHD2). We split these datasets into training and testing sets at an 8:2 ratio. They include four types of CMH: Atractylodes Macrocephala Koidz (i.e., AMK), Magnolia Officinalis (i.e., MO), Scutellaria Baicalensis (i.e., SB), and Wild Honeysuckle Flower (i.e., HYF). CMHD contains 978 samples, while ECMHD1 and ECMHD2 include 1480 and 2001 samples, respectively. The original images in CMHD and ECMHD1 have a resolution of 1592 \(\times\) 1944 pixels, while the augmented images in ECMHD1 and ECMHD2 have resolutions of 640\(\times\)640 and 52\(\times\)52 pixels, respectively. Specifically, Table 1 shows the total number of samples in each dataset, the resolutions of the original and augmented images, and the number of cell categories in each dataset. Table 2 provides the detailed quantities of characteristic cells across all datasets.
Experimental evaluation
In this section, we describe the experimental setting and data context, and introduce the comparative approaches and evaluation metrics employed.
Experimental environment and parameter optimization
Experimental environment
During the model training process, we conducted experiments on the PyTorch 1.9.0 platform using a V100 GPU. The training parameters were set as follows: Epoch = 1500, Batch_Size = 16, Learning_Rate = 1e-2, Num_Workers = 20. To validate the effectiveness of MIAA-FSL, we evaluated its performance using metrics such as F1 score, Mean Average Precision (mAP), Recall, and Precision.
Parameter optimization
We used the Taguchi approach[28] to optimize parameter selection, aiming to reduce the influence of different algorithm models and parameter choices on the experimental results. We selected the following hyperparameters: Epochs \((E)\), Batch Size \((B)\), Learning Rate \((LR)\), and Num Workers \((NW)\). \(E\) determines the number of passes over the data, \(B\) affects the number of samples used in each update, \(LR\) controls the magnitude of weight updates, and \(NW\) improves data-loading efficiency. We selected four level values for each hyperparameter to explore its influence on model performance, formed experimental parameter combinations from these factors and levels, and used an orthogonal array design for the experiments (as detailed in Table 3). The design also covers the base detectors: YOLOv5 is a real-time multi-object detection algorithm, YOLOv8 is an updated version of YOLOv5, SSD is a real-time detector that predicts target location and category in a single neural network, and Faster R-CNN is a fast and accurate object detection algorithm. This design effectively reduces the influence of noise factors, improving the stability and repeatability of the experimental results.
Finally, we determined the optimal parameter combination by maximizing the SNR, calculated as follows:

$$\mathrm{SNR} = 10 \log _{10}\left(\frac{S_1}{N_1}\right), \qquad \mathrm{SNR}_{\text {larger}} = -10 \log _{10}\left(\frac{1}{K} \sum _{i=1}^{K} \frac{1}{y_i^{2}}\right)$$
where \(S_1\) denotes the mean value of the signal, \(N_1\) denotes the mean value of the noise, \(K\) is the number of experiments, and \(y_i\) is the output of each experiment.
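As a concrete example, the larger-the-better form of the Taguchi SNR can be computed as follows, assuming the experiment outputs \(y_i\) are accuracy-like quantities where larger is better.

```python
import numpy as np

def snr_larger_is_better(y) -> float:
    """Taguchi larger-the-better SNR: -10 * log10((1/K) * sum_i 1 / y_i^2)."""
    y = np.asarray(y, dtype=float)
    return -10.0 * np.log10(np.mean(1.0 / y ** 2))

# Example: mAP outcomes of K = 4 orthogonal-array runs (illustrative values).
print(snr_larger_is_better([0.62, 0.58, 0.71, 0.66]))   # higher is better
```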
In our study, higher SNR values correspond to better performance[28]. Notably, the optimal performance was attained with the following parameter settings: Epoch = 1500, Batch_Size = 16, Learning_Rate = 1e-2, and Num_Workers = 20, as illustrated in Fig. 3.
Comparison approaches and evaluation metrics
Comparative experiments
Our experiments are based on two versions of the microscope image recognition approach, including the standard version (Microscope Image Recognition, MIR) and the enhanced version (Microscope Image Recognition Enhanced, MIR_n), which correspond to YOLOv5 and YOLOv8, respectively, although this is not the primary focus of this study. To verify the validity of various data enhancement techniques, we applied these techniques step by step and established comparison approaches, including MIR-NULL, MIR, MIR+DDPM, MIR+MIAA-FSL, MIR_n, and MIR_n+MIAA-FSL. The experimental setup is detailed in Table 4, where “\(\checkmark\)” indicates that a specific technique was used, and “\(\times\)” indicates that it was not.
Evaluation metrics
In the experimental evaluation of this paper, we utilize the following metrics to comprehensively evaluate the image identification model performance:
Precision measures the proportion of predicted positives that are correct, calculated as

$$\text {Prec}=\frac{\text {TP}}{\text {TP}+\text {FP}}$$
where \(\text {TP}\) stands for True Positives and \(\text {FP}\) stands for False Positives.
Recall reflects the model's ability to recognize actual positive categories, calculated as

$$\text {Rec}=\frac{\text {TP}}{\text {TP}+\text {FN}}$$
where \(\text {TP}\) stands for True Positives and \(\text {FN}\) represents False Negatives.
The F1 score considers precision and identification ability comprehensively, calculated as

$$\text {F1}=\frac{2 \times \text {Prec} \times \text {Rec}}{\text {Prec}+\text {Rec}}$$
where \(\text {Prec}\) represents Precision and \(\text {Rec}\) represents Recall.
mAP evaluates performance on the object detection task, calculated as

$$\text {AP}=\frac{1}{n} \sum _{k=1}^{n} \text {Prec}\left(R_{k}\right)$$
where \(n\) represents the total number of positive samples, and \(\text {Prec}(R_k)\) represents the precision rate under different recall thresholds. mAP denotes the mean of the APs of all categories.
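These metrics reduce to a few lines of code; the counts TP, FP, and FN are assumed to come from matching detections to ground truth at a fixed IoU threshold.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall, and F1 from detection counts, per the formulas above."""
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"precision": prec, "recall": rec, "f1": f1}

# Example: 87 TP, 9 FP, 13 FN -> precision ~0.906, recall 0.87, F1 ~0.888.
print(detection_metrics(87, 9, 13))
```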
Stepwise ablation study experiment
The ablation study results, summarized in Table 4, highlight the contributions of individual components to overall model performance. The baseline model, MIR-NULL, achieves an mAP of 30.32%. With preprocessing applied (MIR), performance improves significantly to 43.88%. Integrating a diffusion model (MIR+DDPM) further boosts mAP to 60.04%. Adding MIAA-FSL (MIR+MIAA-FSL) leads to a substantial improvement, reaching 82.50%. The MIR_n baseline achieves 60.00%, and adding the CGMIGM and SSLDAM modules (MIR_n+MIAA-FSL) achieves the highest mAP of 93.75%. These results emphasize the effectiveness of the proposed components in improving the model's performance.
Comparison and analysis of the effects of various data augmentation approaches
As shown in Fig. 4, MIR consistently outperformed MIR-NULL, with a 7% increase in average accuracy across all datasets. This improvement appears both in individual sample-level accuracy and in the overall averages, highlighting the efficacy of the preprocessing step in enhancing feature extraction, and it spans the different categories within the datasets, demonstrating the robustness and generalizability of the preprocessing approach. MIR+DDPM further improved accuracy by 12%, indicating that DDPM effectively expanded the dataset by generating additional samples. However, DDPM mainly increases the volume of data and lacks the ability to specifically generate rare samples, which limits its effectiveness against class imbalance. In contrast, MIR+MIAA-FSL showed a 24% improvement over MIR+DDPM, suggesting that MIAA-FSL not only increased the sample size but also generated rare features via CGMIGM. By targeting underrepresented categories and further expanding the sample set with SSLDAM, MIAA-FSL enhances model robustness and significantly improves performance in few-shot scenarios.
Presentation of data augmentation effects with MIAA-FSL
The results of MIAA-FSL are illustrated in Fig. 5. Specifically, Fig. 5a displays real microscopic images from the dataset alongside the corresponding images generated by CGMIGM. Visual comparison shows that the generated images closely match the real microscopic images in shape and contour, achieving precise detail reproduction. Furthermore, Fig. 5b shows the practical effect of SSLDAM, which successfully assigns effective pseudo-labels to blurred or damaged microscopic images. In summary, MIAA-FSL can both generate images with meticulous detail and assign pseudo-labels to blurred or damaged images, greatly facilitating efficient dataset expansion.
Performance evaluation and effect analysis of MIAA-FSL
We comprehensively evaluated the effectiveness of MIAA-FSL in limited-sample environments using data with a limited number of categories. The experimental results in Fig. 6 indicate that MIR_n+MIAA-FSL outperforms MIR_n, showing a more stable trend in identification accuracy and a significant improvement across different images. The average accuracy increased by 34%, highlighting the effectiveness of MIAA-FSL in addressing the challenges posed by limited samples.
Robustness analysis
To further validate the effectiveness of MIAA-FSL, we evaluated the four approaches MIR-NULL, MIR, MIR+DDPM, and MIR+MIAA-FSL under five lighting conditions: low light, mixed light, standard light, natural light, and strong light. As shown in Fig. 7, MIR outperformed MIR-NULL, achieving a 13% improvement in average accuracy across all datasets, and this improvement was consistent across all five lighting conditions. The analysis indicates that MIR's data preprocessing step significantly enhances model performance, not only across diverse datasets but also under varying environmental conditions, demonstrating its robustness and effectiveness in practical applications. MIR+DDPM does even better, with a 41% accuracy boost, highlighting the effectiveness of diffusion models for data augmentation. MIR+MIAA-FSL outperformed MIR+DDPM with a further 1% increase in accuracy, confirming its superior performance under complex lighting and its effectiveness in object detection.
Conclusion
In this paper, we propose MIAA-FSL to address the few-shot object detection challenge in CMH data collection. MIAA-FSL introduces two key innovations: CGMIGM and SSLDAM. CGMIGM generates synthetic microscopic images by incorporating classification information into a diffusion-based model, creating diverse images that represent rare features and thus addressing class imbalance; this enhances the model's ability to learn from limited data. SSLDAM, in turn, uses semi-supervised learning to exploit unlabeled data by generating pseudo-labels, augmenting the dataset and overcoming challenges such as image damage, blurriness, and poorly discernible features. Together, these two models enable MIAA-FSL to significantly improve identification accuracy, with results showing an increase from 45.5% to 87.0% for rare features, demonstrating the practical effectiveness of our approach for few-shot object detection.
In this study, we have successfully tackled critical challenges in few-shot object detection, with a particular focus on improving the recognition accuracy of rare and underrepresented features. Moving forward, our research will prioritize optimizing the model’s generalization capabilities, ensuring better adaptability across diverse data distributions and enabling effective handling of new, unseen tasks. We will also explore advanced adaptive learning strategies for complex environments, aiming to preserve efficient recognition even in the presence of noise, blurriness, or damaged images. By enhancing the model’s robustness to such distortions and incorporating techniques like cross-modal learning and cross-domain transfer, we aim to further strengthen its stability and reliability. These developments will be crucial for ensuring the model’s performance in real-world applications, especially in dynamic and unpredictable situations, paving the way for broader and more effective deployment in complex settings.
Data Availability
The datasets generated and analyzed during the current study are not publicly available due to proprietary ownership by the enterprise and private copyright restrictions. For further inquiries, please contact pangguangyao@gxuwz.edu.cn.
References
Thongkhao, K. et al. Differentiation of Cyanthillium cinereum, a smoking cessation herb, from its adulterant Emilia sonchifolia using macroscopic and microscopic examination, HPTLC profiles and DNA barcodes. Sci. Rep. 10, 14753 (2020).
Yin, L. et al. A review of the application of near-infrared spectroscopy to rare traditional Chinese medicine. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 221, 117208 (2019).
Ichim, M. C., Häser, A. & Nick, P. Microscopic authentication of commercial herbal products in the globalized market: Potential and limitations. Front. Pharmacol. 11, 876 (2020).
Leong, F. et al. Quality standard of traditional Chinese medicines: Comparison between European Pharmacopoeia and Chinese Pharmacopoeia and recent advances. Chin. Med. 15, 1-20 (2020).
Zhang, X. et al. Fault diagnosis for small samples based on attention mechanism. Measurement 187, 110242 (2022).
Singh, K. K. & Lee, Y. J. Hide-and-seek: Forcing a network to be meticulous for weakly-supervised object and action localization. In Proceedings of the IEEE International Conference on Computer Vision, 3524-3533 (2017).
Tran, T., Pham, T., Carneiro, G., Palmer, L. & Reid, I. A Bayesian data augmentation approach for learning deep models. Adv. Neural Inf. Process. Syst. 30 (2017).
Lim, S., Kim, I., Kim, T., Kim, C. & Kim, S. Fast AutoAugment. Adv. Neural Inf. Process. Syst. 32 (2019).
Wang, W., Li, Y., Yan, X., Xiao, M. & Gao, M. Breast cancer image classification method based on deep transfer learning. In Proceedings of the International Conference on Image Processing, Machine Learning and Pattern Recognition, 190–197 (2024).
Guo, J. et al. A novel breast cancer image classification model based on multiscale texture feature analysis and dynamic learning. Sci. Rep. 14, 7216 (2024).
Adam, R., Dell’Aquila, K., Hodges, L., Maldjian, T. & Duong, T. Q. Deep learning applications to breast cancer detection by magnetic resonance imaging: A literature review. Breast Cancer Res. 25, 87 (2023).
Lee, D. Y., Li, Q. Y., Liu, J. & Efferth, T. Traditional Chinese herbal medicine at the forefront battle against COVID-19: Clinical experience and scientific basis. Phytomedicine 80, 153337 (2021).
Chen, J., Zhan, L.-M., Wu, X.-M. & Chung, F.-L. Variational metric scaling for metric-based meta-learning. Proc. AAAI Conf. Artif. Intell. 34, 3478–3485 (2020).
Vinyals, O. et al. Matching networks for one shot learning. Adv. Neural Inf. Process. Syst. 29 (2016).
Finn, C., Abbeel, P., & Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning, 1126–1135 (PMLR, 2017).
Wang, Y.-X., Girshick, R., Hebert, M. & Hariharan, B. Low-shot learning from imaginary data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7278–7286 (2018).
Ravi, S. & Larochelle, H. Optimization as a model for few-shot learning. In International Conference on Learning Representations (2016).
Agarwal, M., Yurochkin, M. & Sun, Y. On sensitivity of meta-learning to support data. Adv. Neural. Inf. Process. Syst. 34, 20447–20460 (2021).
Nakamura, A. & Harada, T. Revisiting fine-tuning for few-shot learning. arXiv preprint arXiv:1910.00216 (2019).
Wang, M. & Deng, W. Deep visual domain adaptation: A survey. Neurocomputing 312, 135–153 (2018).
DeVries, T. & Taylor, G. W. Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552 (2017).
Zhong, Z., Zheng, L., Kang, G., Li, S. & Yang, Y. Random erasing data augmentation. Proc. AAAI Conf. Artif. Intell. 34, 13001–13008 (2020).
Douzas, G. & Bacao, F. Effective data generation for imbalanced learning using conditional generative adversarial networks. Expert Syst. Appl. 91, 464–471 (2018).
Mariani, G., Scheidegger, F., Istrate, R., Bekas, C. & Malossi, C. BAGAN: Data augmentation with balancing GAN. arXiv preprint arXiv:1803.09655 (2018).
Takase, T., Karakida, R. & Asoh, H. Self-paced data augmentation for training neural networks. Neurocomputing 442, 296–306 (2021).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural. Inf. Process. Syst. 33, 6840–6851 (2020).
Inbar, O. et al. Analyzing the secondary wastewater-treatment process using Faster R-CNN and YOLOv5 object detection algorithms. J. Clean. Prod. 416, 137913 (2023).
Taguchi, G. The System of Experimental Design: Engineering Methods to Optimize Quality and Minimize Cost (American Supplier Institute, 1987).
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 62262059), the Natural Science Foundation of Guangxi Province (Grant No. 2021JJA170178), the Funding Scheme for Innovation and Technology Promotion of FDCT (Grant No. 0009/2024/ITP1), the Industry-University-Research Project of Wuzhou High-tech Zone and Wuzhou University (Grant No. 2020G003), and the Guangxi Innovation-Driven Development Special Fund Project (Grant No. Guike AA18118036).
Author information
Authors and Affiliations
Contributions
Wanying Li was responsible for developing some of the innovations, experiment design and implementation, data collection and analysis, result interpretation, and writing the main body of the paper. Guangyao Pang proposed the research ideas, provided comprehensive project support, allocated resources, and oversaw paper review and guidance. The other authors provided experimental assistance, organized data and literature, and offered technical support.
Corresponding author
Ethics declarations
Competing interests
The authors declare no potential conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, W., Yang, L., Peng, G. et al. An effective microscopic image augmentation approach. Sci Rep 15, 10247 (2025). https://doi.org/10.1038/s41598-025-93954-x