Background & Summary

Medicinal plants are plants used to prevent and treat disease and maintain human health. They are not only the source of a large number of Western medicinal compounds but also the main source of Complementary and Alternative Medicine (CAM). China is rich in medicinal plant resources: according to statistics1, it hosts 11,146 species of medicinal plants belonging to 383 families and 2,313 genera. In recent years, medicinal plants have played an increasingly important role in the prevention and treatment of various diseases2,3,4. However, different medicinal plants have distinct therapeutic effects, and in clinical practice they must be used according to the specific condition to mitigate adverse reactions and avoid toxicity. Therefore, accurately identifying medicinal plants, thereby avoiding misplanting, miscollection, misuse, and fraudulent substitution, is the most fundamental and crucial step in ensuring the quality and efficacy of medicines.

As a branch of systematics, taxonomy is closely related to phylogenetics and the study of evolutionary processes. Its tasks include using the findings of systematics to name and classify organisms under reasonable principles, providing an organizational framework that facilitates the storage and retrieval of biodiversity information, and offering guidance for identification work5. Plant taxonomy is a long-established discipline that has continuously developed and improved through the practice of human recognition and utilization of plants. Guided by Darwin’s theory of evolution6, plant taxonomists from various countries have proposed plant classification systems based on studies of the origin and development of the plant kingdom. Among the most influential are the systems of Engler, Hutchinson, Takhtajan, and Cronquist; the Engler and Hutchinson systems, in particular, represent the “Pseudanthium” and “Euanthium” schools, respectively, with floral characteristics serving as the primary distinguishing features7. The classification of medicinal plants adopts the principles and methods of plant taxonomy. Mastering characteristics such as plant morphology and microscopic structure ensures the accurate identification of plants with medicinal value8, guaranteeing the authenticity of medicinal plants and herbal medicines and facilitating better research, rational development, and utilization.

Traditional identification of medicinal plants mainly relies on professionals applying the corresponding classification system. Following its written descriptions, identification is carried out by sight, touch, taste, and smell9. This method depends heavily on the professional skill and experience of the operator; as a result, identification is subjective and accuracy varies between operators. As human society develops, the demand for medicinal plant resources keeps increasing, and traditional identification methods cannot keep pace with the rich diversity of plant taxa. At present, medicinal plants can be classified from different perspectives, including pharmacognosy, pollen, leaf epidermis and seed coat, cytology, and molecular identification10.

Particularly, plant image recognition technology based on deep learning11,12 is a more intelligent and reliable auxiliary technology that can improve the accuracy and efficiency of plant identification. Because medicinal plant images are harder to collect and annotate, related works are relatively few. Moreover, since Traditional Chinese Medicine (TCM) is most prevalent in Southeast Asia, research on Traditional Chinese Medicinal Plant (TCMP) datasets is also concentrated in that region. In 2021, Tung and Vinh13 proposed a large, public, multi-class dataset of medicinal plant images called VNPlant-200. The dataset consists of 20,000 images of 200 labeled Vietnamese medicinal plants; the authors also reported preliminary classification results using two local image descriptors. Huang et al.14 built a Chinese medicinal blossoms dataset consisting of twelve blossoms used in TCM. The authors employed various data augmentation methods to increase the number of samples and provided image classification results using AlexNet15 and InceptionV316. Abdollahi17 proposed using MobileNet18 to identify medicinal plants in Ardabil, Iran; the dataset used for training and evaluation includes 30 classes of medicinal plants, totaling 3,000 images. DIMPSAR19 is a dataset for the analysis and recognition of Indian medicinal plant species, consisting of two parts: leaves (80 classes) and plants (40 classes). MED-11720 is an image dataset of common medicinal plant leaves from Assam, India; beyond classification, it also includes segmented leaf frames for leaf segmentation with U-Net21. Ref. 22 reported a collection of leaf images from 10 common medicinal plants in Bangladesh and classified it with several common Convolutional Neural Networks (CNNs). In addition to medicinal plant datasets, researchers have also collected an image dataset of TCM fruits23, which includes 20 common types of TCM fruits and was likewise classified with CNNs. We summarize the details of the above datasets in Table 1.

Table 1 Detailed information about related datasets.

However, although several medicinal plant datasets have been introduced, many challenges remain in this field:

  1. Limited diversity of species. Existing medicinal plant datasets capture a limited number of plant species and parts, restricting their application in clinical practice.

  2. Lacking a systematic data collection and cleaning framework. The data collection and cleaning processes of existing datasets rely heavily on manual curation without automation tools.

  3. Inadequate validation framework. Previous papers do not conduct technical validation with state-of-the-art network architectures or advanced data augmentation methods, which hampers accurate assessment of the datasets and yields suboptimal performance.

To address these challenges, our TCMP-30024 dataset stands out with its comprehensive design and structured methodology. The overall data processing workflow and evaluation are illustrated in Fig. 1. By leveraging an automated web crawler, we ensure a diverse representation of species and plant parts, capturing images from various angles and contexts. This not only enhances the dataset’s ability to reflect the true diversity of medicinal plants but also mitigates the limitations of manual curation. Furthermore, our systematic data-cleaning process ensures that only high-quality and relevant images are included, providing a strong foundation for subsequent research and model training. The final dataset is verified by human TCM experts. We conduct comprehensive technical validation using three popular families of image-classification networks: mobile networks, CNNs, and Vision Transformers (ViTs). Additionally, we propose an advanced data augmentation technique, HybridMix, for the training process, which not only enriches the data but also enhances model robustness. The largest model, Swin-Base25, achieves 89.64% accuracy on our TCMP-30024 dataset. Our dataset, together with the released models, sets a new standard for future medicinal plant recognition research, making it easier for researchers to reproduce results and build upon our findings.

Fig. 1

Overview of the data processing workflow and evaluation for our TCMP-300 dataset.

Among the selected medicinal plants, the families with the most species are Asteraceae (27 species), Lamiaceae (18 species), Liliaceae (14 species), Ranunculaceae and Fabaceae (11 species each), Apiaceae (10 species), Brassicaceae and Solanaceae (8 species each), and Rosaceae, Papaveraceae, Scrophulariaceae, Araliaceae, and Campanulaceae (6 species each), while all other families contain fewer than 5 species. The selected medicinal plants predominantly contain bioactive compounds such as flavonoids, alkaloids, volatile oils, coumarins, sterols, and tannins. They exhibit clinical effects such as (1) clearing heat and detoxifying; (2) antibacterial and anti-inflammatory; (3) relieving cough and resolving phlegm; (4) reducing swelling and dissipating nodules; and (5) promoting blood circulation and resolving stasis. Among them, certain folk medicinal herbs such as bittersweet, black nightshade, and rorippa have not yet been fully explored and utilized. These plants hold significant potential as valuable resources for developing new traditional Chinese medicines in clinical practice.

In summary, the comprehensive and meticulously curated TCMP-30024 dataset represents a significant advancement in the field of medicinal plant research. By facilitating the classification and identification of diverse plant species, this dataset not only enhances our understanding of the medicinal properties of these plants but also accelerates the discovery of new therapeutic agents. For example, the development of artemisinin, a Nobel Prize-winning treatment for malaria, was the result of extensive manual research to identify effective compounds. With a well-structured dataset, researchers could potentially streamline the process of discovering new drugs from nature, significantly improving efficiency in the search for novel therapies. Moreover, this dataset aligns with the United Nations Sustainable Development Goal 3: “Ensure healthy lives and promote well-being for all at all ages.” By providing a robust resource for the classification and study of medicinal plants, it can support initiatives aimed at improving healthcare and promoting the use of traditional medicine. Ultimately, the development of such datasets not only contributes to scientific knowledge but also has the potential to foster sustainable practices in healthcare and biodiversity conservation, paving the way for a healthier future for all.

Methods

Image crawling

The dataset images were collected through the Bing search engine using an automated web crawler that queried with standardized scientific names (following the binomial nomenclature system: Genus species + authority, e.g., Angelica sinensis (Oliv.) Diels and Pulsatilla chinensis (Bunge) Regel) of medicinal plants. After initial automated retrieval, all images underwent rigorous manual and taxonomic validation to ensure botanical fidelity. The final collection includes high-quality images depicting key morphological features of each plant species, such as blossoms, stems, leaves, roots, fruits, and whole-plant profiles, with representative examples shown in Fig. 2.
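For readers who wish to reproduce the crawling step, the sketch below shows one way to query Bing with binomial names using the third-party icrawler package. The species list, output paths, and the per-query image cap are illustrative assumptions; the released crawling tools (see Usage Notes) may differ in detail.

```python
# A minimal crawling sketch (assumptions: the `icrawler` package is installed;
# paths and max_num are illustrative, not the values used to build TCMP-300).
from icrawler.builtin import BingImageCrawler

# Hypothetical excerpt of the species list; queries use full binomial names
# with authority, as described above.
species = [
    "Angelica sinensis (Oliv.) Diels",
    "Pulsatilla chinensis (Bunge) Regel",
]

for name in species:
    crawler = BingImageCrawler(storage={"root_dir": f"raw/{name}"})
    crawler.crawl(keyword=name, max_num=400)  # cap images fetched per query
```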

Fig. 2

Examples from the TCMP-300 dataset.

Data cleaning

The data-cleaning process began with the development of textual prompts based on medicinal plant labels, followed by the application of CLIP (Contrastive Language-Image Pre-Training)26, a robust vision foundation model, to classify the crawled images. Guided by TCM experts, precise classification thresholds were established for each category, enabling the identification and removal of low-quality samples. Figure 3 illustrates examples of dirty data, such as images containing textual content or pharmaceutical products, which were prevalent in the initial dataset. The first cleaning phase eliminated 52.73% of the images in the initial dataset as noise. Figure 5 compares the number of images per class before and after cleaning, showing a significant reduction from 222~400 to 101~354 images per class and highlighting the poor quality of the original data. To further refine the dataset, TCM experts conducted manual verification, addressing misclassified or overlooked samples. Figure 4 demonstrates two types of errors: mistakenly cleaned samples (e.g., rare variants of “Sambucus williamsii” misclassified due to atypical appearances or obscured features) and mistakenly uncleaned samples (e.g., images with ambiguous visual features or subtle text overlays that evaded automated detection). This comprehensive approach ensured a high-quality dataset for subsequent analysis. Note that a few images in the TCMP-30024 dataset may still contain visible watermarks or embedded annotations (e.g., numbers). As Northcutt et al.27 observed, noisy data are common in computer vision datasets such as MNIST28, CIFAR-1029, CIFAR-10029, Caltech-25630, and ImageNet31. This further demonstrates that noisy data is not a problem specific to our dataset but a reality of the computer vision field.
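The following sketch illustrates the CLIP-based filtering idea with the Hugging Face transformers implementation of CLIP. The prompts, the contrasting “dirty” categories, and the single threshold are simplified assumptions; in the actual pipeline, the thresholds were tuned per category under expert guidance.

```python
# A hedged sketch of CLIP-based cleaning; prompts and threshold are illustrative.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def keep_image(path: str, plant_name: str, threshold: float = 0.5) -> bool:
    """Return True if the image is likely a genuine photo of the plant."""
    # Contrast the target prompt against typical dirty categories (cf. Fig. 3).
    prompts = [
        f"a photo of the medicinal plant {plant_name}",
        "a page of text or a document",
        "a pharmaceutical product or medicine package",
    ]
    inputs = processor(text=prompts, images=Image.open(path).convert("RGB"),
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
    # In practice the threshold is class-specific and expert-tuned.
    return probs[0].item() >= threshold
```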

Fig. 3

Examples of dirty samples.

Fig. 4

Examples of misclassified or overlooked samples.

Fig. 5

Comparison of the number of images before and after data cleaning.

Data Records

Folder structure and recording format

The TCMP-30024 dataset, publicly available at https://doi.org/10.6084/m9.figshare.29432726, comprises 52,089 images across 300 TCMP categories. Users can download “tcmp-300-release.tar.gz” and “tcmp-info-20250329.csv”, where the former contains the TCMP images and the latter describes class information. Running “tar -zxvf tcmp-300-release.tar.gz” extracts the image files. The dataset is organized into category-specific subfolders following the naming convention “[Class ID].[Scientific Name of TCMP]” (e.g., “001.Veronica persica Poir.”), with each subfolder containing all corresponding medicinal plant images. All images are provided in widely compatible formats (PNG, JPG, JPEG, and WebP), ensuring seamless integration with modern AI frameworks such as PyTorch32.
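Because the folder layout follows the one-subfolder-per-class convention, the dataset can be loaded directly with torchvision’s ImageFolder. The snippet below is a minimal loading sketch; the paths and preprocessing are illustrative.

```python
# Minimal loading sketch; "tcmp-300-release" is the extracted dataset root.
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("tcmp-300-release", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=128, shuffle=True)
print(len(dataset.classes))  # expected: 300 categories
```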

Data splitting

Because our dataset exhibits a long-tailed distribution, we split it with classes as the basic unit to preserve this characteristic. To divide the whole dataset into train and validation subsets, we first define a threshold τ. For class Ci, which contains \(| {C}_{i}| \) images, with i ∈ {1, 2, …, n} and n the total number of classes, the numbers of train and validation images are computed by Eq. (1):

$${N}_{train}^{{C}_{i}}=\lfloor \tau \times | {C}_{i}| \rfloor ,\,{N}_{validation}^{{C}_{i}}=| {C}_{i}| -{N}_{train}^{{C}_{i}},$$
(1)

where \({N}_{train}^{{C}_{i}}\) and \({N}_{validation}^{{C}_{i}}\) are the numbers of images in the training and validation subsets for class Ci, respectively, and ⌊⋅⌋ denotes the floor (round-down) operation. In our practice, we set τ = 0.7.
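A class-wise split following Eq. (1) can be implemented as in the sketch below; the shuffling seed and directory layout are our assumptions, not part of the released split.

```python
# Sketch of the class-wise train/validation split of Eq. (1).
import math
import os
import random

def split_class(image_paths, tau=0.7, seed=0):
    """Split one class's images into train/validation subsets per Eq. (1)."""
    paths = sorted(image_paths)
    random.Random(seed).shuffle(paths)
    n_train = math.floor(tau * len(paths))   # N_train = floor(tau * |C_i|)
    return paths[:n_train], paths[n_train:]  # N_validation = |C_i| - N_train

root = "tcmp-300-release"  # one subfolder per class
splits = {
    cls: split_class(
        os.path.join(root, cls, f) for f in os.listdir(os.path.join(root, cls))
    )
    for cls in sorted(os.listdir(root))
}
```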

Properties

To suit a wide range of practical identification tasks, the TCMP-30024 dataset has three valuable properties:

  • Comprehensiveness. Our dataset covers more TCMP categories than any previous dataset. It encompasses six major parts of TCMP: blossom, stem, leaf, root, fruit, and the whole plant. This extensive categorization ensures a complete representation of various TCMP components.

  • Diversity. To augment the dataset’s diversity, images were scraped from the internet covering various perspectives, indoor and outdoor settings, a spectrum of lighting conditions, and complex backgrounds. This mirrors the multifaceted contexts in which medicinal plants are typically found. Merging images captured in diverse environments helps TCMP recognition models generalize.

  • Long-tailed distribution. The distribution of instances across categories emulates the real-world prevalence of TCM. The number of images of frequently used medicinal plants substantially surpasses that of their rare counterparts, as illustrated in Fig. 5. This long-tailed distribution requires models to perform well across all categories, even those with few samples.

Technical Validation

Model architectures

We adopt popular image classification models, including ResNet33, DenseNet34, RegNet35, MobileNetV336, ShuffleNetV237, EfficientNet38, ConvNeXt39, ViT40, and Swin Transformer25, to evaluate the dataset. The first seven models are CNN backbones, where MobileNet, ShuffleNet, and EfficientNet are lightweight networks for edge devices. The last two models are visual Transformer networks.

Classification framework

Figure 6 shows the pipeline of the TCMP image classification framework. The input is an image x with RGB channels. We first adopt a data augmentation module τ(⋅) to preprocess the input image and produce an augmented image \(\widetilde{{\boldsymbol{x}}}\), i.e., \(\widetilde{{\boldsymbol{x}}}=\tau ({\boldsymbol{x}})\). A general image classification network f, e.g., ResNet, is applied to classify the augmented image \(\widetilde{{\boldsymbol{x}}}\). The classification network f comprises a feature extractor ϕ for feature learning and a linear classifier ξ that outputs a class probability distribution \({\boldsymbol{p}}=f(\widetilde{{\boldsymbol{x}}})=\xi (\phi (\widetilde{{\boldsymbol{x}}}))\in {{\mathbb{R}}}^{C}\), where C is the number of classes. The feature extractor ϕ includes L stages \({\{{\phi }_{l}\}}_{l=1}^{L}\) to process image features; each stage ϕl downsamples its input to refine hierarchical information and generate semantic features. We train the image classification network with the standard cross-entropy loss. Given the input image x and its ground truth y, the cross-entropy loss is formulated as Eq. (2).

$$L=-\mathop{\sum }\limits_{c=1}^{C}{\tau }_{c,y}\log \,{{\boldsymbol{p}}}_{c}.$$
(2)

Here, τc,y is an indicator function: τc,y = 1 if c = y, and τc,y = 0 otherwise.
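The whole pipeline, p = ξ(ϕ(τ(x))) trained with cross-entropy, reduces to a few lines in PyTorch. In this sketch, torchvision’s ResNet-50 stands in for the generic classifier f; F.cross_entropy folds the softmax and the log term of Eq. (2) together.

```python
# Minimal sketch of the classification pipeline with cross-entropy loss.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(num_classes=300)  # feature extractor phi + classifier xi
x = torch.randn(8, 3, 224, 224)           # a batch of augmented images tau(x)
y = torch.randint(0, 300, (8,))           # ground-truth labels
logits = model(x)                         # class scores, shape (8, 300)
loss = F.cross_entropy(logits, y)         # Eq. (2), softmax applied internally
```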

Fig. 6

Pipeline of the TCMP image classification framework.

Data augmentation

Following standard ImageNet practice, we utilize random cropping and flipping as the baseline data augmentation41. The input image is first randomly cropped, with an area ranging from 0.08 to 1.0 of the original image size and an aspect ratio ranging from 3/4 to 4/3. The cropped image is resized to 224 × 224 and horizontally flipped with a probability of 0.5. Finally, it is normalized by the mean and standard deviation over the RGB channels.
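Written with torchvision transforms, the standard augmentation above looks as follows; the normalization statistics are the usual ImageNet values, which we assume here since the text does not list them.

```python
# The standard crop/flip/normalize augmentation described above.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.08, 1.0), ratio=(3/4, 4/3)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    # ImageNet statistics (an assumption; the paper does not specify the values).
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```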

We further propose HybridMix, an image mixture technique, as extra data augmentation to improve model accuracy. As shown in Fig. 7, given an input image, HybridMix chooses Mixup or CutMix with a probability of 0.5 to perform a global or local image mixture; a minimal code sketch follows the two components described below.

Fig. 7

Overview of HybridMix framework.

  1. Global Image Mixture: Mixup42. Given two different images xi and xj from the mini-batch, Mixup conducts a global pixel-wise mixture with a balancing factor λ ∈ [0, 1] to produce a new mixed image \({\widetilde{{\boldsymbol{x}}}}_{ij}\). The label of the mixed image is linearly interpolated between yi and yj with the same factor λ to generate the mixed label \({\widetilde{y}}_{ij}\), as shown in Eq. (3).

    $${\widetilde{{\boldsymbol{x}}}}_{ij}=\lambda \ast {{\boldsymbol{x}}}_{i}+(1-\lambda )\ast {{\boldsymbol{x}}}_{j},\,{\widetilde{y}}_{ij}=\lambda \ast {y}_{i}+(1-\lambda )\ast {y}_{j}$$
    (3)
  2. Local Image Mixture: CutMix43. The main idea of CutMix is to crop a patch from xj and paste it onto xi. We denote the bounding box of the cropped patch on xj as B = (rx, ry, rw, rh), where \({r}_{w}=W\sqrt{1-\lambda }\) and \({r}_{h}=H\sqrt{1-\lambda }\), λ ∈ [0, 1], and W and H are the width and height of the image, respectively. rx and ry are uniformly sampled as rx ~ U(0, W), ry ~ U(0, H). The CutMix image is generated by removing the patch B from xi and filling it with the corresponding patch cropped from xj. In practice, a binary mask M ∈ {0, 1}W×H is filled with 0 inside the bounding box B and 1 elsewhere. The output CutMix image and interpolated label can be expressed as:

    $$\widehat{{\boldsymbol{x}}}={\bf{M}}\odot {{\boldsymbol{x}}}_{i}+({\boldsymbol{1}}-{\bf{M}})\odot {{\boldsymbol{x}}}_{j},\,{\widetilde{y}}_{ij}=\lambda \ast {y}_{i}+(1-\lambda )\ast {y}_{j}.$$
    (4)

    Here, ⊙ denotes the element-wise multiplication, and 1 = {1}W×H is the mask filled with ones.
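A minimal batch-level sketch of HybridMix is given below. Drawing λ from a Beta distribution and keeping the CutMix box fully inside the image are common simplifications we assume here; soft labels produced this way can be fed directly to F.cross_entropy in PyTorch ≥ 1.10, which accepts probability targets.

```python
# Batch-level HybridMix sketch: Mixup (Eq. (3)) or CutMix (Eq. (4)) w.p. 0.5.
import math
import random
import torch

def hybrid_mix(x, y_onehot, alpha=1.0):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    index = torch.randperm(x.size(0))   # pair each image with a shuffled partner
    if random.random() < 0.5:
        # Global mixture (Mixup): pixel-wise linear interpolation.
        x = lam * x + (1 - lam) * x[index]
    else:
        # Local mixture (CutMix): paste a patch from the partner image.
        H, W = x.size(2), x.size(3)
        rw, rh = int(W * math.sqrt(1 - lam)), int(H * math.sqrt(1 - lam))
        rx, ry = random.randint(0, W - rw), random.randint(0, H - rh)
        x = x.clone()
        x[:, :, ry:ry + rh, rx:rx + rw] = x[index, :, ry:ry + rh, rx:rx + rw]
        lam = 1 - rw * rh / (W * H)     # correct lambda to the true area ratio
    y = lam * y_onehot + (1 - lam) * y_onehot[index]
    return x, y
```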

Training setup

All models are trained with a stochastic gradient descent (SGD) optimizer using a momentum of 0.9, a weight decay of 1 × 10−4, and a batch size of 128. We utilize a cosine learning rate schedule, whose learning rate starts at 0.1 and gradually decays to 0 over the total 300 epochs.
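Under these hyperparameters, the training loop is standard; the sketch below reuses the model and the DataLoader from the earlier sketches and relies on PyTorch’s built-in cosine scheduler.

```python
# SGD with momentum plus cosine learning rate decay over 300 epochs.
import torch
import torch.nn.functional as F

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)

for epoch in range(300):
    for x, y in loader:  # the DataLoader from the earlier loading sketch
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # decays the learning rate from 0.1 toward 0
```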

Evaluation metrics

We utilize Accuracy (Acc), F1 score, Balanced Accuracy (B-Acc), and Balanced F1 score (B-F1) to measure the performance of TCMP recognition. Predictions are summarized by True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) counts for each class. Based on these counts, the evaluation metrics are formulated as follows:

  1. Accuracy: Accuracy is the fraction of correctly classified samples over all samples (the micro-averaged recall):

    $$\,{\rm{Acc}}\,=\frac{{\sum }_{c=1}^{C}T{P}_{c}}{{\sum }_{c=1}^{C}(T{P}_{c}+F{N}_{c})},$$
    (5)

    where C denotes the total number of classes and TPc denotes the number of true positives for class c.

  2. F1 score: The overall F1 score is the arithmetic mean (macro average) of the per-class F1 scores:

    $$\begin{array}{c}{{\rm{Precision}}}_{c}=\frac{T{P}_{c}}{T{P}_{c}+F{P}_{c}},\,{{\rm{Recall}}}_{c}=\frac{T{P}_{c}}{T{P}_{c}+F{N}_{c}},\\ {{\rm{F1}}}_{c}=\frac{2{{\rm{Precision}}}_{c}\cdot {{\rm{Recall}}}_{c}}{{{\rm{Precision}}}_{c}+{{\rm{Recall}}}_{c}},\,\,{\rm{F1\; score}}=\frac{1}{C}\mathop{\sum }\limits_{c=1}^{C}{{\rm{F1}}}_{c}.\end{array}$$
    (6)
  3. Balanced Accuracy: Balanced accuracy is the average recall over all classes:

    $$\,{\rm{B}} \mbox{-} {\rm{Acc}}=\frac{1}{C}\mathop{\sum }\limits_{c=1}^{C}\,{{\rm{Recall}}}_{c}.$$
    (7)
  4. Balanced F1 score: The balanced F1 score is the weighted average of the per-class F1 scores, with each weight being the proportion of that class’s samples in the total sample count:

    $${w}_{c}=\frac{{N}_{c}}{N},\,\,{\rm{B}} \mbox{-} {\rm{F1}}=\mathop{\sum }\limits_{c=1}^{C}{w}_{c}{{\rm{F1}}}_{c},$$
    (8)

    where Nc denotes the number of samples in class c and N denotes the total number of samples.
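All four metrics have direct scikit-learn counterparts, which we believe match Eqs. (5)-(8): overall accuracy, macro-averaged F1, mean per-class recall, and frequency-weighted F1.

```python
# Computing Acc, F1, B-Acc, and B-F1 with scikit-learn.
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

def evaluate(y_true, y_pred):
    return {
        "Acc":   accuracy_score(y_true, y_pred),               # Eq. (5)
        "F1":    f1_score(y_true, y_pred, average="macro"),    # Eq. (6)
        "B-Acc": balanced_accuracy_score(y_true, y_pred),      # Eq. (7)
        "B-F1":  f1_score(y_true, y_pred, average="weighted"), # Eq. (8)
    }
```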

Image classification accuracy with advanced data augmentation

As shown in Table 2, we evaluate the proposed dataset over various classification models with regular training and advanced data augmentation. Our proposed HybridMix augmentation consistently improves the baseline training, with an average gain of 1.57% across eleven models: 1.42% for mobile networks, 1.01% for CNNs, and 2.25% for ViTs. The results indicate that HybridMix enhances the generalization capability for TCMP recognition. Models of different sizes present distinct performance. The lightweight MobileNetV3, ShuffleNetV2, and EfficientNet-B0 models, often deployed on resource-limited mobile devices, achieve 83%~86% accuracy. Accuracy generally improves as the parameter count increases. The best CNN and ViT models are ConvNeXt-Tiny and Swin-Base, which achieve 89.43% and 89.64%, respectively, demonstrating that large models reach the best accuracy. Given the class imbalance in the dataset, we also report balanced accuracy (B-Acc), F1 score, and balanced F1 score (B-F1) alongside overall accuracy. HybridMix leads to consistent improvements on the B-Acc, F1, and B-F1 metrics, showing that models trained with HybridMix work well under class imbalance.

Table 2 Image classification accuracy (%) over multiple models with baseline training and advanced HybridMix augmentation on our proposed dataset.

Image classification accuracy with various input resolutions

As shown in Table 3, we investigate the effectiveness of the proposed dataset with various input resolutions, conducting experiments over multiple classification models to increase the confidence of the results. The accuracies of the classification models generally improve as the input resolution increases. When the input resolution is increased from 224 × 224 to 336 × 336, the classification models achieve an average accuracy gain of 0.84%. When the input resolution is increased to 448 × 448, the average accuracy improvement reaches 1.65%, and RegNet-X achieves the best accuracy of 89.54%. Larger input resolutions also enhance the B-Acc, F1, and B-F1 metrics consistently. These results demonstrate that classification accuracy benefits from larger input resolutions.

Table 3 Image classification accuracy (%) over multiple models with various input resolutions on our proposed dataset.

Analysis of accuracy curve during the training process

Figure 8 shows the accuracy curves of the CNN and ViT models during training. The accuracy curves increase steadily as training proceeds: accuracy improves rapidly during the first 50 epochs and grows slowly thereafter. The networks converge by the 300th epoch, where they reach their best accuracy.

Fig. 8

Accuracy curves of image classification models on our proposed dataset.

Visualization of learned feature spaces

Figures 9 and 10 show t-SNE44 visualizations of the learned feature spaces of various networks for random and hard samples, respectively. For clear visualization, we randomly sample 10 classes from the dataset. In both figures, each cluster of same-colored points represents a unique class. As shown in Fig. 9, each network exhibits good intra-class compactness and inter-class separability, indicating that each network learns a discriminative feature space and thus achieves good classification performance. We also analyze the feature space of hard samples in Fig. 10, where a hard sample is defined as a sample misclassified by the model. The feature space of hard samples is less discriminative than that of random samples, which helps explain why hard cases are misclassified.
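The visualization can be reproduced along the lines of the sketch below, which pools penultimate-layer features from the trained backbone and embeds them with scikit-learn’s t-SNE; taking the first 10 class indices (rather than a random sample) is a simplification.

```python
# t-SNE sketch: embed pooled backbone features of 10 classes in 2-D.
import matplotlib.pyplot as plt
import torch
from sklearn.manifold import TSNE

backbone = torch.nn.Sequential(*list(model.children())[:-1])  # drop classifier
backbone.eval()
feats, labels = [], []
with torch.no_grad():
    for x, y in loader:          # validation loader from the earlier sketches
        keep = y < 10            # keep 10 classes for a readable plot
        if keep.any():
            feats.append(backbone(x[keep]).flatten(1))
            labels.append(y[keep])
feats = torch.cat(feats).numpy()
labels = torch.cat(labels).numpy()

emb = TSNE(n_components=2, init="pca").fit_transform(feats)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=5)
plt.savefig("tsne.png")
```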

Fig. 9

Visualization of learned feature space from random samples by t-SNE44.

Fig. 10

Visualization of learned feature space from hard cases by t-SNE44.

Visualization of attention heatmaps

Figures 11 and 12 show attention heatmaps produced by Grad-CAM45 with a pretrained ResNet-50 model. As shown in Fig. 11, the model trained on our TCMP-30024 dataset accurately captures fine-grained discriminative features of critical parts, such as blossoms, stems, leaves, roots, fruits, and whole plants, while ignoring background noise. These captured features serve as classification evidence for robust and accurate TCMP recognition. We also found a small number of failure cases in which the model focuses on inaccurate TCMP-related features, as shown in Fig. 12, which means that AI models may still exhibit biases on a few TCMP images.
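For reference, Grad-CAM over ResNet-50’s last convolutional stage can be implemented with two hooks, as in the self-contained sketch below; this is our own minimal re-implementation, not the authors’ visualization code.

```python
# Minimal Grad-CAM sketch over the last stage of a trained ResNet-50.
import torch
import torch.nn.functional as F

acts, grads = {}, {}
layer = model.layer4  # `model` is the trained ResNet-50 from earlier sketches
layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

def grad_cam(x):
    """Return a normalized heatmap for the top predicted class of x (1,3,H,W)."""
    logits = model(x)
    logits[0, logits.argmax(1)].backward()         # gradient of the top class
    w = grads["v"].mean(dim=(2, 3), keepdim=True)  # channel importance weights
    cam = F.relu((w * acts["v"]).sum(dim=1))       # weighted activation map
    cam = F.interpolate(cam.unsqueeze(1), size=x.shape[2:], mode="bilinear")
    return cam / cam.max()
```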

Fig. 11

Visualization of attention heatmaps with accurate TCMP-related features by Grad-CAM45.

Fig. 12

Visualization of attention heatmaps with inaccurate TCMP-related features by Grad-CAM45.

Usage Notes

Anyone can access the TCMP-30024 dataset from the figshare platform. The standard PyTorch dataloader module can be used to load the dataset for training and validation. We also release superior AI models pretrained on the TCMP-30024 dataset as baselines and references for future research into more powerful AI models. As shown in Fig. 13, we maintain a public leaderboard for our dataset on Papers with Code at https://paperswithcode.com/sota/image-classification-on-tcmp-300, where anyone can submit improved accuracy online to promote progress in TCMP recognition.

Fig. 13

Overview of the online leaderboard for TCMP recognition.

To facilitate clinical practice, we deploy our model as an online web application on Hugging Face. As shown in Fig. 14, anyone can visit the system at https://winycg-tcmprecognition.hf.space/?__theme=system&deep_link=trNDfB9dwx8 and upload an image for online plant recognition. The system also outputs the top-5 class probabilities to help interpret the classification evidence.

Fig. 14

Overview of the online web application for TCMP recognition.

Our dataset provides a platform for expanding the number of categories and the number of images per category. We release ready-made tools for image crawling and data cleaning, so users can construct personalized datasets according to their requirements. We hope the TCMP-30024 dataset will grow into a larger-scale dataset with more diverse categories and samples.