Introduction

Medicinal plants are botanical species used to remedy numerous diseases and health concerns affecting human beings. Numerous varieties of herbal remedies exist, and their characteristics can differ significantly from one location to another, varying in proportion and appearance from their roots to their leaves. Noteworthy for their medicinal qualities, the leaves of herbs such as Karpooravalli (Coleus amboinicus), Podina (Mentha arvensis), Neem (Azadirachta indica), and Thudhuvalai stand out. A diverse array of plants demonstrates medicinal qualities, and precise detection of these species is crucial for the discovery and development of new pharmaceutical agents. Furthermore, an in-depth understanding of plant species enhances crop yields and promotes the progress of sustainable agricultural practices.

Medicinal plants have been utilized for the production of medication owing to their abundant nutritional content1. Medicinal herbs are extensively utilized for their qualities, including antioxidant, anti-fungal, anti-allergic, and anti-bacterial effects2. Plants of different types, including trees, shrubs, and herbs, can provide therapeutic qualities. According to the data, approximately 14–28% of plants possess medicinal characteristics and can be utilized to create potent medications3,4. Numerous organizations are involved in the development of plant-based medications that effectively address various chronic diseases while minimizing the side effects commonly linked to synthetic compounds5,6.

The leaves of a plant are the predominant component utilized in numerous herbal remedies, with variations in form, size, texture, and color7. It is therefore difficult to precisely distinguish species because of the close resemblance of their leaves8. Moreover, variation across developmental stages renders leaf color an unreliable criterion for distinguishing medicinal plants9. Identifying and utilizing these therapeutic plants is a laborious undertaking, and even experts require significant effort to accurately identify the appropriate plant10.

The market for herbal medicines is currently overrun with counterfeit and substandard products that endanger human health and impede sustainable development. Studies aimed at creating instruments for classifying herbal remedies are therefore becoming increasingly popular11. It is widely accepted that plant leaves have distinctive qualities that are simple to extract and analyze. As a result, they are frequently used as the primary basis for identifying various therapeutic herbs. Automatic computer image recognition has become more widespread as image processing technology has progressed12.

In recent years, a diverse array of advanced Artificial Intelligence (AI) techniques has been utilized to support the classification and detection of medicinal plant leaves13. These techniques have demonstrated considerable potential in distinguishing between visually similar leaf types, thereby supporting researchers, botanists, and agricultural professionals14 in automating the recognition process with a high degree of accuracy15,16. Among these AI methodologies, Deep Learning (DL) has emerged as the most notable and effective strategy due to its robust feature extraction capabilities. DL techniques have proven particularly adept at capturing both low-level and high-level features from complex image data, facilitating the modeling of the intricate patterns, textures, and shapes that are characteristic of medicinal leaves17,18.

Traditional machine learning techniques, although beneficial, often rely heavily on manual feature engineering and may encounter difficulties in generalizing across various plant species or environmental conditions19. Conversely, deep learning models can autonomously learn hierarchical feature representations from raw image data20, leading to significant enhancements in classification performance and robustness. Additionally, by incorporating hybrid approaches, such as the combination of handcrafted features with deep features, researchers can bolster the discriminative power of the model21. Despite these advancements, challenges such as inter-class similarity, intra-class variability, and the scarcity of annotated datasets persist22, highlighting the need for more innovative strategies such as federated learning23, feature fusion, and domain adaptation to ensure broader applicability and effective real-world deployment24. Several visual approaches have been used to differentiate plant leaves25, but none have specifically supported the differentiation of medicinal plants26,27,28. Despite the improvements made to the models, achieving desired outcomes using enhanced methodologies remains challenging29. A real-time approach was therefore introduced to address the challenges encountered in detecting medicinal leaves.

An automated computer vision system is essential for assisting researchers and farmers in the precise and efficient detection of medicinal plant leaves. Thus, the research presents a novel method that gathers leaf characteristics using several strategies for feature extraction under diverse settings. The proposed approach employs a Neighborhood Component Analysis-Convolutional Neural Network (NCA-CNN) framework to effectively integrate and optimize feature representations. The classification process commences with the utilization of RGB leaf images, from which a combination of handcrafted features—such as Local Binary Patterns (LBP) and Histogram of Oriented Gradients (HOG)—are extracted. The resulting fused feature vector is subsequently fed into a CNN-based classifier to categorize the medicinal leaf images with enhanced accuracy and robustness.

The remainder of the paper is organized as follows. Several cutting-edge techniques for identifying medicinal leaves are described in Sect. 2. The methods and processes used in the investigation are described in depth in Sect. 3. Section 4 discusses the experimental evaluation and its analysis. Section 5 concludes the paper.

Related work

Herbal plants are essential in conventional medicine, and novices in the area may find it challenging to correctly identify the suitable plant species. Various plant portions, such as roots, flowers, stems, and leaves, are used to extract healing compounds. The study concentrates on analyzing plant leaves to categorize them into medicinal groups. To facilitate the accurate detection of medicinal leaves, a range of sophisticated techniques were utilized, as elaborated in the subsequent sections.

Machine learning techniques

Machine learning algorithms are essential for categorizing leaf images, facilitating the automated recognition of plant species through the analysis of extracted features. The rationale behind this assertion is elaborated below. Firstly, the detection of medicinal plants poses significant challenges, as many of these species are located in dense forests, and their leaves often exhibit similar appearances. Misdetection of a herb can result in severe health risks, potentially leading to fatal consequences. Various methods exist for plant detection; however, the traditional approach relies on manual detection and is prone to error30. To mitigate this issue, numerous researchers have developed automated recognition techniques31. For example, the authors of32 introduced a framework for classifying medicinal plants based on the shape and color features of leaves, utilizing a Support Vector Machine (SVM) classifier on an optimized dataset and achieving an accuracy of 96.66%. The authors of33 presented a plant recognition system focused on leaves, employing 50 medicinal images sourced from Google Images and utilizing edge detection algorithms. They implemented a texture patch classification approach using Neural Networks (NN), resulting in an accuracy of 97.80%. Additionally, the authors of34 developed a classification system for sugarcane leaf diseases caused by fungi, employing a triangle threshold method for leaf segmentation and achieving an accuracy of 98.60%.

The authors of35 introduced a leaf image classification methodology utilizing Convolutional Neural Networks (CNN). They gathered a dataset comprising 12,673 soybean leaf samples and implemented the LeNet architecture, achieving a classification accuracy of 98.32%. The authors of36 developed a hybrid framework for recognizing Romanian medicinal plants, employing both color and grayscale images and attaining an accuracy of 92.9%. The authors of37 proposed an innovative method for plant disease detection that extracted integrated color, texture, and histogram characteristics, applying various ML classifiers and yielding a notable accuracy of 98.4%. The authors of38 presented the Local Binary Patterns (LBP) technique for classifying plants through leaf images. They focused on extracting texture features and removing salt-and-pepper noise using ML methods, ultimately achieving a cumulative classification accuracy of 93.5%. The authors of39 introduced a classification system for citrus plants based on leaf images. They extracted a dataset of 57 multi-features and optimized it to 15 features using ML techniques. Various classifiers were tested, with the Multi-Layer Perceptron (MLP) demonstrating superior performance, achieving an accuracy of 98.14% on a region of interest measuring 256 × 256 pixels.

Consequently, a large number of automated recognition systems using image processing technologies have been put into place as supplemental and alternative methods for detection tasks. Despite their apparent usefulness, these contemporary methods frequently fall short in terms of accuracy and precision when processing leaves under different lighting conditions40. Consequently, several automated deep learning methods have been advanced to facilitate the detection of various plant species, as described in the succeeding sections.

A detailed description of the machine learning methods is elaborated in Table 1.

Table 1 Detection with machine learning method.

Deep learning methods

Deep learning has shown extraordinary success across multiple fields, achieving cutting-edge performance in agriculture, healthcare, and industry. In the agricultural sector, it facilitates accurate crop monitoring and disease identification. In the healthcare domain, it improves diagnostics, medical imaging, and tailored treatment. Within the industrial sphere, deep learning drives automation, defect identification, and predictive maintenance, transforming conventional processes. A variety of deep learning models were employed in50 to distinguish between 64 distinct types of medicinal plants with high precision. In51, a three-layer model was presented for identifying medicinal leaves and attained an impressive accuracy rate of 92%. CNN has been used in conjunction with a variety of learning scales to classify plant leaves; using different learning branches, these scales were used to extract characteristics52. Compared to the conventional CNN model, the approach produced more efficient results.

A DL model was suggested and trained using a dataset of 800 photographs of medicinal leaves, which were classified into four different species, and attained excellent performance53. In54, leaf photos were categorized using DL and ML methods utilizing various types of features, accomplishing a top accuracy rate of 82.38%. Within the framework of the DL classifiers, VGG-16 displayed superior performance, with an accuracy rate of 97.14%. The attention-based model proposed in55 was developed to identify plants based on their flowers and achieved an accuracy rate of 97.2%; the recognition was achieved using a dataset consisting of 102 flower photos. A detection technique was utilized in56,57 to identify Malaysian herbs using shape and texture characteristics, achieving a notable accuracy of 98%.

In58,59,60, a CNN model was proposed and reached an accuracy of 97.6%. The CNN technique was utilized to categorize 32 plant species, resulting in an accuracy of 94.7%.

Features were initially collected using the local binary pattern method; these retrieved features were then utilized to differentiate between 30 medicinal plant leaves and attained an accuracy of 56%. A mobile application was created in61 using Support Vector Machines (SVM) and Deep Neural Networks (DNN) to categorize medicinal plant herbs. The application efficiently analyzed the image and reached a classification accuracy of 93%. Correspondingly, a feature fusion technique was subsequently employed in62 to develop decision criteria for classifying medicinal plant herbs, covering 51 different species. Probabilistic neural networks were utilized in that scenario to categorize leaves according to their diverse morphological characteristics, resulting in a 74% accuracy rate.

A composite classifier was constructed using a combination of a backpropagation neural network and a weighted K-nearest neighbors algorithm. The classifier was deployed as a mobile application to categorize 220 different plant species and attained an accuracy rate of 93%. Another study utilized leaf shape traits and color histograms to classify 32 different leaf kinds, achieving an accuracy rate of 87%. A mobile application was created in63,64 to categorize 126 types of trees with a precision rate of 90%. In65, DL methods were utilized to classify 33 different types of leaves; among these models, MobileNetV2 demonstrated superior performance, with an accuracy rate of 99%. A convolutional neural network (CNN) was utilized in66 to categorize 129 different forms of leaves collected from diverse locations, achieving an accuracy rate of 60%. In another study, the authors created a smartphone application that uses morphological characteristics to distinguish between five different sorts of plants, utilizing the Sobel and Otsu approaches to extract morphological features and conducting color-based categorization. In52, a mobile application was employed to categorize 15 types of leaves using Convolutional Neural Network (CNN) models. Multiple CNN models were utilized to classify plant leaf species and accomplished an accuracy of 81.6%. The morphological characteristics of leaves were employed in67 to distinguish six variants of species, resulting in an error rate ranging from 17 to 35%. A detailed analysis of the detection approaches is depicted in Table 2.

Table 2 Model summary.

Various deep learning models have been utilized to propose smart solutions for identifying medicinal leaves, but these solutions lacked precision when identifying leaves under low-luminance conditions. The current research proposes a better solution for identifying medicinal plant leaves with a high precision rate under low-luminance conditions. In this paper, we propose a fusion deep learning method that initially uses numerous feature extraction techniques to extract important features from plant leaves; the extracted features are afterward fused into a matrix to further identify plant leaves.

Various feature extraction methods have been employed in a variety of works to extract features from images, as described below:

In85, various feature extraction methods were employed to extract valuable features from the CASIA iris image dataset. The extracted features were further used to compare the outcomes of the methods, and the Gabor Wavelet Transform method ultimately outperformed the others.

A compound local binary pattern feature extraction technique was employed in86 to extract features from the CASIA dataset, which were further used for classification with an accuracy of 96%. A correlation-based statistical approach was employed to address the correlation between adjacent pixels87.

A transformation was utilized in88 to define the boundary within the eye. Individual patterns were extracted into a feature vector with the discrete wavelet transform, and classification was later accomplished with a distance-measure classifier. The iris features were extracted with a normalization method to distinguish the eyelid and eyelash.

The research contributions are as follows:

  1.

    A model called FF-NCA-CNN was created to classify medicinal leaf images by combining several features.

  2.

    In the suggested methodology, the image features were obtained by utilizing both handcrafted and pre-trained models. These features were then combined into a fusion matrix using Canonical Correlation Analysis (CCA).

  3.

    The proposed model was specifically developed to operate in settings with low levels of illumination.

In the suggested model, the key characteristics were selected via Neighborhood Component Analysis (NCA).

Dataset description and methodology

This section covers the dataset description and the proposed method used to distinguish plant leaves. The experiment involved the use of 6904 medicinal leaf images, classified into 80 different types (https://www.kaggle.com/datasets/warcoder/indian-medicinal-leaf-image-dataset)89. Figure 1 displays a small number of representative leaves, whereas Table 3 provides a comprehensive description of their specific properties. The analysis utilized 1835 leaf photos obtained from a publicly accessible collection.

Fig. 1
figure 1

Sample images from medicinal leaves dataset.

Table 3 Dataset leaves description.

Proposed methodology

The proposed methodology begins by pre-processing the source images and enhancing them through augmentation operations, such as flipping and rotation. Subsequently, the photos were employed to extract characteristics using feature-fusion deep learning methodologies. The comprehensive process is detailed in Fig. 2 below:

Fig. 2
figure 2

Methodology for medicinal leaf recognition.

Image pre-processing and augmentation

The approach begins with image preparation, during which the images undergo preprocessing to remove backgrounds. Here, the photographs were uploaded and processed automatically. The pre-processing procedures for background removal are outlined in Fig. 3 as follows:

  1.

    Adjust the dimensions of the photos to 224 × 224, 256 × 256, and 128 × 128.

  2.

    Apply optimization techniques to further refine and improve the threshold.

  3.

    Eliminate vacant pixels from the provided photos.

  4.

    Invert the binary mask.

  5.

    Substitute the mask with the actual image pixels.

Fig. 3
figure 3

Image pre-processing of the leaf image.
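The masking steps above can be sketched in NumPy as follows. This is a minimal illustration, not the paper's exact pipeline: the global mean of a grayscale copy stands in for the optimized threshold of step 2.

```python
import numpy as np

def remove_background(rgb):
    # Grayscale copy used only for thresholding.
    gray = rgb.mean(axis=2)
    # Step 2 stand-in: global-mean threshold (the paper optimizes this value).
    t = gray.mean()
    background = gray > t            # bright pixels treated as background here
    # Step 4: invert the binary mask so the leaf region is True.
    leaf = ~background
    # Steps 3 and 5: keep actual pixels inside the leaf, zero the rest.
    return rgb * leaf[..., None].astype(np.uint8)

rgb = (np.random.rand(128, 128, 3) * 255).astype(np.uint8)
out = remove_background(rgb)
```

With a real leaf photograph, the retained region would contain the leaf and the zeroed region the discarded background.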

The quality and performance of a method are consistently influenced by image pixel dimensions90. Consequently, the image pixels are modified through pre-processing. The procedure is accelerated by resizing the original photos to several variants, namely 224 × 224, 256 × 256, and 128 × 128. Preprocessing the images before training a deep network is not mandatory; however, in this research, we thoroughly investigated various DL methods to improve performance.

To achieve effective and efficient training of networks, a substantial amount of data is required, which can be further enlarged by augmentation operations. The augmentation operations were employed to overcome overfitting on the limited data and enhance detection quality91. Figure 4 illustrates the several procedures employed to enlarge the initial dataset.

Fig. 4
figure 4

Augmentation techniques.
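The flip-and-rotation operations referenced above can be sketched with plain NumPy. This is a minimal illustration; the paper's full augmentation set in Fig. 4 may include further transforms.

```python
import numpy as np

def augment(image):
    # Horizontal/vertical flips plus 90-, 180- and 270-degree rotations,
    # turning one sample into six views of the same leaf.
    return [image,
            np.fliplr(image),          # horizontal flip
            np.flipud(image),          # vertical flip
            np.rot90(image, 1),
            np.rot90(image, 2),
            np.rot90(image, 3)]

img = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)
views = augment(img)
```

Applied to the whole dataset, this multiplies the number of training samples without collecting new photographs.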

Feature extraction (FE)

Feature extraction is utilized to prevent redundancy and facilitate the detection of primary traits from a comprehensive range of attributes. Within the agricultural field, a range of characteristics has been employed to differentiate leaves, including HOG, color, texture, and other factors. The texture of plant leaves is the most crucial characteristic for leaf recognition. In this study, geometric and LBP features are extracted for classification.

The suggested methodology involves extracting features using handcrafted techniques such as LBP, as well as a pre-trained DL model. The next section provides a comprehensive explanation of feature extraction.

Handcrafted feature extraction

Within this group of features, the texture and LBP features are extracted. The specifics of the feature extraction process are outlined below.

The feature extraction methodology employed in this research encompasses both geometric and textural attributes to improve the classification accuracy of medicinal plant leaves. Geometric characteristics such as area, perimeter, minor axis, major axis, orientation, and shape ratio are obtained from the segmented images of the leaves. The area is determined by aggregating the pixel values within the leaf’s region, while the perimeter is estimated through a combination of boundary segments.

The major and minor axes denote the length and width of the leaf, respectively, and are computed based on specific distance metrics. Orientation is quantified using a tanh-based function that indicates the angular position of the leaf, and the shape ratio evaluates the leaf’s area in relation to its bounding box, thereby illustrating its compactness. In addition to geometric features, this study integrates Local Binary Patterns (LBP) to derive textural characteristics. LBP examines the spatial configuration of pixel intensities by comparing each pixel with its neighboring pixels, encoding local texture into binary patterns that represent edges and textures.

This integration of geometric and LBP-derived features effectively captures both the structural and visual attributes of leaves, leading to a thorough feature representation that enhances the classification model’s performance. The texture properties are essential for the classification and detection of medicinal leaf images. Thus, the suggested methodology involves extracting texture features using both feature extraction strategies. The texture of leaves can vary depending on their characteristics. Texture features, also known as form features, can be used to identify leaves by analyzing their shapes; such features pertain to the characteristics of lines, edges, corners, blobs, and ridges92,93. The formal depiction of the extraction is outlined in (1) to (6).

$$Area=\sum_{i=1}^{N}\sum_{j=1}^{M}L\left[i,j\right]$$
(1)
$$Perimeter=\phi\,2m+2k+2l$$
(2)
$$major\ axis=r+p$$
(3)
$$minor\ axis=\sqrt{r^{2}+p^{2}-f^{2}}$$
(4)
$$\theta=\frac{\tanh\left(M-L+\sqrt{\left(M-L\right)^{2}+D^{2}}\right)}{D}$$
(5)
$$t=\frac{Area}{Area\ of\ Bounding\ Box}$$
(6)

In this context, ‘r’ and ‘p’ denote distances, ‘f’ represents the distance from the focus point, ‘θ’ characterizes the orientation, ‘M’, ‘L’, ‘D’, and ‘l’ indicate lengths, and ‘k’ denotes a parameter also used with LBP.
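The geometric quantities of Eqs. (1) to (6) can be approximated from a binary leaf mask as sketched below. The axis and orientation estimates here use second-order image moments, which is one common realization of the quantities in (3) to (5), not necessarily the paper's exact formulas.

```python
import numpy as np

def geometric_features(mask):
    ys, xs = np.nonzero(mask)
    area = int(mask.sum())                                  # Eq. (1): pixel count
    # Perimeter analogue: leaf pixels with at least one zero 4-neighbour.
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~interior.astype(bool)).sum())  # Eq. (2) analogue
    # Major/minor axes and orientation from second-order moments.
    cov = np.cov(np.stack([xs, ys]).astype(float))
    evals, _ = np.linalg.eigh(cov)
    minor, major = 4 * np.sqrt(np.maximum(evals, 0))        # Eqs. (3)-(4) analogue
    theta = 0.5 * np.arctan2(2 * cov[0, 1], cov[0, 0] - cov[1, 1])  # Eq. (5) analogue
    bbox = (np.ptp(ys) + 1) * (np.ptp(xs) + 1)
    shape_ratio = area / bbox                               # Eq. (6)
    return area, perimeter, major, minor, theta, shape_ratio

mask = np.zeros((64, 64), dtype=np.uint8)
mask[16:48, 20:44] = 1                                      # toy rectangular "leaf"
feats = geometric_features(mask)
```

For the toy rectangle, the area equals its pixel count and the shape ratio is exactly 1, since a rectangle fills its own bounding box.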

LBP is employed to analyze the local texture structures of an object. Textures are defined by spatial patterns rather than individual pixels. LBP utilizes various primitives to describe complex patterns in the supplied image, as represented in (7).

$$L_{Bin\left(k,m\right)}=\sum_{k=0}^{K-1}p\left(d_{k}-d_{l}\right)2^{k},\quad p\left(x\right)=\begin{cases}1 & if\ x\ge 0\\ 0 & otherwise\end{cases}$$
(7)

The extracted vector \(L_{Bin(k,m)}\) has the same size for all dataset sizes.
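Equation (7) can be realized directly in NumPy: each interior pixel is compared with its eight neighbours, the sign function p(·) yields one bit per neighbour, and the resulting 8-bit codes are histogrammed into a fixed-length vector. A radius-1, 8-neighbour LBP is assumed in this sketch.

```python
import numpy as np

def lbp_histogram(gray):
    c = gray[1:-1, 1:-1]                       # centre pixels d_l
    neighbours = [gray[:-2, :-2], gray[:-2, 1:-1], gray[:-2, 2:],
                  gray[1:-1, 2:], gray[2:, 2:], gray[2:, 1:-1],
                  gray[2:, :-2], gray[1:-1, :-2]]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(neighbours):
        # p(d_k - d_l) = 1 if the neighbour is >= the centre, else 0
        code |= (n >= c).astype(np.uint8) << bit
    hist, _ = np.histogram(code, bins=256, range=(0, 256))
    return hist / hist.sum()                   # fixed length for any image size

img = (np.random.rand(64, 64) * 255).astype(np.uint8)
vec = lbp_histogram(img)
```

Because the histogram always has 256 bins, the vector size is independent of the input image size, matching the remark above.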

Feature extraction with pre-trained model

The suggested methodology utilizes a pre-trained model to extract significant features. Pre-trained models are efficient methods for extracting features from image collections. The fundamental element of a deep neural network is feature extraction, and convolution has been utilized to extract significant information. Convolutional layers are exploited to extract features from the image in order to detect and locate objects. One way to enhance the models is by incorporating different configurations, such as adding hidden layers and adjusting the learning rate and number of epochs. Various pre-trained models have been utilized as feature extractors; however, selecting the appropriate model enhances accuracy. The suggested methodology utilizes eight distinct pre-trained models to extract key features from the image dataset. The models’ performance was assessed using performance parameters. The pre-trained models have been utilized to extract the visual characteristics of the images.
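The role of a frozen pre-trained backbone can be sketched as follows. A real implementation would load an actual pre-trained network with its classifier head removed (for example, a torchvision or Keras model); here a fixed random projection stands in for the frozen weights, purely to show the image-to-embedding mapping.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for frozen, pre-trained backbone weights (not the paper's model).
W = rng.standard_normal((64 * 64, 512)) / 64.0

def deep_features(image):
    # Flatten, scale to [0, 1], project through the frozen weights, and
    # apply a ReLU-style non-linearity, mimicking how a headless CNN
    # maps an image to a fixed-length embedding.
    x = image.astype(float).ravel() / 255.0
    return np.maximum(x @ W, 0.0)

img = (rng.random((64, 64)) * 255).astype(np.uint8)
emb = deep_features(img)
```

Because the weights are never updated, the same image always maps to the same embedding, which is the defining property exploited when pre-trained models are used as feature extractors.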

Algorithm 1
figure a

Features Selection Algorithm.

Feature fusion and reduction

A fusion approach is used to produce a unified vector from the previously extracted features. The fusion matrix is generated by combining the many features retrieved in the previous stage. Performance is enhanced by the feature fusion method, as each feature is linked to distinct characteristics and the combined result demonstrates superior output. Despite the increased computational expense, the technique yields superior results. Alongside their benefits, the individual features also have certain drawbacks. Therefore, the study utilizes Canonical Correlation Analysis (CCA) for the fusion procedure. Subsequently, the suggested methodology utilizes Neighborhood Component Analysis (NCA) to select key features in order to reduce the dimensionality of the feature matrix.

The entire procedure of feature fusion is illustrated below. Suppose \(\{d_i\}_{i=1}^{n} \in P^i\), \(\{E_i\}_{i=1}^{n} \in P^j\), and \(\{G_i\}_{i=1}^{n} \in P^m\), where i, j, and m denote the sample spaces and n is the observation size. The objective is to discover projection directions \(p_l \in P^i\), \(p_m \in P^j\), and \(p_n \in P^k\) that maximize the correlation between \(p_l^T L\), \(p_m^T M\), and \(p_n^T N\), where \(L=\left[l_1,l_2,\dots,l_n\right]\), \(M=\left[m_1,m_2,\dots,m_n\right]\), and \(N=\left[n_1,n_2,\dots,n_n\right]\) denote the sample matrices. A formal representation of CCA is described below in (8):

$$\rho=\frac{p_{l}^{T}\,{CCA}_{lmn}\,p_{m}}{\sqrt{\left(p_{l}^{T}{CCA}_{ll}\,p_{l}\right)\left(p_{m}^{T}{CCA}_{mm}\,p_{m}\right)\left(p_{n}^{T}{CCA}_{nn}\,p_{n}\right)}}$$
(8)

Here, \({CCA}_{lmn}=LM{N}^{T}\) defines the covariance matrix. The eigenvalue and eigenvector representation is given below in (9):

$$\left|\begin{array}{ccc}{CCA}_{lmn}{{CCA}_{nn}}^{-1}{CCA}_{mln}&0&0\\0&{CCA}_{nlm}{{CCA}_{mm}}^{-1}{CCA}_{lnm}&0\\0&0&{CCA}_{mln}{{CCA}_{ll}}^{-1}{CCA}_{mnl}\end{array}\right|\left\{\begin{array}{c}p_{l}\\p_{m}\\p_{n}\end{array}\right\}=\lambda\left\{\begin{array}{c}p_{l}\\p_{m}\\p_{n}\end{array}\right\}$$
(9)

Here, \(p_{l}=[p_{l1},p_{l2},\dots,p_{lf}]\), \(p_{m}=[p_{m1},p_{m2},\dots,p_{mf}]\), and \(p_{n}=[p_{n1},p_{n2},\dots,p_{nf}]\), and the fused feature matrix V(i) is given in (10):

$$V(i)=\left\{\begin{array}{c}p_{l}^{T}l\\p_{m}^{T}m\\p_{n}^{T}n\end{array}\right\}$$
(10)

Here, V(i) represents the fused feature matrix. Although V(i) is generated by maximizing the correlation between attributes, it may still contain irrelevant information, which must be removed from the feature matrix. Hence, in the methodology, NCA has been employed to reduce the feature matrix.
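The CCA step behind Eqs. (8) to (10) can be sketched for two feature sets as below: whiten each view, take the singular directions of the cross-covariance (whose singular values are the canonical correlations), project, and stack the projections into the fused matrix V. This is a two-view sketch under a small regularization assumption; the paper's three-view formulation and the subsequent NCA reduction are omitted.

```python
import numpy as np

def cca_fuse(X, Y, k=2, eps=1e-6):
    n = X.shape[0]
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    Cxx = Xc.T @ Xc / n + eps * np.eye(X.shape[1])   # regularized covariances
    Cyy = Yc.T @ Yc / n + eps * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Lx, Ly = np.linalg.cholesky(Cxx), np.linalg.cholesky(Cyy)
    # Whitened cross-covariance; its singular values are the canonical correlations.
    K = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(K)
    A = np.linalg.solve(Lx.T, U[:, :k])              # projection for view 1
    B = np.linalg.solve(Ly.T, Vt[:k].T)              # projection for view 2
    return np.hstack([Xc @ A, Yc @ B]), s[:k]        # fused matrix V and correlations

rng = np.random.default_rng(1)
z = rng.standard_normal((500, 2))                    # shared latent signal
X = np.hstack([z, rng.standard_normal((500, 3))]) @ rng.standard_normal((5, 5))
Y = np.hstack([z, rng.standard_normal((500, 4))]) @ rng.standard_normal((6, 6))
V, corr = cca_fuse(X, Y)
```

Since both synthetic views contain the same two-dimensional latent signal, the top canonical correlations come out close to 1, and the first fused columns of the two views are strongly correlated.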

Classification based on reduced features

The research utilized a convolutional neural network consisting of many non-linear transformations for classification purposes. CNN is a sophisticated type of artificial neural network used for identifying patterns to detect, segment, and classify objects. The CNN design comprises multiple layers, including pooling, convolutional, and fully connected layers. The convolution layer in a CNN is employed to extract significant information for subsequent classification and segmentation tasks; it often detects corners, edges, and blobs to extract key characteristics. The pooling layer is responsible for both scaling and sampling, while other layers are used for parameter training and to prevent overfitting38. The layers are structured according to neurons, biases, and weights. Neurons in a fully connected layer transform multidimensional features into one-dimensional features for subsequent classification39. Figure 5 illustrates the architecture, consisting of five convolutional blocks and a classifier block. The following information provides a comprehensive description of the CNN block.

Each convolution block consists of convolution layers with 3 × 3 kernels that are utilized to extract information such as color, edge, and geometric properties. The ReLU activation function is used to introduce non-linearity into the data. A batch normalization layer is added to extract deeper characteristics and diminish the number of training iterations. Subsequently, the feature map is reduced using 2 × 2 pooling layers. Dropout layers are incorporated to prevent overfitting in the network. The result of the CNN block is fed into the classifier block, which comprises various layers, as shown in Fig. 5.

Fig. 5
figure 5

Classifier architecture.
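The spatial dimensions through five such blocks can be traced as follows, assuming 3 × 3 unpadded convolutions with stride 1 and 2 × 2 pooling as stated above. The per-block filter counts are illustrative assumptions, not taken from the paper.

```python
def block_shape(h, w, k=3, pool=2):
    # One block: 3x3 convolution (stride 1, no padding) then 2x2 pooling.
    h, w = h - k + 1, w - k + 1
    return h // pool, w // pool

h, w = 224, 224
filters = [32, 64, 128, 256, 512]       # illustrative filter counts per block
for f in filters:
    h, w = block_shape(h, w)

flat = h * w * filters[-1]              # features entering the classifier block
```

Under these assumptions, a 224 × 224 input shrinks to a 5 × 5 map after the fifth block, so the fully connected classifier receives a flattened vector of 12,800 values.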

Figure 6 depicts the outcomes of the first activation layer.

Fig. 6
figure 6

First activation layer Outcomes.

Performance evaluation metrics

The model’s performance was illustrated through a confusion matrix based on various metrics, defined in (11) to (14):

$$A_{cc}=\frac{T_{P}+T_{N}}{T_{P}+T_{N}+F_{P}+F_{N}}$$
(11)
$$Pre=\frac{T_{P}}{F_{P}+T_{P}}$$
(12)
$$R_{call}=\frac{T_{P}}{F_{N}+T_{P}}$$
(13)
$$F_{score}=2\cdot\frac{R_{call}\cdot Pre}{R_{call}+Pre}$$
(14)
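Equations (11) to (14) computed from confusion-matrix counts can be sketched as below; the counts used here are illustrative, not taken from the paper's results.

```python
def metrics(tp, tn, fp, fn):
    acc = (tp + tn) / (tp + tn + fp + fn)      # Eq. (11): accuracy
    pre = tp / (tp + fp)                        # Eq. (12): precision
    rec = tp / (tp + fn)                        # Eq. (13): recall
    f1 = 2 * pre * rec / (pre + rec)            # Eq. (14): F-score
    return acc, pre, rec, f1

# Illustrative counts (not from the paper's confusion matrix).
acc, pre, rec, f1 = metrics(tp=90, tn=80, fp=10, fn=20)
```

Note that the F-score is the harmonic mean of precision and recall, so it penalizes an imbalance between the two more strongly than the arithmetic mean would.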

Results analysis and discussion

The proposed methodology was implemented on the Google Colab platform in Python. For the experiment, a dataset comprising medicinal plant leaves was employed to train and assess the suggested approach. The study was carried out with a variety of optimized hyperparameters, such as epochs, learning rate, and batch size. The paper reports the outcomes, which demonstrate remarkable results.

The previous section outlined the proposed methodology, which utilizes a limited number of hyperparameters to establish the experiment. Table 4 describes the hyperparameters. By fine-tuning the hyperparameters, the suggested feature fusion model surpasses baseline performance, as illustrated in the figures below.

Table 4 Details of hyperparameter.

Hyperparameter setting

Here, the experiment was set up with various hyperparameters to improve performance, as depicted in Table 4. During the training process, 80% of the dataset was used. Furthermore, 10% was reserved for validation throughout training to evaluate the model’s behavior and decrease the chance of overfitting. In addition, to guarantee an unbiased assessment of the model’s efficiency, testing was afterward performed with the remaining data. Testing was performed with unseen data, thus simulating performance under unknown conditions. This approach enabled us to assess the model under widespread conditions, which is the main intention of a feature fusion model.
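The 80/10/10 split described above can be sketched as follows, using the fixed seed of 42 mentioned later and the 6904-image dataset size; the exact splitting utility used in the paper is not specified.

```python
import random

def split_indices(n, train=0.8, val=0.1, seed=42):
    # Shuffle once with a fixed seed, then carve out 80/10/10 partitions.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_tr, n_va = int(n * train), int(n * val)
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]

tr, va, te = split_indices(6904)
```

Fixing the seed makes the partitioning reproducible across runs, so the test set stays disjoint from the data seen during training and validation.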

In the training phase, we utilized batch sizes ranging from 40 to 100, indicating the quantity of input images the model handles simultaneously before updating the model’s parameters. Although a larger batch size may result in more accurate gradient approximations, it also requires additional memory90. Additionally, average pooling is utilized in FF-NCA to decrease the spatial dimensions of feature maps by averaging values within the feature map region, leading to a more compact output feature map. A softmax function is also applied in the output layer to map the final layer’s output, enhancing the interpretability of the model’s predictions. To guarantee consistent outcomes from the models, the seed is configured to 42. The model’s parameters are adjusted during the training phase for consistent training and evaluation of the model.

The “Adam” optimizer was employed, which improves upon the stochastic gradient descent optimization method. The function autonomously adapts the learning rate for every parameter, which advances the model’s stability and effectiveness during training while also correcting biases in both the first and second moments of the gradient. In this research, we assess the effectiveness of the feature fusion model by examining sensitivity and accuracy11. Because different evaluation metrics are interrelated, it is difficult to rank AI models exclusively by their accuracy.
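A single bias-corrected Adam update, matching the description above, can be sketched as follows (a standard textbook formulation, not code from the study; the gradient value is invented for the example):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single parameter theta at step t (t >= 1)."""
    m = b1 * m + (1 - b1) * grad       # first moment: running mean of gradients
    v = b2 * v + (1 - b2) * grad ** 2  # second moment: running mean of squared gradients
    m_hat = m / (1 - b1 ** t)          # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return theta, m, v

theta, m, v = 1.0, 0.0, 0.0
theta, m, v = adam_step(theta, grad=2.0, m=m, v=v, t=1)
```

The bias-correction terms compensate for the zero-initialized moments, which is the correction of first- and second-moment biases referred to above.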

Training and validation of feature fusion model

The research included training the proposed model on images of various dimensions, specifically 128 × 128, 224 × 224, and 256 × 256 pixels. The model’s performance metrics fluctuated across these configurations. Significantly, a boost in performance was noted when using an image size of 224 × 224.
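Preparing the three input resolutions could look roughly like the following (a nearest-neighbour sketch for illustration only; a real pipeline would use a library resampler, and the input image here is random data):

```python
import numpy as np

def resize_nn(img, size):
    """Nearest-neighbour resize of an (H, W, C) image to (size, size, C)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return img[rows][:, cols]

# Hypothetical 300x300 RGB leaf image; build one batch entry per input size tried
img = np.random.default_rng(42).random((300, 300, 3))
resized = {s: resize_nn(img, s) for s in (128, 224, 256)}
```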

Figure 7 illustrates the proposed model’s performance over 200 epochs with a learning rate of 0.1, using the 224 × 224 image dimension. The graph indicates a direct correlation between the model’s performance and the training process. It is evident from Fig. 7 that further training of the model leads to increased accuracy and decreased error rates.

Fig. 7
figure 7

Accuracy and Error Rate with Epoch = 200, LR = 0.1 and Image Size = 224 × 224.

Figure 8 demonstrates the model’s performance with 200 epochs and a learning rate of 0.1, utilizing an image dimension of 128 × 128 pixels. The graph reveals a distinct correlation between the model’s performance and its training process. It is evident from Fig. 8 that an increase in the model’s training leads to enhanced accuracy and a reduction in error rates.

Fig. 8
figure 8

Accuracy and Error Rate with Epoch = 200, LR = 0.1 and Image Size = 128 × 128.

Figure 9 presents the model’s performance after training for 200 epochs, utilizing a learning rate of 0.1 and an image size of 256 × 256 pixels. The graph demonstrates a clear relationship between the model’s training duration and its performance metrics. It is evident from Fig. 9 that an increase in training correlates with improved accuracy and a reduction in the error rate.

Fig. 9
figure 9

Accuracy and Error Rate with Epoch = 200, LR = 0.1 and Image Size = 256 × 256.

Figure 10 illustrates the evaluation of the proposed model’s accuracy and error rate. The model was trained for 150 epochs with a learning rate of 0.1, using images of size 224 × 224. The graph shows a direct correlation between the model’s performance and its training: as training accumulates, accuracy rises and error rates fall.

Fig. 10
figure 10

Accuracy and Error Rate with Epoch = 150, LR = 0.1 and Image Size = 224 × 224.

A comparative analysis was performed utilizing several established models: MobileNet, VGG16, ResNet50V2, Xception, InceptionV2, and GoogleNet201, all evaluated on the medicinal leaf dataset. The accuracy curves for these models are illustrated in Fig. 11. In the initial epochs, the dataset exhibited heightened losses, attributed to the complex backgrounds present in the field data. Among the evaluated models, GoogleNet exhibited the lowest validation accuracy. As the epochs progressed, a decline in validation loss was observed, suggesting an enhancement in the models’ robustness. During the first four epochs, several models experienced elevated validation loss; however, the loss gradually diminished after the 50th epoch. Figure 11 indicates that the models achieved optimal performance, with further training unlikely to yield significant performance enhancements. Meanwhile, the validation dataset displayed a rising trend in validation accuracy over time, signifying that the models were generalizing effectively.

Fig. 11
figure 11

Comparative analysis.

Assessment of feature fusion model

Table 5 and Fig. 12 present the assessment details of various DL models. Notably, the models exhibited comparatively lower performance when processing images of 256 × 256 dimensions. For the dataset with 224 × 224 images, the feature fusion model attained an exceptional outcome of 0.9886. Overall, the feature fusion model distinguished itself by achieving the highest precision and accuracy across all dataset dimensions.

Table 5 Performance assessment of the proposed method compared with contemporary methods.
Fig. 12
figure 12

Performance assessment.

It is essential to assess various performance metrics to comprehensively estimate the model’s efficiency. These metrics offer diverse insights into the model’s performance, thereby facilitating precise estimates and providing a complete assessment of its efficacy.

Figure 13 provides a comparative analysis of the performance of the FF-NCA-CNN model on two separate medicinal plant leaf datasets: the Mendeley dataset and the Flavia dataset. The assessment is grounded in four essential performance metrics (Accuracy, Recall, Precision, and F1-Score), which together estimate the model’s efficacy in leaf classification tasks. The Mendeley dataset, serving as the primary dataset, consistently demonstrates high results across all metrics, achieving an accuracy of 98.90%, precision of 98.70%, recall of 98.75%, and an F1-score of 98.72%. These results illustrate the model’s outstanding capability to accurately and reliably classify medicinal leaves.
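The four reported metrics follow directly from the confusion-matrix counts; as a small binary sketch (the labels here are invented, not from the actual dataset):

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from binary labels (1 = positive)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    accuracy = np.mean(y_pred == y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                     # also called sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = classification_metrics(
    y_true=[1, 1, 1, 0, 0, 0],
    y_pred=[1, 1, 0, 0, 0, 1])
```

In the multi-class leaf setting, the same counts are computed per class and averaged, which is why the four reported values move together.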

Conversely, the Flavia dataset was used to evaluate the model’s generalization abilities on an external dataset. Although there was a slight decline in performance, it remained strong, with accuracy recorded at 96.35%, precision at 96.10%, recall at 96.42%, and an F1-score of 96.25%. The observed performance decrease is attributed to variations in image quality, class diversity, and the characteristics of the dataset. Nonetheless, the model preserved a strong predictive capacity, suggesting that the federated feature fusion method employed in FF-NCA-CNN is both adaptable and dependable across various datasets. This analysis underscores the model’s relevance in practical applications involving a range of medicinal plant species and environments.

Fig. 13
figure 13

Model performance comparison across two different datasets.

Figure 14 presents a comparative analysis of various deep learning and hybrid models utilized across different datasets. The FF-NCA-CNN model exhibits exceptional performance, attaining an accuracy of 98.9% on the Mendeley dataset, positioning it among the leading techniques in the domain. Compared with other prominent models such as MobileNetV2 (98.97%), ResNet152 in conjunction with Inception-ResNetv2 and LBP (97%), and AousethNet (98%), the FF-NCA-CNN remains highly competitive. Earlier models like GoogLeNet combined with SVM (87.34%), standard CNN architectures (86%), and Dual-Path CNN (77.1%) demonstrate considerably lower accuracies, underscoring the progress made in recent methodologies. Furthermore, various methods applied to the Flavia and Swedish leaf datasets, including a five-layer CNN (98.22%) and VGG19 with logistic regression (97.85%), also yield strong results, yet still slightly lag behind the FF-NCA-CNN’s performance on Mendeley. These comparisons confirm that the incorporation of advanced feature fusion techniques, as seen in FF-NCA-CNN, results in more robust and precise classification, particularly when addressing complex datasets such as those related to medicinal leaves. The findings strongly endorse the efficacy of FF-NCA-CNN as a superior model for automated detection of plant species.

Fig. 14
figure 14

Model performance comparison.

Discussion

Despite the successful results of the plant detection process, certain constraints persist that must be addressed to boost overall effectiveness. Notably, our results revealed that the detection accuracy was consistent across both the individual images sourced from our database and the field observations. The research revealed that the proposed method successfully identified both rare and common species while disregarding associated habitats. Contrary to our initial assumptions, the database images and field observations were meticulously curated to guarantee clarity for experts. The remarkable results indicate that the application operates effectively across various conditions for plant detection. The successful functioning of the application implies its potential for wider use. Moreover, it offers the possibility of supporting the detection of an expanding array of digitized plant images, provided the images in the database meet established quality standards.

Conclusions

The proposed model diminishes the necessity for human intervention and reveals superior capabilities in the detection of medicinal plants. Significantly, the method demonstrated superior performance following fine-tuning on the localized data. In the future, the recommended model will be improved with a more effective interface and an optimized recognition algorithm, further enhanced to provide a viable and promising solution for practical applications.

It is vital to recognize and confront the particular difficulties experienced during the examination. Identifying plant species automatically can be difficult: aspects such as species habitat, distribution patterns, and appearance can hinder the effectiveness of detection methods. One possible enhancement is to use geographical information from national or global sources about the usual environments of these plants. The availability of multiple images of a single species is essential for precise detection, especially as leaf features can vary with the seasons, increasing the difficulty. Tackling these constraints through additional research could improve the robustness of our methodology.

The study introduces an enhanced approach by employing a varied dataset of medicinal plants from multiple botanical families. This finding holds substantial potential for the pharmaceutical industry, as it could aid in the discovery and utilization of plant species with healing properties. Future research may investigate the incorporation of supplementary data modalities, including infrared, hyperspectral, or thermal imagery, in conjunction with RGB data to enhance feature diversity and resilience under real-world conditions.