Introduction

Identifying diseases on plant leaves through image analysis is regarded as a critical area of precision agriculture research1. Conventional techniques for determining the severity of a plant disease depend on the examination of plant tissues2. Expert systems for cultivation and crop management have been widely adopted because of the spread of digital techniques, which has increased production capacity3. However, the retrieval of disease and pest information and the description of their characteristics in such systems rely largely on expert knowledge, which results in low efficiency and high costs4. Advances in artificial intelligence and image processing have created an opportunity to extend this research in the agricultural sector5. Deep learning is a branch of machine learning that remains an active research area and has been deployed successfully in diverse fields6. Sectors such as communications and networking, the automotive industry, business, and agriculture have used it for both image categorization and object detection7. In the agricultural sector, conventional disease identification requires experts to perform visual inspection, followed by more in-depth laboratory detection, which is a time-consuming process8.

Plant diseases are commonly diagnosed from symptoms that appear on different parts of the plant, such as the pulp, stem, and leaf9,10. Expert knowledge is considered necessary for diagnosing leaf diseases accurately11. Some systems go beyond detection by recommending possible remedies, and these recommendations also serve as advertisements for agricultural product vendors12. Because visual symptoms vary across plant parts, mobile disease identification approaches let a farmer take a picture of the affected part and then detect and label the disease13. This helps prevent crop losses by reducing the number of steps in the usual detection procedure14. It is crucial to reach rural areas, where crops are grown by smallholder farmers; with improved technologies, diagnosis at an expert level becomes possible there15.

Diverse AI techniques have been implemented to identify and categorize plant diseases. Approaches such as the Convolutional Neural Network (CNN) are commonly used16, usually together with pre-processing steps that improve feature extraction. A Deep Learning (DL) model extracts features from the images and then uses them for regression or classification, depending on the requirements. Other approaches classify data by similarity, where unlabeled objects are categorized using their neighbouring labelled objects. However, some conventional techniques suffer from restrictions such as overlapping data and over-fitting17. DL techniques can extract task-specific features and fuse them to provide better classification outcomes18. Earlier research concentrated on the detection process, and the categorization task remains to be addressed as well. Computers are now used to detect diseases by means of DL approaches, and a fine-tuned deep learning model is applied to the identification process. To further improve the identification of diseased plants, a new model is implemented in this work.

The main contributions of the proposed model are as follows.

  • To build a new, precise network model for plant disease detection that supports early identification and prevention and thereby protects the plant.

  • To perform segmentation using a novel architecture termed DAA-MRCNN, in which the AVLO algorithm tunes the parameters with an objective function that maximizes the Dice and Jaccard coefficients.

  • To perform classification using DAA-MDeNet, which produces the final classified outcomes and in which the AVLO algorithm tunes the parameters with an objective function that maximizes accuracy.

  • To develop a new optimization algorithm, termed AVLO, for both the segmentation and classification phases; it is designed with a new formulation to overcome the limitations of conventional models.

  • To experimentally validate the performance of the proposed model using several measures and to show its improvement over other models.

The remaining sections of this paper are organized as follows. Section II presents the literature survey, Section III illustrates the detection process with the adaptive segmentation and classification model, Section IV describes DAA-MRCNN for segmentation, Section V describes plant disease identification using the multi-scale DenseNet with AVLO, and Sections VI and VII present the results and the conclusion.

Literature survey

Related works

In 2023, Moupojou et al.19 suggested new DL-based techniques that help farmers identify crop diseases in order to avoid yield delays. Existing techniques were trained on publicly available datasets composed of images captured under laboratory conditions, which imposed several limitations. Therefore, the FieldPlant dataset, collected directly from plantations, was proposed. Every image was manually annotated at the level of individual leaves to assure quality. Finally, object detection and classification models were evaluated on this dataset.

In 2022, Saleem et al.20 presented a model for recognizing plant diseases using newly collected datasets. After the most suitable deep learning technique was identified, data augmentation approaches were evaluated. The impact of resizers, interpolators, and batch normalization was then examined. Finally, the overall performance was enhanced through empirical observation, and the robustness of the model was assessed through k-fold cross-validation.

In 2022, Patil et al.21 used a deep learning model for plant disease detection because of its remarkable success in this task. A standard approach was used to remove irrelevant background from the input images through multi-scale feature selection. The model detected diseases in the cardamom plant using EfficientNetV2. A comprehensive set of experiments was conducted to ascertain the ability of the model and to compare it with other techniques such as CNN and EfficientNet.

In 2023, Hosny et al.22 implemented a new lightweight deep model for obtaining better representations of high-level features. The deep features were fused with conventional features to capture the local texture information of the plant. The model was then trained and tested on the datasets.

In 2022, Amin et al.23 proposed an end-to-end DL technique for distinguishing healthy from unhealthy leaves. The system used two pre-trained CNNs and other standard techniques to extract deep features from plant images. Data augmentation was used to add variation to the training images so that the model learns more complex cases. The model required fewer parameters than conventional techniques and less processing power.

In 2023, Vishnoi et al.24 used a CNN model with a small number of layers, which reduces the computational burden. Augmentation operations such as flipping, zoom, scaling, shear, and shift were applied to produce additional samples, enlarging the training set without collecting more images. Compared with the conventional techniques used for detection, this model offered accurate performance with lower computational and storage requirements. Rigorous validation showed that the model was fit for the task.

In 2021, Zhao et al.25 provided a detection model that uses a Double GAN to generate high-resolution images of diseased leaves in two phases. In the initial phase, healthy leaf images were used as input to obtain a pre-trained model, and the unhealthy leaf images were subsequently used to pre-train the model. Further, other standard techniques were used to generate the corresponding images and expand the unbalanced dataset. Recognition on the expanded dataset gave better outcomes than on the original dataset.

In 2021, Ahmad et al.26 designed techniques for systematically categorizing plant disease symptoms using a CNN model. Coupled with the proposed techniques, the model suits industrial applications because training time is minimized. Transfer learning was used to train on small datasets by transferring weights pre-trained on larger datasets. However, negative transfer is a common issue in transfer learning, so step-wise transfer learning was recommended, which aided fast convergence.

In 2025, Hassan et al.27 developed a transfer learning-based deep learning model for the classification of breast cancer in women. Multiple deep learning models are combined in this model to obtain robust classification results. In 2024, Hassan et al.28 presented a deep learning model for the early detection of black fungus in medical images. The model detects black fungus effectively at an early stage, helping to prevent mortality.

In 2025, Hassan et al.29 suggested a Real-Time Adaptation Framework for dysarthria detection in resource-constrained scenarios. An enhanced WaveNet captures the long-term dependencies in the audio signal across diverse datasets, which improves reliability in clinical applications. In 2024, Chouhan et al.30 proposed an artificial intelligence-based approach for improving the agricultural sector.

In 2019, Chouhan et al.31 developed a Fuzzy Competitive Learning based Counter Propagation Network (FCPN) for segmenting natural scene images. The model has a high parallel learning capability and handles uncertainty in the segmentation process. In 2024, Sharma et al.32 proposed Generative Adversarial Networks (GANs) and Vision Transformers (ViTs) as a solution for farming, using a robust, high-quality dataset to construct a robust model for the agricultural task.

In 2025, Chouhan et al.33 developed an artificial intelligence model for soil health and crop health monitoring, which offers timely and accurate plant disease diagnosis. In 2020, Mahmood et al.34 presented a deep learning model for breast cancer detection on multiple modalities. In 2025, Rehman et al.35 proposed the Swin-ViT model for robust kidney carcinoma prognosis. In 2023, Ali et al.36 presented TESR (Two-stage approach for Enhancement and Super-Resolution) for artificially improving image resolution. In 2024, Mahmood et al.37 developed a deep learning algorithm for improving the detection and classification of breast cancer. In 2024, Mahmood et al.38 presented a Depth Double Deep Learning Method of Linear Attention Network (D3LM-LAN) for detecting cognitive impairment at an early stage. In39, a Multi-Modal Feature Fusion Network for Histopathology (MFF-HistoNet) is proposed for improving the accuracy of breast cancer detection by addressing multigrading challenges. In40, squeeze-and-excitation and dilated dense convolution are proposed for analyzing intricate brain tissues more accurately.

Problem specifications

The advancements and limitations of the classical leaf disease detection models are tabulated in Table 1. The deep learning technique19 detects disease on individual leaves and assures the quality of the entire detection process, but it lacks a global ensemble model and a segmentation stage. The deep learning method20 has been embedded into a robotic system for disease control and used to develop a cost-effective protection system, but more in-depth validation is needed to strengthen it. The CNN21 model removes complex backgrounds from the images, but it needs to be extended to detect nutritional deficiency. The CNN22 model offers accurate detection with a small number of parameters, but its application to practical crop disease detection is restricted. The CNN23 model requires few parameters for feature extraction and integrates feature sets that make the model more robust, but disease detection through a digital imaging process remains to be developed. The CNN24 model detects crop diseases from leaf images and is consistent and reliable, but it requires better image variability. The Double GAN25 has been used effectively for image generation and detects unhealthy leaves easily, but it is limited when high-resolution images are needed from a small number of samples. The ensemble26 method plays a significant role in enhancing the performance of the detection system and offers better feasibility, but it needs improvement in practical applications.

Table 1 Certain advancements and their limitations in the plant disease detection model.

Illustration of plant disease detection: adaptive segmentation and classification model

Proposed system of plant disease detection

There is a long-standing interaction between crops and disease, which has caused a never-ending competition between pests and the efforts to identify and control them. In crop protection, it is difficult to assess plant resistance and to apply pesticides cost-effectively, so determining the symptoms and severity of a disease is essential. Plant disease affects the growth of the species, and thus early detection is needed. Various mobile-based techniques, together with machine learning and deep learning approaches, have been employed for diagnosis, and diverse techniques have been designed to limit losses. Precision agriculture uses recent technologies to optimize the decision-making process, and some standard techniques offer optimal decisions that reduce costs. However, this area still needs improvement, specifically in decision-support systems that provide more useful recommendations and more accurate predictions. Deep learning approaches can resolve complex problems in a reasonably short amount of time. Because the conventional approaches to plant disease detection still have limitations, a new technique is implemented here, and its architecture is shown in Fig. 1.

Fig. 1 Architectural modelling of the proposed plant disease detection system.

This research work designs an efficient plant disease classification model. In the first step, images are acquired from a standard, publicly available database. The acquired images are then passed to the segmentation phase, which makes the subsequent detection more accurate and less time-consuming. Segmentation is performed by the newly developed DAA-MRCNN. The segmented images are then passed to the final classification phase, in which the newly designed DAA-MDeNet classifies the diseased plant. In both models, the attention mechanism improves precision and reduces false positives under complex field conditions. During segmentation, the attention module makes DAA-MRCNN focus on pixel-level lesion areas, which suppresses irrelevant image regions and lighting artefacts. During classification, the attention module makes DAA-MDeNet focus on lesion texture and edge patterns. The attention mechanism therefore improves both the diagnostic precision and the interpretability of the proposed models. Further, the new AVLO algorithm optimizes the parameters of both DAA-MRCNN and DAA-MDeNet to improve performance. The detection performance is validated in the final phase.

Plant disease dataset

The images used for the detection process are collected in this phase. The details of the dataset are given below.

Dataset: The dataset is named PlantifyDr. It contains a total of 12,500 images from 10 plant types, each of which is treated as an individual dataset: (1) Apple, (2) Cherry, (3) Citrus, (4) Corn, (5) Grape, (6) Peach, (7) Pepper, (8) Potato, (9) Strawberry, and (10) Tomato. It covers a total of 37 plant diseases. It was collected from “https://www.kaggle.com/datasets/lavaman151/plantifydr-dataset”: “Access Date: 2023-08-09”.

The collected images are denoted \(PD_{zz}\), \(zz = 1,2, \ldots ,ZZ\), where \(ZZ\) is the total number of images. Sample images with their disease names are given in Fig. 2.

Fig. 2 Sample images collected from the given dataset.

Novel heuristic algorithm: AVLO

For the hybrid heuristic-based plant disease detection process, a new position-based algorithm, AVLO, is implemented to tackle some of the limitations of the conventional African Vultures Optimization Algorithm (AVOA) and Lemurs Optimizer (LO). Because the parameter spaces are complex, achieving peak performance by tuning the parameters of a deep learning model is difficult. In this work, AVLO is developed to simulate the natural activities of the lemur and thereby search the hyperparameter space effectively. The AVLO search improves the convergence and accuracy of the models used for both segmentation and classification, and AVLO-based tuning improves the reliability of detection by assisting fine-tuning in both tasks. AVLO is developed by hybridizing the LO and AVOA algorithms because they are effective in the exploitation and exploration phases, respectively. The hybridization improves convergence while maintaining population diversity. Even in a large, non-convex parameter space, the combination manages the dynamic balance between exploitation and exploration effectively. Compared with conventional optimization algorithms, it attains faster convergence and better generalization by combining the explorative strength of AVOA with the exploitative strength of LO, and so it provides high-quality solutions for complex optimization problems.

AVOA requires only minimal computational complexity and offers high flexibility. It can also solve continuous optimization problems. However, it is time-consuming when processing all of its phases. LO is competitive with other models and handles both parameter and optimal control problems, but its binary version is limited, which degrades its performance.

Thus, a new population-based position formulation is implemented, as expressed in Eq. (1).

$$ps = mean\left( {ps1,ps2} \right) + \frac{bs}{{wf}}$$
(1)

Here, \(ps\) denotes the new updated position, \(mean\) indicates the mean value, and \(ps1\) and \(ps2\) represent the positions updated using AVOA and LO, respectively.
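As a minimal sketch of the update in Eq. (1), the snippet below averages the two candidate positions element-wise and adds the ratio \(bs/wf\). The text does not define \(bs\) and \(wf\), so treating them as plain scalars is an assumption made only for illustration, as is the function name `avlo_position`.

```python
def avlo_position(ps1, ps2, bs, wf):
    """Combine the AVOA position ps1 and the LO position ps2 as in Eq. (1).

    bs and wf are treated as scalars (assumption: the text does not define them).
    """
    if wf == 0:
        raise ValueError("wf must be non-zero")
    offset = bs / wf
    # Element-wise mean of the two candidate positions plus the scalar offset.
    return [(a + b) / 2.0 + offset for a, b in zip(ps1, ps2)]


print(avlo_position([1.0, 3.0], [3.0, 5.0], bs=1.0, wf=2.0))  # [2.5, 4.5]
```

The mean keeps the new position between the two candidate updates, while the added ratio shifts every coordinate by the same amount.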

AVOA41: Vultures are divided into two main groups of hunting birds. As scavengers, vultures also help remove rotting and infectious carcasses.

Phase one: After the fitness of all solutions is computed, the best solution is selected as the best vulture of the first group and the second-best solution as the best vulture of the second group, using Eq. (2). The fitness of the vulture population is re-evaluated in every iteration, and the probability of choosing each of the best solutions is derived.

$$A\left( z \right) = \left\{ {\begin{array}{*{20}l} {bsvu_{1} } \hfill & {if\quad a_{b} = B_{1} } \hfill \\ {bsvu_{2} } \hfill & {if\quad a_{b} = B_{2} } \hfill \\ \end{array} } \right.$$
(2)

Here, the probability \(a_{b}\) of choosing each of the best vultures \(bsvu_{1}\) and \(bsvu_{2}\) is evaluated, and \(B_{1}\) and \(B_{2}\) are measured parameters.

Phase two: This phase models the rate at which the vultures are satiated. The behaviour is derived in Eq. (3), and its mathematical model is expressed in Eq. (4).

$$c = d \times \left( {\sin^{*} \left( {\frac{\pi }{2} \times \frac{{it_{f} }}{{it_{tn} }}} \right) + \cos \left( {\frac{\pi }{2} \times \frac{{it_{f} }}{{it_{tn} }}} \right) - 1} \right)$$
(3)
$$sv = \left( {2 \times rnd_{1} + 1} \right) \times e \times \left( {1 - \frac{{it_{f} }}{{it_{tn} }}} \right) + c$$
(4)

Here, \(rnd_{1}\) is a random value in [0, 1], \(e\) and \(d\) are random numbers in [− 1, 1] and [− 2, 2], respectively, \(it_{f}\) is the current iteration, \(it_{tn}\) is the total number of iterations, and \(sv\) is the satiation rate of the vultures.
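The satiation rate of Eqs. (3) and (4) can be sketched as below. The exponent on the sine term is not stated in the text, so the parameter `w` and its default of 1.0 are assumptions; the function name is also hypothetical.

```python
import math


def satiation_rate(it_f, it_tn, d, e, rnd1, w=1.0):
    """Vulture satiation rate following Eqs. (3) and (4).

    w is the exponent on the sine term; the source does not give its value,
    so w=1.0 is an assumption.
    """
    progress = it_f / it_tn
    angle = (math.pi / 2.0) * progress
    c = d * (math.sin(angle) ** w + math.cos(angle) - 1.0)   # Eq. (3)
    return (2.0 * rnd1 + 1.0) * e * (1.0 - progress) + c     # Eq. (4)


# Early in the run the rate can be large; at the last iteration it vanishes.
print(satiation_rate(0, 100, d=2.0, e=1.0, rnd1=0.5))
print(satiation_rate(100, 100, d=2.0, e=1.0, rnd1=0.5))
```

Because the factor \(1 - it_{f}/it_{tn}\) shrinks to zero, the magnitude of the rate tends to fall as the iterations proceed, which moves the search from exploration towards exploitation.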

Exploration: The vultures inspect various random areas using two different strategies, and the parameter \(C_{1}\) is used to select the strategy, as given in Eq. (5). When \(C_{1}\) is greater than or equal to the random number \(rnd_{C1}\), Eq. (6) is used; when \(C_{1}\) is less than \(rnd_{C1}\), Eq. (8) is used.

$$C\left( {z + 1} \right) = \left\{ {\begin{array}{*{20}l} {eq.(6)} \hfill & {if\quad C_{1} \ge rnd_{C1} } \hfill \\ {eq.(8)} \hfill & {if\quad C_{1} < rnd_{C1} } \hfill \\ \end{array} } \right.$$
(5)
$$C\left( {z + 1} \right) = A\left( z \right) - D\left( z \right) \times sv$$
(6)
$$D\left( z \right) = \left| {E \times A\left( z \right) - C\left( z \right)} \right|$$
(7)

Here, the vulture searches randomly for prey in the area surrounding one of the best vultures, and \(C\left( {z + 1} \right)\) is the vulture location vector in the next iteration. The satiation rate of the vulture is \(sv\), which is derived in Eq. (4). The term \(E\) is a coefficient vector that increases the random motion, and \(C\left( z \right)\) is the current position vector of the vulture.

$$C\left( {z + 1} \right) = A\left( z \right) - sv + rnd_{2} \times \left( {\left( {u_{bo} - l_{bo} } \right) \times rnd_{3} + l_{bo} } \right)$$
(8)

Here, \(l_{bo}\) and \(u_{bo}\) are the lower and upper bounds, and \(rnd_{2}\) and \(rnd_{3}\) are random coefficients that increase the randomness of the search. The best vulture \(A\left( z \right)\) is selected using Eq. (2), and the vulture satiation rate \(sv\) is obtained using Eq. (4).
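A minimal sketch of the exploration step in Eqs. (5) to (8), written for a single decision variable, is given below. The function and argument names are hypothetical, and drawing \(rnd_{C1}\), \(rnd_{2}\), and \(rnd_{3}\) uniformly from [0, 1) is an assumption.

```python
import random


def explore(a_best, c_cur, sv, c1, lb, ub, e_coef, rng):
    """One exploration move for a single decision variable, per Eqs. (5)-(8)."""
    if c1 >= rng.random():                       # Eq. (5), first branch
        d = abs(e_coef * a_best - c_cur)         # Eq. (7)
        return a_best - d * sv                   # Eq. (6)
    rnd2, rnd3 = rng.random(), rng.random()
    return a_best - sv + rnd2 * ((ub - lb) * rnd3 + lb)   # Eq. (8)


rng = random.Random(0)
# With c1 = 1.0 the first strategy is always chosen.
print(explore(2.0, 1.0, sv=0.5, c1=1.0, lb=0.0, ub=1.0, e_coef=1.0, rng=rng))  # 1.5
```

Setting `c1` close to 1 favours the move around the best vulture, whereas a small `c1` favours the random jump inside the bounds.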

Exploitation: This phase determines the effectiveness of AVOA and is carried out when \(\left| {sv} \right|\) is less than 1. The parameters \(C_{2}\) and \(C_{3}\) are used to select the strategy.

The rotating flight strategy is selected by comparing a random number \(rnd_{C2}\) with the parameter \(C_{2}\), as expressed in Eq. (9).

$$C\left( {z + 1} \right) = \left\{ {\begin{array}{*{20}l} {eq.(6)} \hfill & {if\quad C_{2} \ge rnd_{C2} } \hfill \\ {eq.(8)} \hfill & {if\quad C_{2} < rnd_{C2} } \hfill \\ \end{array} } \right.$$
(9)

LO42: Lemurs are a group of primates. LO is a population-based algorithm whose search process is divided into two steps: the dance-hub behaviour drives exploration, and the leap-up behaviour drives exploitation.

The set of lemurs, that is, the population, is represented as the matrix given in Eq. (10).

$$F = \left[ {\begin{array}{*{20}c} {b_{1}^{1} } & {b_{1}^{2} } & \cdots & {b_{1}^{vd} } \\ {b_{2}^{1} } & {b_{2}^{2} } & \cdots & {b_{2}^{vd} } \\ \vdots & \vdots & \vdots & \vdots \\ {b_{sc}^{1} } & {b_{sc}^{2} } & \cdots & {b_{sc}^{vd} } \\ \end{array} } \right]$$
(10)

Here, \(F\) is the set of lemurs, \(sc\) is the number of candidate solutions, and \(vd\) is the number of decision variables.

The decision variable \(g\) of solution \(h\) is initialised as in Eq. (11).

$$\begin{aligned} & b_{h}^{g} = rnd( \cdot ) \times \left( {U_{b} - L_{b} } \right) + L_{b} \\ & \forall h \in \left( {1,2, \ldots ,sc} \right),\quad \forall g \in \left( {1,2, \ldots ,vd} \right) \\ \end{aligned}$$
(11)

Here, \(rnd( \cdot )\) denotes a uniformly distributed random number, the index set \(\left( {1,2, \ldots ,mx\_in} \right)\) contains integer values, and \(L_{b}\) and \(U_{b}\) denote the lower and upper bounds of the variables.
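The population matrix of Eq. (10) and its initialisation can be sketched as follows. The sketch assumes that Eq. (11) is the usual uniform initialisation between the lower and upper bounds and that the same bounds apply to every decision variable; the function name and the `seed` argument are hypothetical.

```python
import random


def init_population(sc, vd, lower, upper, seed=None):
    """Build an sc-by-vd population matrix with entries drawn uniformly
    between the bounds (assumed reading of Eq. (11))."""
    rng = random.Random(seed)
    return [
        [rng.random() * (upper - lower) + lower for _ in range(vd)]
        for _ in range(sc)
    ]


population = init_population(sc=4, vd=3, lower=-2.0, upper=5.0, seed=0)
for lemur in population:
    print(lemur)
```

Each row is one candidate solution (one lemur) and each column is one decision variable, matching the layout of \(F\).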

When the fitness value is low, the decision variables are changed. The lemurs are sorted by fitness value, which improves performance over the iterations by considering the global best lemur \(gbl\) and choosing the best nearest lemur \(bnl\) for each lemur.

Further, the decision variable \(g\) of solution \(h\) is updated in each iteration using one of two options: (a) the value is guided by \(bnl\), or (b) the value is guided by \(gbl\). It is given in Eq. (12).

$$Z_{h}^{g} = \left\{ {\begin{array}{*{20}l} {b\left( {h,g} \right) + abs\left( {b\left( {h,g} \right) - b\left( {bnl,g} \right)} \right) \times \left( {rnd - 0.5} \right) \times 2,} \hfill & {rnd < rr} \hfill \\ {b\left( {h,g} \right) + abs\left( {b\left( {h,g} \right) - b\left( {gbl,g} \right)} \right) \times \left( {rnd - 0.5} \right) \times 2,} \hfill & {rnd \ge rr} \hfill \\ \end{array} } \right.$$
(12)

Here, \(b\left( {h,g} \right)\) is the current lemur, \(b\left( {bnl,g} \right)\) is the best nearest lemur, \(b\left( {gbl,g} \right)\) is the global best lemur, \(rr\) is the risk rate of all lemurs, and \(rnd\) is a random number in [0, 1]. The AVLO model is summarised in Algorithm 1.
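The lemur update of Eq. (12) can be sketched as below. The sketch assumes the reading in which the step size is the absolute distance to the reference lemur, the best nearest lemur is used when \(rnd < rr\), and the global best lemur is used otherwise. Reusing one random draw per variable for both the branch choice and the step is a further assumption, and all names are hypothetical.

```python
import random


def lemur_update(population, h, bnl, gbl, rr, rng):
    """Return the updated decision variables of lemur h (assumed reading of Eq. (12)).

    bnl and gbl are row indices of the best nearest and global best lemurs.
    """
    updated = []
    for g, value in enumerate(population[h]):
        rnd = rng.random()
        # Pick the reference lemur according to the risk rate rr.
        reference = population[bnl][g] if rnd < rr else population[gbl][g]
        step = abs(value - reference) * (rnd - 0.5) * 2.0
        updated.append(value + step)
    return updated


rng = random.Random(1)
pop = [[0.0, 4.0], [1.0, 2.0], [3.0, 3.0]]
print(lemur_update(pop, h=0, bnl=1, gbl=2, rr=0.5, rng=rng))
```

Since \((rnd - 0.5) \times 2\) lies in [−1, 1), each variable moves by at most its distance to the reference lemur, so a lemur that already coincides with its reference does not move.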

Algorithm 1 AVLO.

Then, the flowchart depiction for the AVLO model is given in Fig. 3.

Fig. 3 Flowchart of the AVLO model.

Dilated, adaptive, and attention-based mask RCNN for segmentation to detect the plant disease

Model of mask RCNN

The mask RCNN43 model is an image detection and segmentation model that extends Faster RCNN. It captures diseased regions in images more effectively than ResNet, EfficientNet, or transformer-based models such as the Swin Transformer. In addition, its boundary accuracy is very high, so it gives good results in segmentation tasks. Especially with small datasets, mask RCNN copes well with the noise and complex backgrounds of agricultural images. To obtain the feature maps at the start, the images are forwarded to the trained FPN and the ResNext101 model; ResNext101 is a CNN that improves accuracy and reduces the impact of hyper-parameters without degrading the performance of the model. The fixed-size Regions of Interest (RoIs) from the RPN are then subjected to binary classification and bounding-box regression. The Mask RCNN model has four essential components.

  (a) Backbone: The backbone is pre-trained, and the trained model is obtained from these initial parameters. It extracts the relevant feature maps from the original images and adopts a fixed structural model, such as DenseNet.

  (b) Feature Pyramid Network (FPN): The FPN operates in the opposite direction to the backbone and aims to fully extract multi-scale feature maps.

  (c) Region Proposal Network (RPN): The goal of this component is to propose and select rough detection rectangles. It follows the FPN, from which its parameters are propagated. The RoIs are processed by the RoI align step, which uses bilinear interpolation instead of the rounding operation used by RoI pooling and thereby improves RoI accuracy. RoIs of diverse scales are normalized to the same dimension of 7 × 7.

  (d) Function Branches: The candidate RoIs refined by RoI alignment are passed to three functional branches: classification, detection, and segmentation. The segmentation branch is connected to a fully convolutional layer that produces a binary mask. The basic depiction of the mask RCNN model is given in Fig. 4.

Fig. 4 Basic depiction of the mask RCNN model.
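The difference between the rounding used by RoI pooling and the bilinear interpolation used by RoI align can be illustrated with a small sketch. It samples a feature map at a fractional coordinate; the function name and the toy feature map are hypothetical, and this is only the sampling step, not a full RoI align implementation.

```python
import math


def bilinear_sample(fmap, y, x):
    """Sample a 2-D feature map (list of rows) at a fractional (y, x) position."""
    height, width = len(fmap), len(fmap[0])
    # Clamp the coordinate to the valid range of the map.
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


fmap = [[0.0, 1.0],
        [2.0, 3.0]]
print(bilinear_sample(fmap, 0.5, 0.5))   # 1.5, a blend of all four cells
print(fmap[round(0.4)][round(0.4)])      # 0.0, rounding snaps to one cell
```

Rounding discards the fractional offset of the RoI, whereas interpolation keeps it, which is why the align step preserves small lesion boundaries better.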

Dilated and attention in mask RCNN

To carry out this phase, the collected images \(PD_{zz}\) are given as input to the model. Segmentation retrieves the most relevant information from the image pixels and improves the capability of the model in the subsequent steps. Conventional segmentation approaches become inaccurate when they face images with varying backgrounds and overlapping leaves. Therefore, DAA-MRCNN is proposed in this work; it combines dilated convolution, attention modules, and an adaptive concept to improve segmentation accuracy. The combination of these modules enables precise isolation of the diseased areas and thus effective plant disease detection.

Dilation: Dilated convolution is similar to standard convolution, but the kernel samples the input with gaps between pixels so that it covers a larger area of the image. The convolution layer thereby gains a larger receptive field without changing the size of the feature maps, which enriches the feature information. The kernel can be viewed as a kernel with holes, and the dilation rate is the essential parameter that distinguishes dilated from normal convolution. Incorporating the dilation module enlarges the receptive field of the suggested model without increasing the number of parameters. In this work, fixed dilation rates (1, 2, 4, 8) are used. Dilation rates of 1 and 2 are adopted to capture the finer details in the images, while the higher dilation rates (4, 8) improve the global contextual awareness. In the different layers of the hybrid models, the combination of dilation rates (2, 4, and 6) is used so that both large and small contextual cues are captured more effectively. The dilation rates are kept fixed during training.
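The effect of the fixed dilation rates on the receptive field can be checked with a short calculation. The sketch below assumes 3 × 3 kernels, stride 1, and one convolution per dilation rate stacked in sequence; these choices and the channel counts are illustrative assumptions, since the layer configuration is not specified here.

```python
def stacked_receptive_field(kernel, rates):
    """Receptive field (in pixels, per axis) of stacked stride-1 convolutions."""
    rf = 1
    for d in rates:
        # Each layer widens the field by (kernel - 1) * dilation pixels.
        rf += (kernel - 1) * d
    return rf

def conv_weight_count(kernel, c_in, c_out):
    """Weights plus biases of a 2-D convolution; independent of the dilation."""
    return kernel * kernel * c_in * c_out + c_out

print(stacked_receptive_field(3, [1, 1, 1, 1]))  # 9  (no dilation)
print(stacked_receptive_field(3, [1, 2, 4, 8]))  # 31 (rates used in this work)
print(conv_weight_count(3, 64, 64))              # same count for every rate
```

Under these assumptions, four undilated layers see a 9 × 9 window, whereas the rates (1, 2, 4, 8) see a 31 × 31 window with an identical number of weights.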

Attention: The purpose of the attention mechanism is to focus on specific parts of the image so that classification can be performed adequately. It is detailed in Eq. (13).

$$ATT\left( {y,z} \right) = \sum\limits_{i = 1}^{Z} {\sigma_{i} } \left( {y,z_{i} } \right)E_{i}$$
(13)

Here, \(y\) is the query, \(z_{i}\) is the \(i\)-th key, \(\sigma_{i} \left( {y,z_{i} } \right)\) is the similarity function between the query and its respective key, and \(E_{i}\) is the element associated with that key, which is weighted by the similarity.

The softmax function used in the attention mechanism is expressed in Eq. (14).

$$sf\left( {j_{i} } \right) = \frac{{\exp \left( {j_{i} } \right)}}{{\sum\nolimits_{{i^{\prime } }} {\exp \left( {j_{{i^{\prime } }} } \right)} }}$$
(14)
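Equations (13) and (14) can be combined into a small numerical sketch. It assumes a dot product as the similarity function and that the softmax of Eq. (14) normalizes the similarity scores before the weighted sum of Eq. (13); both choices are assumptions made for illustration, and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Eq. (14): exponentiate and normalise so the weights sum to 1."""
    m = max(scores)  # subtracting the maximum avoids overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Eq. (13): similarity-weighted sum of the elements E_i."""
    # Assumption: dot-product similarity between the query y and each key z_i.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [
        sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)
    ]
    return output, weights

out, w = attention([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]], [[2.0], [4.0]])
print(out, w)
```

With two identical keys the weights are equal, so the output is the plain average of the two elements.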

Dilation and attention with mask RCNN: Dilation is introduced into the mask RCNN model by replacing the convolution layers with dilated convolution layers to offer better performance. The integrated attention values then guide the model toward the relevant regions of the input images.

The segmented images obtained as the outcome of this phase are termed \(ss^{mrc}\), and the process is detailed in Fig. 5.

Fig. 5

Depiction of the segmentation process carried out using the DAA-MRCNN model.

Suggested DAA-MRCNN for segmentation

In agricultural imagery, an end-to-end classification model trained on raw images is computationally efficient and simple. However, such a classifier sometimes produces misclassified results because of the variance in the background and lighting conditions of the raw input images. When the disease occupies only a tiny portion of the image, detecting it from the entire image frame is difficult, and the end-to-end classifier tends to focus on correlations with background elements instead of the lesion. To address these issues, this research adopts a two-stage pipeline that consists of segmentation followed by classification. The segmentation is performed by the new heuristic model, termed DAA-MRCNN. The DAA-MRCNN removes irrelevant context by concentrating on the diseased region of the images, which improves the specificity and the precision. A new formulation is derived to tackle some of the limitations of the mask RCNN model. In general, mask RCNN is simple to train and offers good flexibility and efficiency; however, it requires more time for detection. Therefore, certain parameters in MRCNN are optimized with the AVLO algorithm to enrich the performance. The objective is given in Eq. (15).

$$of_{1} = \mathop {\arg \min }\limits_{{\left\{ {af_{mrc} ,hn_{mrc} ,ep_{mrc} } \right\}}} \left( {\frac{1}{dc + jc}} \right)$$
(15)

Here, the terms \(af_{mrc} ,hn_{mrc} ,ep_{mrc}\) denote the activation function in [0, 4], the hidden neuron count in [5, 255], and the number of epochs in [5, 55] in MRCNN, which are tuned using AVLO. Further, the terms \(dc\) and \(jc\) denote the dice coefficient and the Jaccard coefficient, which are defined in Eqs. (16) and (17).

$$dc = \frac{2 \times aa}{{\left( {aa + bb} \right) + \left( {aa + bz} \right)}}$$
(16)
$$jc = \frac{aa}{{aa + bb + bz}}$$
(17)

Here, the terms \(aa,az,bb,bz\) are the true positives, true negatives, false positives, and false negatives, respectively.
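Equations (15) to (17) translate directly into a few lines of code. The sketch below evaluates the segmentation objective for given confusion counts; the function names and the example counts are hypothetical, and the guard for the degenerate case of no overlap is an added assumption.

```python
def dice(aa, bb, bz):
    """Eq. (16): dice coefficient from TP (aa), FP (bb), FN (bz)."""
    return 2 * aa / ((aa + bb) + (aa + bz))

def jaccard(aa, bb, bz):
    """Eq. (17): Jaccard coefficient from TP (aa), FP (bb), FN (bz)."""
    return aa / (aa + bb + bz)

def segmentation_objective(aa, bb, bz):
    """Eq. (15): value minimised over the tuned parameters, 1 / (dc + jc)."""
    total = dice(aa, bb, bz) + jaccard(aa, bb, bz)
    # Assumption: treat a mask with zero overlap as the worst possible value.
    return float("inf") if total == 0 else 1.0 / total

# Hypothetical pixel counts for one predicted mask.
print(dice(80, 10, 10), jaccard(80, 10, 10), segmentation_objective(80, 10, 10))
```

Because both coefficients grow with the overlap between the predicted and the true mask, minimizing the reciprocal of their sum rewards better segmentation.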

Identifying the plant disease using dilated, adaptive, and attention-based multiscale densenet with AVLO

Multiscale densenet

Here, the segmented images \(ss^{mrc}\) are given to the final classification phase to obtain the classified outcomes.

Multi-scale44: Here, down-sampling layers and dense blocks are arranged along the down-sampling path, which forms the multi-scale DenseNet model. While reducing the computational expense, the down-sampled feature maps enable the dense blocks to model longer contexts and dependencies over a wider frequency range. To recover the original resolution from a lower-resolution feature map, the up-sampling layer is implemented as a transposed convolution whose filter size equals the pooling size. In addition, inter-block skip connections directly connect the two dense blocks at the same scale, which permits the forward and backward flow of the signal without passing through the lower-resolution blocks.

DenseNet: The efficient feature reuse of DenseNet makes it an effective structure for the classification of plant diseases. It mitigates the vanishing gradient issue and promotes the reuse of features through layer-to-layer connections, so the subtle features in the images are preserved more effectively. Unlike ResNet, EfficientNet, or transformer-based models such as the Swin Transformer, the DenseNet model does not require a heavy parameter count, and it provides effective results even for highly variable agricultural images. In a feed-forward network, the output of layer \(m\) is computed as \(k_{m} = I_{m} \left( {k_{m - 1} } \right)\), where \(k_{0}\) is the input of the network and \(I_{m} \left( . \right)\) is the non-linear transformation of layer \(m\). To tackle the limitations of deep models, ResNet adds a skip connection, as expressed in Eq. (18).

$$k_{m} = I_{m} \left( {k_{m - 1} } \right) + k_{m - 1}$$
(18)

The skip connection permits the network to pass the gradient directly to the preceding layers. DenseNet further enhances the flow of information among the layers by feeding each layer the concatenated outputs of all preceding layers, as given in Eq. (19).

$$k_{m} = I_{m} \left( {\left[ {k_{m - 1} ,k_{m - 2} , \ldots ,k_{0} } \right]} \right)$$
(19)

Here, \(\left[ \cdots \right]\) denotes concatenation. The dense connectivity ensures the reuse of the features computed by preceding layers, and it avoids relearning similar features in different layers, which makes the network more efficient. Here, \(I_{m}\) is composed of BN followed by ReLU and a convolution with \(l\) feature maps. For image recognition tasks, a pooling layer that collects the local activations and reduces the feature maps to lower dimensions is significant for acquiring global information effectively.
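The contrast between the residual update of Eq. (18) and the dense update of Eq. (19) can be sketched on plain Python lists. The layer functions below are toy stand-ins for \(I_{m}\) (they are not BN, ReLU, and convolution), and all names are hypothetical.

```python
def residual_forward(x, layers):
    """Eq. (18): each layer output is added to its own input."""
    for f in layers:
        fx = f(x)
        x = [a + b for a, b in zip(fx, x)]
    return x

def dense_forward(x0, layers):
    """Eq. (19): each layer sees the concatenation [k_{m-1}, ..., k_0]."""
    feats = [x0]
    for f in layers:
        concat = [v for feat in reversed(feats) for v in feat]
        feats.append(f(concat))
    return feats

def dense_input_channels(c0, growth, m):
    """Channels entering layer m when every layer emits `growth` maps."""
    return c0 + growth * (m - 1)

# Toy stand-ins for I_m: a scaled ReLU and a mean replicated twice.
half_relu = lambda x: [max(0.0, 0.5 * v) for v in x]
mean_two = lambda x: [max(0.0, sum(x) / len(x))] * 2

print(residual_forward([1.0, -2.0], [half_relu]))
print(dense_forward([1.0, 3.0], [mean_two, mean_two]))
print(dense_input_channels(16, 12, 4))
```

The last helper shows why the input width of a dense block grows linearly with depth: every earlier layer contributes its \(l\) feature maps to the concatenation.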

Dilation: Dilation is used in the DenseNet model to retrieve multi-level information from the images. The dilated dense block maintains the same spatial resolution of the images. A small \(n \times n\) kernel filter is enlarged to an effective size of \(n + \left( {n - 1} \right)\left( {dl - 1} \right)\), where \(dl\) is the dilation ratio, and the receptive field grows accordingly. Thus, the dilated layer retrieves multi-level information from the images.
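The enlarged kernel size \(n + (n - 1)(dl - 1)\) and the preserved resolution can be verified with a one-dimensional sketch. It assumes zero padding and stride 1; the function names and the example signal are hypothetical.

```python
def effective_kernel_size(n, dl):
    """Span covered by an n-tap kernel with dilation ratio dl."""
    return n + (n - 1) * (dl - 1)

def dilated_conv1d(signal, kernel, dl):
    """Zero-padded, stride-1 dilated convolution (cross-correlation form).

    The output has the same length as the input, mirroring the statement
    that the dilated block keeps the spatial resolution.
    """
    n = len(kernel)
    span = effective_kernel_size(n, dl)
    pad_left = (span - 1) // 2
    pad_right = span - 1 - pad_left
    padded = [0.0] * pad_left + list(signal) + [0.0] * pad_right
    return [
        sum(kernel[j] * padded[i + j * dl] for j in range(n))
        for i in range(len(signal))
    ]

print([effective_kernel_size(3, dl) for dl in (1, 2, 4, 8)])  # [3, 5, 9, 17]
print(dilated_conv1d([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 0.0, -1.0], 2))
```

The three kernel weights are unchanged for every dilation ratio; only the spacing between the taps grows.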

Attention: The attention mechanism can be regarded as a form of resource allocation. Its intent is to learn the feature weights according to the loss so that the significant feature maps receive greater weight. It compresses the features along the spatial dimensions, so that every feature channel is summarized by a value that has, to some extent, a global receptive field.
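One common way to realize this kind of channel weighting is a squeeze-and-rescale step, sketched below. This is an assumption about the form of the attention, not the confirmed design of the proposed model: the learned layers that would normally map the squeezed values to weights are replaced here by a plain sigmoid, and the function name is hypothetical.

```python
import math

def channel_attention(feature_maps):
    """Weight each channel by a gate derived from its global average.

    feature_maps: list of channels, each a 2-D nested list.
    """
    # Squeeze: one value per channel, computed over the whole spatial extent.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]
    # Gate: a sigmoid stands in for the learned weighting (assumption).
    gates = [1.0 / (1.0 + math.exp(-s)) for s in squeezed]
    # Rescale: channels with larger gates contribute more downstream.
    rescaled = [
        [[v * g for v in row] for row in ch]
        for ch, g in zip(feature_maps, gates)
    ]
    return rescaled, gates

maps = [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
rescaled, gates = channel_attention(maps)
print(gates)
```

In a trained network the gates would be learned from the loss, so informative channels would receive weights close to 1 and uninformative channels weights close to 0.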

Dilation and attention-based multi-scale DenseNet: To obtain feature maps with representative information from the image, dilated convolution is employed, and the convolution layers in the DenseNet model are replaced with dilated layers. The attention mechanism then adaptively selects the relevant features from the different feature maps.

Thus, this process is depicted in Fig. 6.

Fig. 6

Depiction of the DAA-MDeNet model.

Recommended DAA-MDeNet for classification

Robust feature extraction from the segmented images is necessary to classify plant diseases accurately. Conventional models do not capture multiscale features effectively, which may limit their capability to detect plant disease accurately. In this work, the DAA-MDeNet is adopted, which leverages the attention mechanism and multiscale dense connectivity to improve the feature representation of the images. The DAA-MDeNet is designed to cover the challenging visual patterns and the variability among the diseased samples, and it performs the classification process in this work. A new formulation is derived to tackle the restrictions of the DenseNet model: DenseNet offers good computational efficiency, but it replicates data multiple times. Therefore, certain parameters in DenseNet are optimized with the AVLO algorithm to enrich the performance. The objective is given in Eq. (20).

$$of_{1} = \mathop {\arg \min }\limits_{{\left\{ {af_{dns} ,hn_{dns} ,ep_{dns} } \right\}}} \left( {\frac{1}{acc} + fnr + fdr + fpr} \right)$$
(20)

Here, the terms \(af_{dns} ,hn_{dns} ,ep_{dns}\) denote the activation function in [0, 4], the hidden neuron count in [5, 255], and the number of epochs in [5, 50] in DenseNet, which are optimized using AVLO. Further, the terms \(acc,fnr,fdr\) and \(fpr\) denote the accuracy, false negative rate, false discovery rate, and false positive rate, which are defined in Eqs. (21) to (24).

$$acc = \frac{{\left( {aa + az} \right)}}{{\left( {aa + az + bb + bz} \right)}}$$
(21)
$$fnr = \frac{bz}{{bz + aa}}$$
(22)
$$fdr = \frac{bb}{{bb + aa}}$$
(23)
$$fpr = \frac{bb}{{bb + az}}$$
(24)
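The classification objective of Eq. (20) can be sketched as a function of the four measures, together with the arg min over candidate parameter settings. The candidate names and metric values below are made up for illustration, and the guard for zero accuracy is an added assumption.

```python
def classification_objective(acc, fnr, fdr, fpr):
    """Eq. (20): 1/acc + fnr + fdr + fpr, to be minimised."""
    if acc <= 0:
        return float("inf")  # assumption: zero accuracy is the worst case
    return 1.0 / acc + fnr + fdr + fpr

def pick_best(candidates):
    """Return the name of the candidate with the smallest objective."""
    return min(
        candidates,
        key=lambda name: classification_objective(**candidates[name]),
    )

# Hypothetical validation metrics for two parameter settings.
candidates = {
    "setting_a": {"acc": 0.90, "fnr": 0.10, "fdr": 0.08, "fpr": 0.05},
    "setting_b": {"acc": 0.94, "fnr": 0.06, "fdr": 0.05, "fpr": 0.03},
}
print(pick_best(candidates))
```

The objective decreases when the accuracy rises or when any of the three error rates falls, so a single scalar steers the optimizer toward settings that are good on all four measures.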

Finally, the classified outcomes are obtained, and the process is given in Fig. 7.

Fig. 7

Depiction of the classification process made using the DAA-MDeNet model.

Imaging results

The resultant images obtained from the plant disease detection process are given in Fig. 8.

Fig. 8

Resultant images for plant disease detection framework.

Results and discussion

Simulation setup

The model was implemented in Python, and extensive results were obtained. Dingo Optimizer (DO)-DAA-MDeNet45, Eurasian Oystercatcher Optimizer (EOO)-DAA-MDeNet46, Residual Network (ResNet)47, VGG1648, CNN49, and Modified CNN (MCNN)50 models are used for comparison. The population size is 10, the chromosome length is 6, and the maximum number of iterations is 50. The code for the implementation of the developed model is available at: https://github.com/kalicharan8u/Plant-Disease-Detection-using-Mask-RCNN-with-Multiscale-DenseNet-
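The optimizer settings above can be read as a search-space configuration. The sketch below only initializes and decodes a population; it does not reproduce the AVLO update rule. It assumes that the six genes hold the three MRCNN parameters followed by the three DenseNet parameters with the bounds stated for Eqs. (15) and (20); that gene layout, the rounding to integers, and all names are assumptions.

```python
import random

POP_SIZE = 10   # population size
MAX_ITER = 50   # maximum number of iterations (unused in this sketch)

# Assumed gene layout: (name, lower bound, upper bound).
GENES = [
    ("af_mrc", 0, 4), ("hn_mrc", 5, 255), ("ep_mrc", 5, 55),
    ("af_dns", 0, 4), ("hn_dns", 5, 255), ("ep_dns", 5, 50),
]

def init_population(rng):
    """Draw POP_SIZE chromosomes uniformly inside the bounds."""
    return [
        [rng.uniform(lo, hi) for _, lo, hi in GENES] for _ in range(POP_SIZE)
    ]

def decode(chromosome):
    """Round each gene to an integer and clamp it to its bounds."""
    return {
        name: int(min(max(round(gene), lo), hi))
        for (name, lo, hi), gene in zip(GENES, chromosome)
    }

rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = init_population(rng)
print(len(population), len(population[0]))
print(decode(population[0]))
```

Each decoded dictionary would then be scored with the objective functions of Eqs. (15) and (20), and the optimizer would update the population for up to `MAX_ITER` iterations.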

Performance measures

The proposed model is evaluated using the following measures:

  1. (a)

    Specificity in Eq. (25)

    $$sp = \frac{az}{{az + bb}}$$
    (25)
  2. (b)

    Sensitivity in Eq. (26)

    $$sy = \frac{aa}{{aa + bz}}$$
    (26)
  3. (c)

    Precision in Eq. (27)

    $$pr = \frac{aa}{{aa + bb}}$$
    (27)
  4. (d)

    NPV in Eq. (28)

    $$npv = \frac{az}{{az + bz}}$$
    (28)
  5. (e)

    MCC in Eq. (29)

    $$MCC = \frac{aa \times az - bb \times bz}{{\sqrt {\left( {aa + bb} \right)\left( {aa + bz} \right)\left( {az + bb} \right)\left( {az + bz} \right)} }}$$
    (29)
  6. (f)

    F1-score in Eq. (30)

    $$f1 - S = 2\cdot\frac{sy \cdot pr}{{pr + sy}}$$
    (30)
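The measures listed above can be computed together from the four confusion counts, as in the following sketch. NPV is computed with its standard definition, TN / (TN + FN); the function name and the example counts are hypothetical, and returning 0 for an undefined MCC is an added assumption.

```python
import math

def performance_measures(aa, az, bb, bz):
    """Measures from TP (aa), TN (az), FP (bb), FN (bz)."""
    sp = az / (az + bb)    # specificity
    sy = aa / (aa + bz)    # sensitivity
    pr = aa / (aa + bb)    # precision
    npv = az / (az + bz)   # negative predictive value (standard form)
    denom = math.sqrt((aa + bb) * (aa + bz) * (az + bb) * (az + bz))
    # Assumption: report 0 when the MCC denominator vanishes.
    mcc = (aa * az - bb * bz) / denom if denom else 0.0
    f1 = 2 * sy * pr / (pr + sy)
    return {"sp": sp, "sy": sy, "pr": pr, "npv": npv, "mcc": mcc, "f1": f1}

# Hypothetical counts for one class.
m = performance_measures(aa=80, az=90, bb=10, bz=20)
for name, value in m.items():
    print(name, round(value, 4))
```

The F1-score obtained this way equals 2·aa / (2·aa + bb + bz), which is a convenient consistency check on the sensitivity and precision values.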

Determination of classification process using algorithms and classifiers

The classification performance of the plant disease detection model is compared across algorithms and classifiers by varying the batch size over [4 to 64] and the learning percentage over [35 to 88]; the results are given in Figs. 9 and 10, respectively. In the classifier comparison at batch size 4, the accuracy of the AVLO-DAA-MDeNet model is 16%, 10%, 14%, and 16% higher than that of the ResNet, VGG16, DenseNet, and MDeNet models, respectively. The algorithm comparison is similar: the AVLO-DAA-MDeNet model shows better values for the positive measures than DO-DAA-MDeNet, EOO-DAA-MDeNet, AVOA-DAA-MDeNet, and LO-DAA-MDeNet, which improves the classification process.

Fig. 9

Analysis of the classification process of the plant disease framework in terms of batch size, compared across algorithms and classifiers, regarding (a) Accuracy, (b) F1-Score, (c) FDR, (d) FNR, (e) FPR, and (f) MCC.

Fig. 10

Analysis of the classification process of the plant disease framework in terms of learning percentage, compared across algorithms and classifiers, regarding (a) Accuracy, (b) F1-Score, (c) FDR, (d) FNR, (e) FPR, and (f) MCC.

Statistical determination over the segmentation process using algorithms and classifiers

The segmentation performance of the plant disease detection model is compared across algorithms and classifiers using statistical measures, as given in Fig. 11. The graphical depiction shows higher accuracy, dice coefficient, and Jaccard coefficient values for the newly designed AVLO-DAA-MRCNN model, and these better-segmented outcomes aid the final classification process.

Fig. 11

Statistical analysis of the segmentation process of the proposed plant disease classification framework, compared across algorithms and classifiers, regarding (a) Accuracy, (b) dice coefficient, and (c) Jaccard coefficient.

Analysis on the confusion matrix for the plant disease classification model

Fig. 12 shows the confusion matrices for all ten datasets, and Fig. 13 shows the ROC and convergence analysis on dataset 1 for the proposed disease detection model. This validation further supports the performance and the efficiency of the model.

Fig. 12

Confusion analysis over the proposed plant disease classification framework regarding the dataset for (a) Apple, (b) Cherry, (c) Citrus, (d) Corn, (e) Grape, (f) Peach, (g) Pepper, (h) Potato, (i) Strawberry and (j) Tomato.

Fig. 13

Analysis of the proposed plant disease classification framework for dataset 1 regarding (a) ROC and (b) Convergence.

Overall analysis for plant disease classification model

Table 2 shows the validation of the classification process of the plant disease detection framework in terms of both algorithms and classifiers for dataset 1. In the algorithm comparison, the accuracy of the proposed AVLO-DAA-MDeNet model is 6%, 3%, 2%, and 1% higher than that of DO-DAA-MDeNet, EOO-DAA-MDeNet, AVOA-DAA-MDeNet, and LO-DAA-MDeNet, respectively, which improves the classification process.

Table 2 Proposed plant disease classification models overall analysis for dataset 1 by means of algorithms and classifiers.

Overall analysis for plant disease classification model using algorithms

Table 3 shows the statistical analysis of the classification process of the plant disease detection framework using algorithms for dataset 1. Compared with the other algorithm-based models, namely DO-DAA-MDeNet, EOO-DAA-MDeNet, AVOA-DAA-MDeNet, and LO-DAA-MDeNet, the proposed AVLO-DAA-MDeNet model offers better outcomes.

Table 3 Proposed plant disease classification models: statistical determination using algorithms.

Accuracy report for plant disease classification model using classifiers in terms of datasets

The accuracy values for all ten datasets, compared with various conventional classifiers, are shown in Table 4. For the Apple dataset, the accuracy of the proposed AVLO-DAA-MDeNet model is 6%, 1%, 3%, and 4% higher than that of the ResNet, VGG16, DenseNet, and MDeNet models, respectively. The same holds for all other datasets, where the proposed model offers better values.

Table 4 Proposed plant disease classification models accuracy report for the dataset by means of classifiers.

Ablation study of the suggested model

To examine the contribution of each component, and thereby support the interpretability of the presented plant disease detection and segmentation model, an ablation study is reported in Table 5. As noted from Table 5, incorporating dilated convolutions, adaptive attention, and the optimizer raises the accuracy to 94.58%, whereas the accuracy drops to 88.04% when the DenseNet model is used alone for the plant disease classification process. Likewise, adding dilation alone improves the accuracy of the DenseNet model to 89.91%, which shows that the individual components contribute to the classification of plant diseases.

Table 5 Ablation study of the presented plant disease classification model.

Comparison of the suggested model with the recent architectures

The suggested AVLO-DAA-MDeNet-based plant disease classification model is compared with the baseline approaches in Table 6. As seen from Table 6, the presented model provides an accuracy of 88.4% in the classification of plant disease, which is higher than that of the Vision Transformer (85.424%), ConvNeXt (80.832%), and Swin Transformer (83.88%). Thus, the comparative results in Table 6 support the effectiveness of the proposed model in plant disease classification.

Table 6 Comparison of the suggested plant disease classification model with the baseline approaches.

Generalizability and robustness of the suggested model under varying lighting, occlusions, or camera angles

The generalizability and robustness of the presented model under varying lighting, occlusion, and camera angles are given in Table 7. In Fig. 14a, the generalizability of the presented AVLO-DAA-MDeNet model is measured in terms of accuracy, which indicates higher generalization under varying occlusion, lighting, and camera angles. In addition, the robustness of the presented AVLO-DAA-MDeNet is measured in terms of the noise coefficient. Here, the correlation of the presented model decreases as the noise coefficient increases, which confirms the robustness of the presented AVLO-DAA-MDeNet against varying lighting, occlusion, and camera angles.

Table 7 Comparison of the suggested plant disease classification model with other multiscale strategies.
Fig. 14

Analysis of the proposed plant disease classification framework regarding (a) Generalizability and (b) Robustness.

Comparison of the presented model with other multi-scale strategies

The suggested AVLO-DAA-MDeNet-based plant disease detection is compared with other multiscale approaches, such as ASPP and FPN, to confirm the lesion-capturing capability of the presented model. The proposed AVLO-DAA-MDeNet model reaches an accuracy of 92.4%, which is higher than that of ASPP (91.184%) and FPN (85.256%) across varying scales. The accuracy improvement of up to approximately 7.1 percentage points (over FPN) confirms the capability of the presented AVLO-DAA-MDeNet in capturing lesions across varying scales.

Attention maps of the proposed models

In both the DAA-MRCNN and DAA-MDeNet models, attention maps are used to emphasize the diagnostically relevant regions. The attention maps given in Fig. 15 illustrate that the model focuses on diseased areas rather than healthy or background regions. This focus on the diseased region improves the accuracy of the plant disease detection operation. The visual explanations provided by the attention maps could increase trust among agronomists and farmers and support real-world adoption in precision agriculture.

Fig. 15

Attention maps of the proposed plant disease classification framework.

Comparison of the AVLO with other evolutionary algorithms

Table 8 compares the convergence speed and the solution quality of the AVLO with those of other algorithms such as Adam, RMSprop, PSO, and GA. As noted from Table 8, the proposed AVLO provides a higher convergence speed and solution quality than the other approaches.

Table 8 Comparison of the AVLO with other evolutionary algorithms.

Accuracy margin of the proposed model compared to the direct classification on raw images

Table 9 compares the accuracy of the two-stage pipeline (proposed model) with that of direct classification on the raw images. As given in Table 9, the two-stage pipeline provides an accuracy of 95.6%, whereas direct classification on the raw images provides an accuracy of 93.4%, a margin of 2.2 percentage points in favor of the two-stage pipeline.

Table 9 Accuracy of the two-stage pipeline and direct classification on raw images.

Conclusion

In this research work, an improved plant disease classification model was developed using a hybrid deep learning architecture. In the first step, the images were collected from the standard publicly available database. The images were then fed to the segmentation process, performed by the newly developed DAA-MRCNN, to obtain segmented images that make the subsequent plant disease detection more accurate and less time-consuming. The segmented images were forwarded to the final classification phase, in which the newly designed DAA-MDeNet classifies the diseased plant. Further, the new AVLO algorithm was implemented to optimize the parameters in both the DAA-MRCNN and DAA-MDeNet models and thereby enhance the performance. The detection process was evaluated by comparing it with other models using the performance measures. In terms of precision, the proposed AVLO-DAA-MDeNet model achieved values 6%, 4%, 3%, and 1% higher than the DO-DAA-MDeNet, EOO-DAA-MDeNet, AVOA-DAA-MDeNet, and LO-DAA-MDeNet models, respectively. Timely and reliable assessment of plant disease to improve protection activities is the future scope.

Limitations and future scope: Class imbalance occurs across species and disease categories; future work will address it through data augmentation, loss weighting such as focal loss, or sampling techniques such as SMOTE to improve the performance of the developed model on the underrepresented classes. Despite the high accuracy, the longer inference time and the higher computational complexity may limit the real-time performance of the developed model in low-power edge environments. The suggested model is effective, but it is not feasible in a resource-constrained environment because it demands more memory and processing power. This trade-off between deployability and accuracy motivates future work. In future, pruning, quantization, and knowledge distillation approaches will be explored to enable the practical use of the suggested model in real-world agricultural settings. Furthermore, the real-world feasibility of the suggested model will be assessed by evaluating its performance on edge platforms such as NVIDIA Jetson Nano and Raspberry Pi.