Plant disease detection using a hybrid dilated CNN with attention mechanisms and optimized mask RCNN segmentation

Sahu, Kalicharan; Tiwari, Shivam; Singh, Manoj Kumar; Pahareeya, Jankisharan; Shakya, Harish Kumar; Kumar, Gyanendra; Selvarajan, Shitharth

doi:10.1038/s41598-025-26192-w


Article
Open access
Published: 23 November 2025


Kalicharan Sahu¹,
Shivam Tiwari²,
Manoj Kumar Singh³,
Jankisharan Pahareeya⁴,
Harish Kumar Shakya⁵,
Gyanendra Kumar⁶ &
…
Shitharth Selvarajan⁷

Scientific Reports volume 15, Article number: 42008 (2025) Cite this article



Abstract

Agriculture plays a central role in human life, and most people are involved in agricultural activity either directly or indirectly. The agricultural sector supplies quality food and has contributed greatly to population and economic growth. Crop diseases, however, impair plant growth, so early diagnosis with an adequate, automatic detection approach is needed to improve the quality of food production and to reduce economic losses. Conventional systems lack techniques for identifying diseases across diverse crops in agricultural environments. In recent years, deep learning approaches have advanced considerably in image classification and object detection. For precise detection of plant disease, an improved classification model is developed. First, plant images are gathered from a standard publicly available database. The gathered images are segmented using a Dilated, Adaptive, and Attention-based Mask Region-based Convolutional Neural Network (DAA-MRCNN). The segmented images are then passed to a hybrid classification phase, where a new model, the Dilated, Adaptive, and Attention-based Multiscale DenseNet (DAA-MDeNet), performs the classification. Classifier performance is improved by optimizing the parameters of Mask RCNN and Multiscale DenseNet with a hybrid optimization algorithm named African Vulture and Lemur Optimizer (AVLO). Compared with other models, the proposed model shows superior performance.


Introduction

In precision agriculture, identifying diseases on plant leaves by means of image analysis is regarded as a critical research area¹. Some conventional techniques for determining the severity of plant disease depend on the examination of plant tissues². Expert systems for management and cultivation have been widely used because of the widespread adoption of digital techniques, which has increased production capacity³. However, retrieving and describing diseases and pests in such expert systems relies mainly on the experts' own expertise, which results in lower efficiency and higher costs⁴. Advances in artificial intelligence and image processing have created opportunities to extend research in the agricultural sector⁵. Deep learning techniques are a kind of machine learning approach; they are an active research area and have been successfully deployed in diverse fields⁶. Sectors such as communications and networking, the automotive industry, business, and agriculture have used both image classification and object detection techniques⁷. In the agricultural sector, conventional techniques for identifying plant disease require experts to perform visual inspection, and more in-depth detection in laboratories is a time-consuming process⁸.

Plant disease is most commonly diagnosed from symptoms that appear on different parts of the plant, such as the pulp, stem, and leaf⁹,¹⁰. Expert knowledge is considered necessary for accurately diagnosing disease in the plant leaf¹¹. Some systems extend correct disease detection by recommending possible remedies, which also serve as advertisements for agricultural product vendors¹². Because visual symptoms can vary between plant parts, the farmer takes a picture of the affected part, and mobile disease identification approaches can then detect and label the disease¹³. This helps prevent crop losses by reducing the number of steps in the usual detection approaches¹⁴. Reaching rural areas, where crops are produced by smallholder farmers, is crucial; with improved technologies, diagnosis at an expert level becomes possible there¹⁵.

Diverse AI techniques have been implemented to identify and categorize plant diseases. Approaches such as CNNs are commonly used¹⁶, together with various pre-processing approaches that improve feature extraction. A Deep Learning (DL) model extracts features from the images and then uses them for regression or classification, depending on the requirements. Some approaches classify data by similarity, categorizing unlabelled objects with the help of neighbouring labelled objects. However, some conventional techniques suffer from restrictions such as data overlapping and over-fitting¹⁷. DL techniques can retrieve features in a specified manner and then fuse them to offer better classification outcomes¹⁸. Earlier phases of detection research also tended to remain focused on categorization tasks. Computers are now used to detect disease by means of DL approaches, and fine-tuned deep learning models are implemented for the identification process. To further improve the identification of diseased plants, a new model is implemented in this work.

The main contributions of the proposed model are as follows.

To build a new, precise network model for detecting plant disease, which supports early identification and prevention and thereby protects the plant.
To perform segmentation with the novel DAA-MRCNN architecture, where AVLO tunes the parameters with an objective function that maximizes the Dice and Jaccard coefficients.
To perform classification with DAA-MDeNet to obtain the final classified outcomes, where AVLO tunes the parameters with an objective function that maximizes accuracy.
To develop a new optimization algorithm, AVLO, which is used in both the segmentation and classification phases and is designed with a new formulation to overcome the limitations of conventional models.
To experimentally validate the overall performance of the proposed model using several measures and to show its improvement over other models.
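The two overlap measures named in the segmentation objective can be sketched as follows. This is a minimal illustration, assuming binary masks flattened to lists of 0/1 values; the function names and the choice to sum the two coefficients into one score are assumptions made here, since the text only states that both are maximized.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two flat binary masks."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def jaccard_coefficient(pred, truth):
    """Jaccard = |A ∩ B| / |A ∪ B| for two flat binary masks."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return 1.0 if union == 0 else intersection / union


def segmentation_objective(pred, truth):
    """Hypothetical combined score to maximize (assumed: simple sum)."""
    return dice_coefficient(pred, truth) + jaccard_coefficient(pred, truth)


print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

An optimizer such as AVLO would then search for the parameter setting whose predicted masks give the highest score.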

The remaining sections are organized as follows. Section II presents the literature survey, Section III illustrates the detection process with the adaptive segmentation and classification model, Section IV describes DAA-MRCNN for segmentation, Section V covers plant disease identification using multi-scale DenseNet with AVLO, and Sections VI and VII give the results and the conclusion.

Literature survey

Related works

In 2023, Moupojou et al.¹⁹ suggested new DL-based techniques that help farmers identify crop diseases in order to avoid yield delay. Existing techniques were trained on publicly available datasets composed of images captured under laboratory conditions, which imposes several limitations. The authors therefore recommended FieldPlant, a dataset gathered directly from plantations. Every image was manually annotated at the level of individual leaves to assure quality. Finally, object detection and classification models were evaluated.

In 2022, Saleem et al.²⁰ presented a model for recognizing plant disease using a newly gathered dataset. After identifying the most suitable deep learning techniques, the authors validated data augmentation approaches. The impact of resizers, interpolators, and batch normalization was also examined. Finally, the overall performance was improved through empirical observation, and the robustness of the model was assessed through k-fold cross-validation.

In 2022, Patil et al.²¹ used a deep model for detecting plant disease, motivated by its strong results. A standard approach based on multi-scale feature selection was used to remove irrelevant background from the input images. The model detected disease in the cardamom plant using EfficientNetV2. A comprehensive set of experiments was carried out to ascertain the ability of the model and to compare it with other techniques such as CNN and the EfficientNet model.

In 2023, Hosny et al.²² implemented a new lightweight deep model for obtaining better representations of high-level features. The deep features were fused with conventional features to capture local texture information of the plant. The model was then trained and tested on the datasets.

In 2022, Amin et al.²³ proposed end-to-end DL techniques for distinguishing unhealthy from healthy leaves. The system used two pre-trained CNNs and some other standard techniques to retrieve deep features from the plant images. Data augmentation added variation to the training images so that the model learned more complex cases. The model required fewer parameters than conventional techniques and less processing power.

In 2023, Vishnoi et al.²⁴ used a CNN model with fewer layers, which reduced the computational burden. Augmentation processes such as flipping, zoom, scaling, shear, and shift were used to produce samples, enlarging the training set without acquiring more images. Compared with the conventional techniques used for detection, this model offered accurate performance with lower computational and storage requirements. Rigorous validation showed that the model was a good fit.

In 2021, Zhao et al.²⁵ provided a detection model that used Double GAN to generate high-resolution images of diseased leaves in two phases. In the first phase, healthy leaf images were used as input to obtain a pre-trained model, and the unhealthy leaves were then used to continue training it. Further standard techniques were used to generate the corresponding images and expand the unbalanced dataset. Recognition on the expanded dataset gave better outcomes than on the original dataset.

In 2021, Ahmad et al.²⁶ designed techniques for systematically categorizing plant disease symptoms using a CNN model. Combined with the proposed techniques, it supported industrial applications by reducing training time. Transfer learning was used to train even on small datasets by transferring weights pre-trained on larger datasets. Negative transfer, however, is a common issue in transfer learning, so step-wise transfer learning was recommended, which aided fast convergence.

In 2025, Hassan et al.²⁷ developed a transfer learning-based deep learning model for the classification of breast cancer in women. Multiple deep learning models are combined in this model to obtain robust classification results. In 2024, Hassan et al.²⁸ presented a deep learning model for early detection of black fungus in medical images. This model detects black fungus more effectively at an early stage, which helps prevent mortality.

In 2025, Hassan et al.²⁹ suggested a Real-Time Adaptation Framework for dysarthria detection in resource-constrained scenarios. An enhanced WaveNet captures the long-term dependencies in the audio signal across diverse datasets, which improves reliability in clinical applications. In 2024, Chouhan et al.³⁰ proposed an artificial intelligence-based approach for improving the agricultural sector.

In 2019, Chouhan et al.³¹ developed a Fuzzy Competitive Learning based Counter Propagation Network (FCPN) for segmenting natural scene images. This model has a high parallel learning capability and handles uncertainty in the segmentation process. In 2024, Sharma et al.³² proposed Generative Adversarial Networks (GANs) and Vision Transformers (ViTs) as a solution for farming scenarios, using a robust, high-quality dataset to construct a robust model for the agricultural task.

In 2025, Chouhan et al.³³ developed an artificial intelligence model for soil health and crop health monitoring, which offers timely and accurate plant disease diagnosis. In 2020, Mahmood et al.³⁴ presented a deep learning model for breast cancer detection on multiple modalities. In 2025, Rehman et al.³⁵ proposed the Swin-ViT model for robust kidney carcinoma prognosis. In 2023, Ali et al.³⁶ presented TESR (Two-stage approach for Enhancement and Super-Resolution) for artificially improving image resolution. In 2024, Mahmood et al.³⁷ developed a deep learning algorithm for improving the detection and classification of breast cancer. In 2024, Mahmood et al.³⁸ presented a Depth Double Deep Learning Method of Linear Attention Network (D3LM-LAN) for detecting cognitive impairment at an early stage. In ref.³⁹, a Multi-Modal Feature Fusion Network for Histopathology (MFF-HistoNet) is proposed for improving the accuracy of breast cancer detection by addressing multigrading challenges. In ref.⁴⁰, squeeze-and-excitation and dilated dense convolution are proposed for analyzing intricate brain tissues more accurately.

Problem specifications

Classical models of leaf disease detection offer both advances and limitations, which are tabulated in Table 1. The deep learning model¹⁹ detects disease on individual leaves and assures the quality of the entire detection process, but it is limited in modelling a global ensemble together with segmentation. The deep learning method²⁰ has been embedded into a robotic system for disease control and has been used to develop a cost-effective protection system, but more in-depth validation is needed to strengthen the process. The CNN²¹ can remove complex backgrounds from images, but the model needs to be extended to detect nutritional deficiency. The CNN²² model offers accurate detection with fewer parameters, but its application to practical crop disease detection is restricted. The CNN²³ needs few parameters to retrieve features and integrates feature sets, which makes the model more robust; however, disease detection by means of digital imaging still needs to be developed. The CNN²⁴ detects crop disease from leaf images and is consistent and reliable, but it requires better image variability. Double GAN²⁵ has been used effectively for image generation and detects unhealthy leaves easily, but it is limited for high-resolution images with a small number of samples. The ensemble method²⁶ plays a significant role in enhancing the overall performance of the detection system and offers better feasibility, but it needs improvement in practical applications.

Table 1 Certain advancements and their limitations in the plant disease detection model.


Illustration of plant disease detection: adaptive segmentation and classification model

Proposed system of plant disease detection

There is an age-old interaction between crops and disease, which has led to a never-ending competition and a continuing need for identification in order to control pests. In crop protection, it is difficult to assess plant resistance and to apply pesticides cost-effectively, so determining symptoms and disease severity is essential. Plant disease can affect the growth of the species, so early detection is needed. Various mobile-based techniques have been employed for detecting disease, and machine learning and deep learning approaches have been considered for diagnosis. To tackle losses, diverse techniques have been designed to diagnose diseases. Precision agriculture uses recent technologies to optimize decision-making, and some standard techniques offer optimal decisions that reduce costs. However, this area still needs improvement, specifically in decision-support systems that give more useful recommendations and more accurate predictions. Deep learning approaches can resolve complex issues in a reasonably short time. Because conventional approaches to detecting plant disease still have limitations, a new technique is implemented; its architecture is shown in Fig. 1.

This work designs an efficient plant disease classification model. In the first step, images are acquired from a standard publicly available database. The images are then passed to the segmentation phase, which makes the detection process more accurate and less time-consuming. Segmentation is performed by the newly developed DAA-MRCNN. The segmented images are then passed to the final classification phase, where the newly designed DAA-MDeNet classifies the diseased plant. In both models, the integration of attention improves precision and reduces false positives in complex field conditions. During segmentation, the attention module makes DAA-MRCNN focus mainly on pixel-level lesion areas, which helps suppress irrelevant image regions and lighting artefacts. During classification, the attention module makes DAA-MDeNet focus on lesion texture and edge patterns. The attention mechanism thus greatly improves both the diagnostic precision and the interpretability of the proposed models. The new AVLO algorithm optimizes the parameters of both DAA-MRCNN and DAA-MDeNet to improve performance. The detection performance is validated in the final phase.

Plant disease dataset

The images used for the detection process are gathered in this phase from the following dataset.

Dataset: The dataset is named PlantifyDr. It includes a total of 12,500 images from 10 plant types, each of which is treated as an individual dataset: (1) Apple, (2) Cherry, (3) Citrus, (4) Corn, (5) Grape, (6) Peach, (7) Pepper, (8) Potato, (9) Strawberry, and (10) Tomato. It covers a total of 37 plant diseases. It was collected from “https://www.kaggle.com/datasets/lavaman151/plantifydr-dataset” (access date: 2023-08-09).

The gathered images are denoted $PD_{zz}$, $zz = 1,2, \ldots ,ZZ$, where $zz$ indexes the images gathered during the process. Sample images with their disease names are given in Fig. 2.

Novel heuristic algorithm: AVLO

For the hybrid heuristic-based plant disease detection process, a new position-based optimizer, AVLO, is implemented to tackle some of the limitations of the conventional AVOA and LO models. Because the parameter spaces are complex, tuning the parameters of a deep learning model to reach peak performance is difficult. AVLO simulates the natural activities of the lemur to search the hyperparameter space effectively. The search improves the convergence and accuracy of the models used for both segmentation and classification, and AVLO-based parameter tuning improves the reliability of detection by assisting fine-tuning in both phases. AVLO is developed by hybridizing the LO and AVOA algorithms, since they are effective in the exploitation and exploration phases, respectively. The hybridization improves convergence and maintains the diversity of the population. Even in large, non-convex parameter spaces, the combination manages the dynamic balance between exploitation and exploration effectively. By leveraging the explorative strength of AVOA and the exploitative strength of LO, AVLO attains faster convergence and better generalization than conventional optimization algorithms, and so provides high-quality solutions to complex optimization problems.

The AVOA model requires only minimal computational complexity and offers high flexibility. It can also solve continuous optimization problems, but it has time-related issues when processing all of its phases. The LO model is competitive with other models and handles both parameter and optimal-control problems, but its binary version is limited, which degrades its performance.

A new population-based position formulation is therefore implemented, as expressed in Eq. (1).

$$ps = mean\left( {ps1,ps2} \right) + \frac{bs}{{wf}}$$

(1)

Here, $ps$ denotes the new position used for the update, $mean$ indicates the mean value, and $ps1$ and $ps2$ represent the position updates produced by AVOA and LO, respectively.
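Eq. (1) can be sketched in a few lines. This is only an illustration under stated assumptions: positions are lists of floats, the mean is taken per dimension, and $bs$ and $wf$ are treated as scalars supplied by the caller, because the text does not define them. The function name `avlo_position` is hypothetical.

```python
def avlo_position(ps1, ps2, bs, wf):
    """Hybrid position of Eq. (1): mean(ps1, ps2) + bs / wf, per dimension.

    ps1, ps2: candidate positions produced by the AVOA and LO updates.
    bs, wf:   scalars appearing in Eq. (1); their meaning is not stated
              in the text, so they are plain inputs here (assumption).
    """
    if len(ps1) != len(ps2):
        raise ValueError("ps1 and ps2 must have the same dimension")
    if wf == 0:
        raise ValueError("wf must be non-zero")
    offset = bs / wf
    return [(a + b) / 2.0 + offset for a, b in zip(ps1, ps2)]


print(avlo_position([1.0, 3.0], [3.0, 5.0], bs=1.0, wf=4.0))  # [2.25, 4.25]
```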

AVOA⁴¹: Vultures are divided into two significant groups of hunting birds. In addition, vultures help guard against stinking and infected carcasses.

Phase one: The fitness of all solutions is evaluated, and the best solutions are selected as the best vultures of the two groups; each remaining solution follows one of them according to Eq. (2). The fitness and the vulture population are re-evaluated at every iteration, and the probability of choosing each of the better solutions is derived.

$$A\left( z \right) = \left\{ {\begin{array}{*{20}l} {bsvu_{1} } \hfill & {if\quad a_{b} = B_{1} } \hfill \\ {bsvu_{2} } \hfill & {if\quad a_{b} = B_{2} } \hfill \\ \end{array} } \right.$$

(2)

Here, $A\left( z \right)$ is the selected best vulture, and the parameters $B_{1}$ and $B_{2}$ govern the probability with which each of the better solutions $bsvu_{1}$ and $bsvu_{2}$ is chosen.

Phase two: This phase models the rate at which the vultures are satiated. The behaviour is described by Eq. (3), and its mathematical model is expressed in Eq. (4).

$$c = d \times \left( {\sin^{*} \left( {\frac{\pi }{2} \times \frac{{it_{f} }}{{it_{tn} }}} \right) + \cos \left( {\frac{\pi }{2} \times \frac{{it_{f} }}{{it_{tn} }}} \right) - 1} \right)$$

(3)

$$sv = \left( {2 \times rnd_{1} + 1} \right) \times e \times \left( {1 - \frac{{it_{f} }}{{it_{tn} }}} \right) + c$$

(4)

Here, $rnd_{1}$ is a random value in [0, 1], $e$ and $d$ are random numbers in [− 1, 1] and [− 2, 2], respectively, $it_{f}$ is the current iteration, $it_{tn}$ is the total number of iterations, and $sv$ is the satiation rate of the vultures, the same quantity that appears in Eqs. (6) and (8).
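The satiation rate of Eqs. (3) and (4) can be sketched as below. The exponent on the sine term is not stated in the text, so it is exposed as a hypothetical parameter `w` that defaults to 1; the function name and the injected random generator are likewise assumptions made for illustration.

```python
import math
import random


def satiation_rate(it_f, it_tn, rng, w=1.0):
    """Satiation rate from Eqs. (3)-(4).

    it_f:  current iteration, 0 <= it_f <= it_tn
    it_tn: total number of iterations
    rng:   random.Random instance supplying d, e and rnd_1
    w:     exponent on the sine term (assumed; not given in the text)
    """
    d = rng.uniform(-2.0, 2.0)      # d in [-2, 2]
    e = rng.uniform(-1.0, 1.0)      # e in [-1, 1]
    rnd_1 = rng.random()            # rnd_1 in [0, 1)
    ratio = it_f / it_tn
    angle = math.pi / 2.0 * ratio
    c = d * (math.sin(angle) ** w + math.cos(angle) - 1.0)   # Eq. (3)
    return (2.0 * rnd_1 + 1.0) * e * (1.0 - ratio) + c       # Eq. (4)


rng = random.Random(42)
early = satiation_rate(0, 100, rng)     # magnitude can reach 3 at the start
late = satiation_rate(100, 100, rng)    # collapses to ~0 at the last iteration
```

With `w = 1`, the rate is bounded by 3 in magnitude at the first iteration and shrinks to approximately zero at the last one, which is what moves the search from exploration to exploitation.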

Exploration: Vultures inspect various random areas using one of two strategies, and the parameter $C_{1}$ is used to select between them. As Eq. (5) shows, when $C_{1}$ is greater than or equal to the random number $rnd_{C1}$, Eq. (6) is used; when $C_{1}$ is smaller than $rnd_{C1}$, Eq. (8) is used.

$$C\left( {z + 1} \right) = \left\{ {\begin{array}{*{20}l} {eq.(6)} \hfill & {if\quad C_{1} \ge rnd_{C1} } \hfill \\ {eq.(8)} \hfill & {if\quad C_{1} < rnd_{C1} } \hfill \\ \end{array} } \right.$$

(5)

$$C\left( {z + 1} \right) = A\left( z \right) - D\left( z \right) \times sv$$

(6)

$$D\left( z \right) = \left| {E \times A\left( z \right) - C\left( z \right)} \right|$$

(7)

Here, the vulture searches randomly for prey in the area surrounding one of the best vultures, and $C\left( {z + 1} \right)$ is the vulture's position vector in the next iteration. The term $sv$ is the satiation rate of the vulture obtained from Eq. (4), and $D\left( z \right)$ is given by Eq. (7). Further, the term $E$ represents the coefficient vector that increases the random motion, and $C\left( z \right)$ indicates the current position vector of the vulture.

$$C\left( {z + 1} \right) = A\left( z \right) - sv + rnd_{2} \times \left( {\left( {u_{bo} - l_{bo} } \right) \times rnd_{3} + l_{bo} } \right)$$

(8)

Here, $l_{bo}$ and $u_{bo}$ are the lower and upper bounds, and $rnd_{2}$ and $rnd_{3}$ are random coefficients that increase the randomness. The best vulture $A\left( z \right)$ is selected using Eq. (2), and the vulture satiation rate is obtained using Eq. (4).

Exploitation: This phase governs the effectiveness of AVOA. It is carried out when the value $\left| {sv} \right|$ becomes less than 1, and the parameters $C_{2}$ and $C_{3}$ are used to select the strategy.

The rotating flight strategy is selected by comparing a random number with the parameter $C_{2}$, as expressed in Eq. (9).
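The exploration step can be sketched by following the case split in Eq. (5) literally. This is an illustrative reading only: the satiation rate, the coefficient $E$ (`big_e`), and scalar bounds are passed in by the caller, and drawing $rnd_{2}$ and $rnd_{3}$ once per move is an assumption.

```python
import random


def explore(current, best, sv, c1, big_e, lower, upper, rng):
    """One exploration move for a single vulture (Eqs. (5)-(8)).

    current: C(z), the vulture's current position
    best:    A(z), the selected best vulture
    sv:      satiation rate from Eq. (4)
    c1:      strategy-selection parameter C_1
    big_e:   coefficient E of Eq. (7) (how it is drawn is not stated)
    """
    if c1 >= rng.random():
        # Eq. (6) with D(z) = |E * A(z) - C(z)| from Eq. (7)
        return [a - abs(big_e * a - c) * sv for a, c in zip(best, current)]
    # Eq. (8): random jump within the bounds, relative to the best vulture
    rnd_2, rnd_3 = rng.random(), rng.random()
    return [a - sv + rnd_2 * ((upper - lower) * rnd_3 + lower) for a in best]


rng = random.Random(0)
print(explore([1.0], [2.0], sv=0.5, c1=1.0, big_e=1.0,
              lower=0.0, upper=1.0, rng=rng))  # [1.5]
```

Setting `c1 = 1.0` always selects the first strategy, which makes the example above deterministic.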

$$C\left( {z + 1} \right) = \left\{ {\begin{array}{*{20}l} {eq.(6)} \hfill & {if\quad C_{2} \ge rnd_{C2} } \hfill \\ {eq.(8)} \hfill & {if\quad C_{2} < rnd_{C2} } \hfill \\ \end{array} } \right.$$

(9)

LO⁴²: Lemurs are classified as primates. The LO algorithm is population-based, and its search process is divided into two steps: dance-hub behaviour drives exploration, and leap-up behaviour drives exploitation.

The set of lemurs, that is, the population, is represented as a matrix, given in Eq. (10).

$$F = \left[ {\begin{array}{*{20}c} {b_{1}^{1} } & {b_{1}^{2} } & \cdots & {b_{1}^{vd} } \\ {b_{2}^{1} } & {b_{2}^{2} } & \cdots & {b_{2}^{vd} } \\ \vdots & \vdots & \vdots & \vdots \\ {b_{sc}^{1} } & {b_{sc}^{2} } & \cdots & {b_{sc}^{vd} } \\ \end{array} } \right]$$

(10)

Here, $sc$ is the number of candidate solutions, $vd$ is the number of decision variables, and $F$ is the set of lemurs.

The decision variable $g$ of solution $h$ is then generated as in Eq. (11).

$$\begin{aligned} & b_{h}^{g} = rnd( \cdot ) \times \left( {U_{b} - L_{b} } \right) + L_{b} \\ & \forall h \in \left( {1,2, \ldots ,sc} \right),\quad \forall g \in \left( {1,2, \ldots ,vd} \right) \\ \end{aligned}$$

(11)

Here, $rnd( \cdot )$ is a distributed random number drawn from $\left( {1,2, \ldots ,mx\_in} \right)$, which represents an integer number, and $L_{b}$ and $U_{b}$ are the lower and upper bounds of the variables.

When the fitness value is low, the decision variables are changed. The lemurs are ranked by fitness, which improves performance over the iterations by considering the global best lemur $gbl$ and choosing the best nearest lemur $bnl$ for each lemur.

The decision variable $g$ of solution $h$ is then updated at each iteration using one of two options: (a) the value is selected with respect to $gbl$, or (b) the value is selected with respect to $bnl$, as given in Eq. (12).
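The population matrix of Eq. (10) and its entries from Eq. (11) can be sketched as a list of lists. The sketch reads Eq. (11) as the usual scaling of a random draw into the interval between the lower and upper bound; treating $rnd(\cdot)$ as a uniform draw in [0, 1) and using scalar bounds are assumptions made here, and the function name is hypothetical.

```python
import random


def init_population(sc, vd, lower, upper, rng):
    """Build the sc x vd lemur matrix F of Eq. (10).

    Each entry is rnd * (upper - lower) + lower, with rnd assumed
    uniform in [0, 1), so every variable starts inside its bounds.
    """
    if upper < lower:
        raise ValueError("upper bound must not be below lower bound")
    return [
        [rng.random() * (upper - lower) + lower for _ in range(vd)]
        for _ in range(sc)
    ]


population = init_population(sc=5, vd=3, lower=-1.0, upper=1.0,
                             rng=random.Random(7))
print(len(population), len(population[0]))  # 5 3
```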

$$Z_{h}^{g} = \left\{ {\begin{array}{*{20}l} {b\left( {h,g} \right) + abs\left( {b\left( {h,g} \right) - b\left( {bnl,g} \right)} \right) \times \left( {rnd - 0.5} \right) \times 2,} \hfill & {rnd < rr} \hfill \\ {b\left( {h,g} \right) + abs\left( {b\left( {h,g} \right) - b\left( {gbl,g} \right)} \right) \times \left( {rnd - 0.5} \right) \times 2,} \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right.$$

(12)

Here, $b\left( {h,g} \right)$ is decision variable $g$ of the current lemur $h$, $b\left( {bnl,g} \right)$ and $b\left( {gbl,g} \right)$ are the corresponding variables of the best nearest lemur and the global best lemur, $rr$ is the risk rate of all lemurs, and $rnd$ is a random number in [0, 1]. The AVLO model is summarized in Algorithm 1.

The flowchart of the AVLO model is given in Fig. 3.
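The lemur update of Eq. (12) can be sketched as follows, reading the absolute value as the distance between the current lemur and the reference lemur in each decision variable. Two points are assumptions rather than statements of the source: the second branch is taken whenever the first condition fails, and the branch choice and the step size use separate random draws.

```python
import random


def lemur_update(pop, h, bnl, gbl, rr, rng):
    """Return the updated decision variables of lemur h (Eq. (12)).

    pop: population matrix (list of lists), as in Eq. (10)
    bnl: row index of the best nearest lemur for lemur h
    gbl: row index of the global best lemur
    rr:  risk rate in [0, 1]
    """
    updated = []
    for g, value in enumerate(pop[h]):
        # Move relative to the best nearest lemur with probability rr,
        # otherwise relative to the global best lemur.
        reference = pop[bnl][g] if rng.random() < rr else pop[gbl][g]
        step = (rng.random() - 0.5) * 2.0          # in [-1, 1)
        updated.append(value + abs(value - reference) * step)
    return updated


pop = [[0.0, 0.0], [4.0, -2.0]]
new_row = lemur_update(pop, h=0, bnl=1, gbl=1, rr=0.5, rng=random.Random(3))
```

Because the step factor lies in [-1, 1), each variable moves by at most its distance to the reference lemur, so a lemur that already coincides with its reference stays where it is.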

Dilated, adaptive, and attention-based mask RCNN for segmentation to detect the plant disease

Model of mask RCNN

Mask RCNN⁴³ is an image detection and segmentation model that belongs to the Faster RCNN family. It captures diseased regions in images more effectively than ResNet, EfficientNet, or transformer-based models such as the Swin Transformer. Its boundary accuracy is very high, so it gives feasible results in segmentation tasks. Especially with small datasets, Mask RCNN copes well with the noisy and complex backgrounds of agricultural images. To obtain the feature maps at the start, the images are forwarded to the trained FPN and the ResNeXt101 model; ResNeXt101 is a kind of CNN that improves accuracy and lowers the impact of hyper-parameters without degrading the performance of the model. The fixed Regions of Interest (RoIs) are then passed from the RPN to binary classification and bounding-box regression. The Mask RCNN model has four essential phases.

(a)
Backbone: The backbone is pre-trained so that the model starts from suitable initial parameters. It extracts the relevant feature maps from the original images and adopts a fixed structural model, such as DenseNet.
(b)
Feature Pyramid Network (FPN): It runs in the opposite direction to the backbone phase, and it aims to extract the multi-scale feature maps fully.
(c)
Region Proposal Network (RPN): The goal of this phase is to propose and select rough detection rectangles. It follows the FPN phase, and its parameters are propagated. The RoIs produced in this phase pass through the RoI align step, which uses bilinear interpolation instead of the rounding operation used by RoI pooling and thereby improves the RoI accuracy. RoIs of diverse scales are normalized to the same dimension of 7 × 7.
(d)
Function Branches: The candidate RoIs refined by RoI align are passed to three functional branches: classification, detection, and segmentation. The segmentation branch is connected to a fully convolutional layer that produces a binary mask. The basic structure of the Mask RCNN model is given in Fig. 4.
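The RoI align step described in phase (c) can be illustrated with a minimal pure-Python sketch. It assumes a single-channel feature map, one sampling point at the centre of each output bin, and a hypothetical RoI given as `(y_min, x_min, y_max, x_max)`; it is not the implementation used in this work, only an illustration of why bilinear interpolation avoids the rounding of RoI pooling.

```python
import math

def bilinear_sample(feature, y, x):
    """Sample a 2D feature map (list of rows) at a fractional (y, x) location."""
    h, w = len(feature), len(feature[0])
    # Clamp the sampling point so the four neighbours stay inside the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feature, roi, out_size=7):
    """Pool an RoI to out_size x out_size without rounding its coordinates."""
    y_min, x_min, y_max, x_max = roi
    bin_h = (y_max - y_min) / out_size
    bin_w = (x_max - x_min) / out_size
    return [
        [
            bilinear_sample(feature,
                            y_min + (i + 0.5) * bin_h,
                            x_min + (j + 0.5) * bin_w)
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]

# Toy feature map whose value is 10 * row + column (hypothetical data).
feature_map = [[float(r * 10 + c) for c in range(10)] for r in range(10)]
pooled = roi_align(feature_map, roi=(1.3, 2.6, 8.1, 9.2))
```

Because the fractional RoI corners are never rounded, every RoI, whatever its scale, is mapped to the same 7 × 7 grid while keeping sub-pixel alignment with the feature map.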

Dilated and attention in mask RCNN

In this phase, the initially aggregated images $PD_{zz}$ are given as input to the model. Segmentation retrieves the most relevant information from the image pixels and thereby improves the capability of the overall model. Conventional segmentation approaches become inaccurate when they face images with varying backgrounds and overlapping leaves. Therefore, the DAA-MRCNN is proposed in this work. It combines dilated convolution, attention modules, and an adaptive concept to improve the accuracy of the segmentation process. Combining these modules enables the DAA-MRCNN to precisely isolate the diseased areas of the images, which leads to effective plant disease detection.

Dilation: Dilated convolution is similar to ordinary convolution, but it spaces out the pixels sampled by the kernel so that each output covers a larger area of the image. In this way, the convolution layer attains a larger receptive field without changing the size of the feature maps, which enriches the feature information. The dilated kernel is a kernel with holes, and the dilation rate is the parameter that distinguishes dilated from normal convolution. Adding dilation expands the receptive field of the suggested model without increasing the number of parameters. In this work, fixed dilation rates (1, 2, 4, 8) are used in the suggested model. Dilation rates of 1 and 2 capture the finer details in the images, whereas the higher dilation rates (4, 8) improve the global contextual awareness. In the different layers of the hybrid models, a combination of dilation rates (2, 4, and 6) is used so that both large and small contextual cues are captured effectively. The dilation rates are kept fixed during training.
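The effect of the dilation rate can be sketched in one dimension. The example below is an illustrative pure-Python sketch, not the layer used in this work; the signal and the three-tap kernel are hypothetical. It shows that the same three weights cover a wider span of the input as the rate grows through the values (1, 2, 4, 8).

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid 1D dilated convolution (cross-correlation form, stride 1)."""
    span = (len(kernel) - 1) * dilation + 1  # input positions covered by the kernel
    outputs = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are placed `dilation` positions apart ("holes" in between).
        outputs.append(sum(w * signal[start + k * dilation]
                           for k, w in enumerate(kernel)))
    return outputs

signal = list(range(32))   # hypothetical input: 0, 1, ..., 31
kernel = [1, 0, -1]        # three weights regardless of the dilation rate

spans = {}
for rate in (1, 2, 4, 8):
    result = dilated_conv1d(signal, kernel, rate)
    spans[rate] = (len(kernel) - 1) * rate + 1
    print(f"rate={rate} span={spans[rate]} first_output={result[0]}")
```

The number of weights stays at three in every case; only the spacing between the sampled positions changes, which is why the receptive field grows without additional parameters.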

Attention: The attention mechanism focuses on specific parts of the images so that the classification can be performed adequately. It is detailed in Eq. (13).

$$ATT\left( {y,z} \right) = \sum\limits_{i = 1}^{Z} {\sigma_{i} } \left( {y,z_{i} } \right)E_{i}$$

(13)

Here, $y$ is the query, $z_{i}$ are the keys, $\sigma_{i} \left( {y,z_{i} } \right)$ is the similarity function between the query and its respective key, and $E_{i}$ is the value associated with key $z_{i}$.

In addition, the softmax function used in the attention mechanism is expressed in Eq. (14).

$$sf\left( {j_{i} } \right) = \frac{{\exp \left( {j_{i} } \right)}}{{\sum\nolimits_{{i^{\prime } }} {\exp \left( {j_{{i^{\prime } }} } \right)} }}$$

(14)
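Equations (13) and (14) can be combined in a short sketch. It assumes a dot product as the similarity function and treats $E_{i}$ as the value paired with each key; both are assumptions made for illustration, and the vectors are hypothetical.

```python
import math

def softmax(scores):
    """Eq. (14): exponentiate and normalise; the max is subtracted for stability."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Eq. (13): weighted sum of values, weights from query-key similarity."""
    # Dot-product similarity is an assumption; the text does not fix the function.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights

query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
context, weights = attention(query, keys, values)
```

The key that is most similar to the query receives the larger weight, so its value dominates the output; this is the sense in which attention concentrates on the relevant part of the input.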

Dilation and attention with mask RCNN: Dilation is introduced into the mask RCNN model by replacing the convolution layers with dilated convolution layers to offer better performance. The attention module is then integrated so that the model is guided toward the relevant regions of the input images.

The segmented images obtained as the outcome of this phase are termed $ss^{mrc}$, and the process is detailed in Fig. 5.

Suggested DAA-MRCNN for segmentation

For agricultural imagery, an end-to-end classification model trained on raw images is computationally efficient and simple. However, such a classifier sometimes misclassifies because of the variation in the background and lighting conditions of the raw input images. When the disease occupies only a tiny portion of an image, detecting it from the entire image frame is difficult. Instead of focusing on the lesions, the end-to-end classifier tends to rely on correlations with background elements. To address these issues, this research adopts a two-stage pipeline that consists of segmentation and classification. The segmentation is performed by the new heuristic model, termed DAA-MRCNN. By focusing on the diseased region of the images, the DAA-MRCNN removes irrelevant context, which improves specificity and precision. A new formulation is derived to tackle some of the limitations of the mask RCNN model. In general, mask RCNN is simple to train and offers good flexibility and efficiency, but it requires more time for detection. Therefore, certain parameters of the MRCNN are optimized with the AVLO algorithm to improve performance. The objective is given in Eq. (15).

$$of_{1} = \mathop {\arg \min }\limits_{{\left\{ {af_{mrc} ,hn_{mrc} ,ep_{mrc} } \right\}}} \left( {\frac{1}{dc + jc}} \right)$$

(15)

Here, $af_{mrc}$, $hn_{mrc}$, and $ep_{mrc}$ denote the activation function in [0, 4], the hidden neuron count in [5, 255], and the number of epochs in [5, 55] of the MRCNN, which are tuned using AVLO. The terms $dc$ (Dice coefficient) and $jc$ (Jaccard coefficient) are given in Eqs. (16) and (17).

$$dc = \frac{2 \times aa}{{\left( {aa + bb} \right) + \left( {aa + bz} \right)}}$$

(16)

$$jc = \frac{aa}{{aa + bb + bz}}$$

(17)

Here, $aa$, $az$, $bb$, and $bz$ are the true positives, true negatives, false positives, and false negatives, respectively.
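A small worked example may help to read Eqs. (15) to (17). The sketch below counts the pixel-wise confusion entries of two hypothetical binary masks and evaluates the Dice coefficient, the Jaccard coefficient, and the objective that AVLO minimizes; the masks and function names are illustrative assumptions.

```python
def confusion_counts(pred_mask, true_mask):
    """Pixel-wise counts (aa, az, bb, bz) = (TP, TN, FP, FN) for binary masks."""
    aa = az = bb = bz = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                aa += 1
            elif not p and not t:
                az += 1
            elif p and not t:
                bb += 1
            else:
                bz += 1
    return aa, az, bb, bz

def dice_coefficient(aa, bb, bz):
    return 2 * aa / ((aa + bb) + (aa + bz))      # Eq. (16)

def jaccard_coefficient(aa, bb, bz):
    return aa / (aa + bb + bz)                   # Eq. (17)

def segmentation_objective(aa, bb, bz):
    # Eq. (15): a smaller value means larger Dice + Jaccard overlap.
    return 1.0 / (dice_coefficient(aa, bb, bz) + jaccard_coefficient(aa, bb, bz))

predicted = [[1, 1, 0], [0, 1, 0]]   # hypothetical predicted mask
reference = [[1, 0, 0], [0, 1, 1]]   # hypothetical ground-truth mask
aa, az, bb, bz = confusion_counts(predicted, reference)
objective = segmentation_objective(aa, bb, bz)
```

For these masks the counts are $aa = 2$, $az = 2$, $bb = 1$, $bz = 1$, so $dc = 4/6$, $jc = 2/4$, and the objective equals $6/7$. Minimizing the reciprocal of $dc + jc$ is equivalent to maximizing the overlap between predicted and reference masks.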

Identifying the plant disease using dilated, adaptive, and attention-based multiscale densenet with AVLO

Multiscale densenet

Here, the segmented images $ss^{mrc}$ are given to the final classification phase to obtain the classified outcomes.

Multi-scale⁴⁴: The multi-scale DenseNet model places dense blocks along a down-sampling path. While reducing the computational cost, the down-sampled feature maps enable the dense blocks to model dependencies over a wider frequency range and over longer contexts. To recover the original resolution from a lower-resolution feature map, an up-sampling layer is implemented as a transposed convolution whose filter size equals the pooling size. An inter-block skip connection, which directly connects two dense blocks of the same scale, is introduced so that signals can flow forward and backward without passing through the lower-resolution blocks.

DenseNet: Efficient feature reuse makes DenseNet an effective structure for the classification of plant diseases. The layer-to-layer connections mitigate the vanishing gradient issue and promote the reuse of features. As a result, the subtle features in the images are preserved, which benefits plant disease detection. According to this work, unlike ResNet, EfficientNet, or transformer-based models such as Swin Transformer, the DenseNet model does not require a heavy parameter count, and it gives effective results even for highly variable agricultural images. In a feed-forward network, the output of layer $m$ is computed as $k_{m} = I_{m} \left( {k_{m - 1} } \right)$, where $k_{0}$ is the input of the network and $I_{m} \left( . \right)$ is a non-linear transformation composed of several operations. To tackle the limitations of deep models, ResNet adds a skip connection, as expressed in Eq. (18).

$$k_{m} = I_{m} \left( {k_{m - 1} } \right) + k_{m - 1}$$

(18)

Further, the skip connection allows the network to pass the gradient directly to the preceding layers. DenseNet further improves the flow of information among the layers by replacing the single preceding output with the concatenation of the outputs of all preceding layers. It is given in Eq. (19).

$$k_{m} = I_{m} \left( {\left[ {k_{m - 1} ,k_{m - 2} , \ldots ,k_{0} } \right]} \right)$$

(19)

Here, $\left[ \cdots \right]$ denotes concatenation. The dense connectivity ensures the reuse of the features computed by the preceding layers. It avoids re-learning similar features in different layers, which makes the network more efficient. Here, $I_{m}$ consists of BN followed by ReLU and a convolution that produces $l$ feature maps. For image recognition tasks, a pooling layer that aggregates local activations and reduces the feature maps to lower dimensions is important for acquiring global information effectively.
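The difference between Eq. (18) and Eq. (19) can be made concrete with a toy sketch in which feature maps are plain lists of numbers. The layer functions below are hypothetical stand-ins for $I_{m}$ (not BN, ReLU, and convolution), and `GROWTH` plays the role of $l$; the point is only to show addition versus concatenation.

```python
GROWTH = 4  # hypothetical number of feature maps l produced by each layer

def toy_dense_layer(inputs):
    """Stand-in for I_m: maps any number of inputs to GROWTH outputs."""
    mean = sum(inputs) / len(inputs)
    return [mean] * GROWTH

def dense_forward(x, num_layers):
    """Eq. (19): each layer sees the concatenation [k_{m-1}, ..., k_0]."""
    features = [x]
    input_widths = []
    for _ in range(num_layers):
        concatenated = [v for feat in reversed(features) for v in feat]
        input_widths.append(len(concatenated))
        features.append(toy_dense_layer(concatenated))
    return features[-1], input_widths

def residual_forward(x, num_layers):
    """Eq. (18): each layer output is added to its own input."""
    for _ in range(num_layers):
        transformed = [0.5 * v for v in x]   # stand-in for I_m
        x = [t + v for t, v in zip(transformed, x)]
    return x

dense_out, widths = dense_forward([1.0] * 8, num_layers=3)
residual_out = residual_forward([1.0, 2.0], num_layers=2)
```

With eight input features and a growth of four, the three dense layers receive 8, 12, and 16 inputs, whereas the residual path keeps a constant width. This growth in the input of each layer is how earlier features remain directly available for reuse.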

Dilation: Dilation is used in the DenseNet model to retrieve multi-level information from the images, while the dilated dense block maintains the same spatial resolution of the images. A small $n \times n$ kernel is enlarged to an effective size of $n + \left( {n - 1} \right)\left( {dl - 1} \right)$, where $dl$ is the dilation ratio, so the receptive field grows accordingly. Thus, the dilated layer retrieves multi-level information from the images.
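The effective kernel size $n + \left( {n - 1} \right)\left( {dl - 1} \right)$ can be checked numerically. The sketch below evaluates it for a $3 \times 3$ kernel and also computes the receptive field of a stack of stride-1 dilated layers; the stacking rule and the function names are assumptions added for illustration.

```python
def effective_kernel_size(n, dl):
    """Side length covered by an n x n kernel with dilation ratio dl."""
    return n + (n - 1) * (dl - 1)

def stacked_receptive_field(n, dilation_rates):
    """Receptive field of consecutive stride-1 layers with the given rates."""
    field = 1
    for dl in dilation_rates:
        # Each layer widens the field by the extent of its kernel minus one.
        field += (n - 1) * dl
    return field

sizes = [effective_kernel_size(3, dl) for dl in (1, 2, 4, 8)]
field = stacked_receptive_field(3, (1, 2, 4, 8))
print(sizes, field)
```

For $n = 3$ the effective sizes are 3, 5, 9, and 17, and stacking the four layers at stride 1 gives a receptive field of 31 pixels per side with the same nine weights per kernel.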

Attention: The attention mechanism can be regarded as a form of resource allocation. Its intent is to learn the feature weights according to the loss so that the significant feature maps receive greater weight. It compresses the features along the spatial dimensions and turns each feature channel into a value that has, to some extent, a global receptive field.

Dilation and attention-based multi-scale DenseNet: To obtain feature maps with representative information from the image, the convolution layers of the DenseNet model are replaced with dilated convolution layers. The attention mechanism adaptively chooses the relevant features from the different feature maps.

Thus, this process is depicted in Fig. 6.

Recommended DAA-MDeNet for classification

Robust feature extraction from the segmented images is necessary for accurate plant disease classification. Conventional models do not capture multiscale features effectively, which may limit their capability to detect plant disease accurately. In this work, the DAA-MDeNet is adopted, which leverages the attention mechanism and multiscale dense connectivity to improve the feature representation of the images. The proposed DAA-MDeNet covers the challenging visual patterns and the variability among the diseased samples, and the classification is performed by this new heuristic model. A new formulation is derived to tackle the restrictions of the DenseNet model. DenseNet offers good computational efficiency, but it replicates data multiple times. Therefore, certain parameters of the DenseNet are optimized with the AVLO algorithm to improve performance. The objective is given in Eq. (20).

$$of_{1} = \mathop {\arg \min }\limits_{{\left\{ {af_{dns} ,hn_{dns} ,ep_{dns} } \right\}}} \left( {\frac{1}{acc} + fnr + fdr + fpr} \right)$$

(20)

Here, $af_{dns}$, $hn_{dns}$, and $ep_{dns}$ denote the activation function in [0, 4], the hidden neuron count in [5, 255], and the number of epochs in [5, 50] of the DenseNet, which are optimized using AVLO. The terms $acc$, $fnr$, $fdr$, and $fpr$ (accuracy, false negative rate, false discovery rate, and false positive rate) are given in Eqs. (21) to (24).

$$acc = \frac{{\left( {aa + az} \right)}}{{\left( {aa + az + bb + bz} \right)}}$$

(21)

$$fnr = \frac{bz}{{bz + aa}}$$

(22)

$$fdr = \frac{bb}{{bb + aa}}$$

(23)

$$fpr = \frac{bb}{{bb + az}}$$

(24)
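The classification objective of Eq. (20) can be evaluated for a given confusion matrix as sketched below. The sketch uses the standard confusion-matrix definitions of accuracy, false negative rate, false discovery rate, and false positive rate; the counts are hypothetical and the function name is an illustrative assumption.

```python
def classification_objective(aa, az, bb, bz):
    """Eq. (20): 1/acc + fnr + fdr + fpr from TP (aa), TN (az), FP (bb), FN (bz)."""
    acc = (aa + az) / (aa + az + bb + bz)   # accuracy
    fnr = bz / (bz + aa)                    # false negative rate (standard form)
    fdr = bb / (bb + aa)                    # false discovery rate (standard form)
    fpr = bb / (bb + az)                    # false positive rate
    return 1.0 / acc + fnr + fdr + fpr

# Hypothetical counts for one candidate parameter setting.
score = classification_objective(aa=90, az=85, bb=15, bz=10)
perfect = classification_objective(aa=100, az=100, bb=0, bz=0)
```

A perfect classifier reaches the minimum value of 1, since the accuracy term equals 1 and the three error rates vanish; every misclassification pushes the objective upward, which is what the optimizer penalizes.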

Finally, the classified outcomes are obtained, and the process is given in Fig. 7.

Imaging results

The resultant images of the plant disease detection process are given in Fig. 8.

Results and discussion

Simulation setup

The model was implemented in Python, and extensive results were obtained. Dingo Optimizer (DO)-DAA-MDeNet⁴⁵, Eurasian Oystercatcher Optimizer (EOO)-DAA-MDeNet⁴⁶, Residual Network (ResNet)⁴⁷, VGG16⁴⁸, CNN⁴⁹, and Modified CNN (MCNN)⁵⁰ models are used for comparison. A population size of 10, a chromosome length of 6, and a maximum of 50 iterations are used in this model. The code for the implementation of the developed model is available at: https://github.com/kalicharan8u/Plant-Disease-Detection-using-Mask-RCNN-with-Multiscale-DenseNet-
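One way to read the chromosome length of 6 is that each candidate solution holds the three tuned parameters of the MRCNN and the three of the DenseNet. The sketch below illustrates that reading; the gene order, the names, the uniform initialization, and the rounding to integers are assumptions for illustration and are not taken from the released code.

```python
import random

# Search ranges stated for Eqs. (15) and (20); the gene order is an assumption.
NAMES = ["af_mrc", "hn_mrc", "ep_mrc", "af_dns", "hn_dns", "ep_dns"]
BOUNDS = [(0, 4), (5, 255), (5, 55), (0, 4), (5, 255), (5, 50)]

POPULATION_SIZE = 10
MAX_ITERATIONS = 50

def init_population(size=POPULATION_SIZE, seed=0):
    """Draw each gene uniformly inside its range (hypothetical initialization)."""
    rng = random.Random(seed)
    return [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(size)]

def decode(chromosome):
    """Clip every gene to its range and round it to an integer setting."""
    return {
        name: int(round(min(max(gene, lo), hi)))
        for name, gene, (lo, hi) in zip(NAMES, chromosome, BOUNDS)
    }

population = init_population()
settings = [decode(chromosome) for chromosome in population]
```

In such a setup, each decoded setting would be scored with the corresponding objective, and the optimizer would update the population for up to `MAX_ITERATIONS` rounds.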

Performance measures

The developed model is analyzed with the following measures:

(a)
Specificity in Eq. (25)
$$sp = \frac{az}{{az + bb}}$$
(25)
(b)
Sensitivity in Eq. (26)
$$sy = \frac{aa}{{aa + bz}}$$
(26)
(c)
Precision in Eq. (27)
$$pr = \frac{aa}{{aa + bb}}$$
(27)
(d)
NPV in Eq. (28)
$$npv = \frac{az}{{az + bz}}$$
(28)
(e)
MCC in Eq. (29)
$$MCC = \frac{aa \times az - bb \times bz}{{\sqrt {\left( {aa + bb} \right)\left( {aa + bz} \right)\left( {az + bb} \right)\left( {az + bz} \right)} }}$$
(29)
(f)
F1-score in Eq. (30)
$$f1 - S = 2\cdot\frac{sy \cdot pr}{{pr + sy}}$$
(30)
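The measures listed above can be computed together from one confusion matrix. The following sketch uses the standard definitions of the six measures with hypothetical counts; the function name and the dictionary keys are illustrative.

```python
import math

def performance_measures(aa, az, bb, bz):
    """Measures from TP (aa), TN (az), FP (bb), FN (bz), standard definitions."""
    sp = az / (az + bb)                      # specificity
    sy = aa / (aa + bz)                      # sensitivity
    pr = aa / (aa + bb)                      # precision
    npv = az / (az + bz)                     # negative predictive value
    mcc = (aa * az - bb * bz) / math.sqrt(
        (aa + bb) * (aa + bz) * (az + bb) * (az + bz)
    )                                        # Matthews correlation coefficient
    f1 = 2 * sy * pr / (pr + sy)             # F1-score
    return {"sp": sp, "sy": sy, "pr": pr, "npv": npv, "mcc": mcc, "f1": f1}

# Hypothetical counts for a binary "diseased vs. healthy" decision.
measures = performance_measures(aa=90, az=85, bb=15, bz=10)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

For multi-class datasets, these measures are usually computed per class in a one-vs-rest manner and then averaged; whether this work averages them that way is not stated here.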

Determination of classification process using algorithms and classifiers

The classification performance of the plant disease detection model is evaluated by varying the batch size from 4 to 64 and the learning percentage from 35 to 88, and the comparisons across algorithms and classifiers are given in Figs. 9 and 10. In the classifier comparison, at batch size 4, the accuracy of the AVLO-DAA-MDeNet model is 16%, 10%, 14%, and 16% higher than that of the ResNet, VGG16, DenseNet, and MDeNet models, respectively. The algorithm comparison is similar: the AVLO-DAA-MDeNet model shows better values for the positive measures than DO-DAA-MDeNet, EOO-DAA-MDeNet, AVOA-DAA-MDeNet, and LO-DAA-MDeNet, which improves the classification process.

Statistical determination over the segmentation process using algorithms and classifiers

The segmentation performance of the plant disease detection model is evaluated with statistical measures across both algorithms and classifiers in Fig. 11. The plots show higher accuracy, Dice coefficient, and Jaccard coefficient values for the newly designed AVLO-DAA-MRCNN model, and the better-segmented outcomes aid the final classification process.

Analysis on the confusion matrix for the plant disease classification model

Fig. 12 shows the confusion matrices for all ten datasets, and Fig. 13 shows the ROC and convergence analysis for dataset 1 for the proposed disease detection model. This validation further supports the performance and efficiency of the model.

Overall analysis for plant disease classification model

Table 2 shows the validation of the classification process of the plant disease detection framework in terms of both algorithms and classifiers for dataset 1. In the algorithm comparison, the accuracy of the proposed AVLO-DAA-MDeNet model is 6%, 3%, 2%, and 1% higher than that of DO-DAA-MDeNet, EOO-DAA-MDeNet, AVOA-DAA-MDeNet, and LO-DAA-MDeNet, respectively.

Table 2 Proposed plant disease classification models overall analysis for dataset 1 by means of algorithms and classifiers.


Overall analysis for plant disease classification model using algorithms

Table 3 shows the statistical analysis of the classification process of the plant disease detection framework using algorithms for dataset 1. Compared with the other algorithm-based models, namely DO-DAA-MDeNet, EOO-DAA-MDeNet, AVOA-DAA-MDeNet, and LO-DAA-MDeNet, the proposed AVLO-DAA-MDeNet model offers better outcomes.

Table 3 Proposed plant disease classification models: statistical determination using algorithms.


Accuracy report for plant disease classification model using classifiers in terms of datasets

The accuracy for all ten datasets, compared with various conventional classifiers, is shown in Table 4. For the Apple dataset, the accuracy of the proposed AVLO-DAA-MDeNet model is 6%, 1%, 3%, and 4% higher than that of the ResNet, VGG16, DenseNet, and MDeNet models, respectively. The same holds for all other datasets, where the proposed model offers better values.

Table 4 Proposed plant disease classification models accuracy report for the dataset by means of classifiers.


Ablation study of the suggested model

To support the interpretability of the presented plant disease detection and segmentation model, an ablation study is reported in Table 5. As noted from Table 5, incorporating dilated convolutions, adaptive attention, and the optimizer raises the accuracy to 94.58%, whereas the accuracy drops to 88.04% when the DenseNet model is used alone for plant disease classification. Likewise, incorporating dilation improves the accuracy of the DenseNet model to 89.91%, which supports the interpretability of the proposed model in the classification of plant diseases.

Table 5 Ablation study of the presented plant disease classification model.


Comparison of the suggested model with the recent architectures

The suggested AVLO-DAA-MDeNet-based plant disease classification model is compared with the baseline approaches in Table 6. Table 6 shows that the presented AVLO-DAA-MDeNet-based model provides an accuracy of 88.4% in plant disease classification, which is higher than that of Vision Transformers (85.424%), ConvNeXt (80.832%), and Swin Transformer (83.88%). Thus, the comparative results in Table 6 support the effectiveness of the proposed model in plant disease classification.

Table 6 Comparison of the suggested plant disease classification model with the baseline approaches.


Generalizability and robustness of the suggested model, varying lighting, occlusions, or camera angles

The generalizability and robustness of the presented model under varying lighting, occlusion, or camera angles are given in Table 7. According to Fig. 14a, the generalizability of the presented AVLO-DAA-MDeNet model is measured in terms of accuracy, which shows that the model generalizes well under varying occlusion, lighting, and camera angles. In addition, the robustness of the presented AVLO-DAA-MDeNet is measured in terms of the noise coefficient. Here, the correlation of the presented AVLO-DAA-MDeNet model decreases as the noise coefficient increases, which confirms the robustness of the model under various lighting, occlusion, or camera angles.

Table 7 Comparison of the suggested plant disease classification model with other multiscale strategies.


Comparison of the presented model with other multi-scale strategies

The suggested AVLO-DAA-MDeNet-based plant disease detection is compared with other multiscale approaches, such as ASPP and FPN, to confirm the lesion-capturing capability of the presented model. Here, the proposed AVLO-DAA-MDeNet model reaches an accuracy of 92.4%, which is higher than that of the other multiscale approaches, namely ASPP (91.184%) and FPN (85.256%), across varying scales. The accuracy improvement of up to 7.2% confirms the capability of the presented AVLO-DAA-MDeNet in capturing lesions across varying scales.

Attention maps of the proposed models

In both the DAA-MRCNN and DAA-MDeNet models, attention maps are used to emphasize the diagnostically relevant regions. The attention maps given in Fig. 15 illustrate that the model focuses on diseased areas rather than healthy or background regions. Integrating the attention maps lets the model focus on the diseased region, which improves the accuracy of plant disease detection. The visual explanations provided by the attention maps can increase trust among agronomists and farmers and support real-world adoption in precision agriculture.

Comparison of the AVLO with other evolutionary algorithms

Table 8 compares the convergence speed and the solution quality of the AVLO with those of other algorithms such as Adam, RMSprop, PSO, and GA. As noted from Table 8, the proposed AVLO provides a higher convergence speed and better solution quality than the other approaches.

Table 8 Comparison of the AVLO with other evolutionary algorithms.


Accuracy margin of the proposed model compared to the direct classification on raw images

Table 9 compares the accuracy of the two-stage pipeline (proposed model) with that of direct classification on the raw images. As given in Table 9, the two-stage pipeline provides an accuracy of 95.6%, whereas direct classification on the raw images provides an accuracy of 93.4%. This margin of 2.2 percentage points shows the advantage of the presented two-stage pipeline over direct classification.

Table 9 Accuracy of the two-stage pipeline and direct comparison.


Conclusion

In this research work, an improved plant disease classification model was developed using a hybrid deep learning architecture. First, the images were collected from a standard publicly available database. Then, the images were fed to the segmentation process, performed by the newly developed DAA-MRCNN, to obtain segmented images that make the subsequent plant disease detection more accurate and time-saving. The segmented images were then forwarded to the final classification phase, where the newly designed DAA-MDeNet classifies the diseased plants. Further, the new AVLO algorithm was used to optimize the parameters of both the DAA-MRCNN and DAA-MDeNet models to enhance performance. The detection process was evaluated by comparing it with other models using the performance measures. In terms of precision, the proposed AVLO-DAA-MDeNet model achieved values 6%, 4%, 3%, and 1% higher than the DO-DAA-MDeNet, EOO-DAA-MDeNet, AVOA-DAA-MDeNet, and LO-DAA-MDeNet models, respectively. Timely and reliable assessment of plant disease to improve protection activities is the scope of future work.

Limitations and future scope: Class imbalance occurs across species and disease categories. Future work will address it through data augmentation, loss weighting such as focal loss, or sampling techniques such as SMOTE to improve the performance of the developed model on the underrepresented classes. Despite the high accuracy, the longer inference time and the higher computational complexity may limit the real-time performance of the developed model in low-power edge environments. The suggested model is effective, but it is not feasible in a resource-constrained environment because it demands more memory and processing power. This trade-off between deployability and accuracy motivates future work. In future, pruning, quantization, and knowledge distillation will be explored to enable the practical use of the suggested model in real-world agricultural settings. Furthermore, the feasibility of real-world deployment will be assessed by evaluating the developed model on edge platforms such as NVIDIA Jetson Nano and Raspberry Pi.

Data availability

The benchmark data underlying this article are available at: https://www.kaggle.com/datasets/lavaman151/plantifydr-dataset.

Code availability

The code for the implementation of the developed model is available at the link https://github.com/kalicharan8u/Plant-Disease-Detection-using-Mask-RCNN-with-Multiscale-DenseNet- and it has been given in Section "Simulation setup".

References

Ashourloo, D., Matkan, A. A., Huete, A., Aghighi, H. & Mobasheri, M. R. Developing an index for detection and identification of disease stages. IEEE Geosci. Remote Sens. Lett. 13(6), 851–855. https://doi.org/10.1109/LGRS.2016.2550529 (2016).
Asha Rani, K. P. & Gowrishankar, S. Pathogen-based classification of plant diseases: A deep transfer learning approach for intelligent support systems. IEEE Access 11, 64476–64493. https://doi.org/10.1109/ACCESS.2023.3284680 (2023).
Noon, S. K., Amjad, M., Qureshi, M. A. & Mannan, A. Handling severity levels of multiple co-occurring cotton plant diseases using improved YOLOX model. IEEE Access 10, 134811–134825. https://doi.org/10.1109/ACCESS.2022.3232751 (2022).
Delnevo, G., Girau, R., Ceccarini, C. & Prandi, C. A deep learning and social IoT approach for plants disease prediction toward a sustainable agriculture. IEEE Internet Things J. 9(10), 7243–7250. https://doi.org/10.1109/JIOT.2021.3097379 (2022).
Liu, Z. et al. Internet of Things (IoT) and machine learning model of plant disease prediction-blister blight for tea plant. IEEE Access 10, 44934–44944. https://doi.org/10.1109/ACCESS.2022.3169147 (2022).
Zeng, Q., Ma, X., Cheng, B., Zhou, E. & Pang, W. GANs-based data augmentation for citrus disease severity detection using deep learning. IEEE Access 8, 172882–172891. https://doi.org/10.1109/ACCESS.2020.3025196 (2020).
Kumar, M., Kumar, A. & Palaparthy, V. S. Soil sensors-based prediction system for plant diseases using exploratory data analysis and machine learning. IEEE Sens. J. 21(16), 17455–17468. https://doi.org/10.1109/JSEN.2020.3046295 (2021).
Masood, M. et al. MaizeNet: A deep learning approach for effective recognition of maize plant leaf diseases. IEEE Access 11, 52862–52876. https://doi.org/10.1109/ACCESS.2023.3280260 (2023).
Shovon, M. S. H. et al. PlantDet: A robust multi-model ensemble method based on deep learning for plant disease detection. IEEE Access 11, 34846–34859. https://doi.org/10.1109/ACCESS.202J (2023).
Chen, J., Chen, W., Zeb, A., Yang, S. & Zhang, D. Lightweight inception networks for the recognition and detection of rice plant diseases. IEEE Sens. J. 22(14), 14628–14638. https://doi.org/10.1109/JSEN.2022.3182304 (2022).
Li, F. et al. A remote sensing and airborne edge-computing based detection system for pine wilt disease. IEEE Access 9, 66346–66360. https://doi.org/10.1109/ACCESS.2021.3073929 (2021).
Tian, L. et al. VMF-SSD: A novel V-space based multi-scale feature fusion SSD for apple leaf disease detection. IEEE/ACM Trans. Comput. Biol. Bioinf. 20(3), 2016–2028. https://doi.org/10.1109/TCBB.2022.3229114 (2023).
Patle, K. S. et al. IoT enabled, leaf wetness sensor on the flexible substrates for in-situ plant disease management. IEEE Sens. J. 21(17), 19481–19491. https://doi.org/10.1109/JSEN.2021.3089722 (2021).
Ashourloo, D., Aghighi, H., Matkan, A. A., Mobasheri, M. R. & Rad, A. M. An Investigation into machine learning regression techniques for the leaf rust disease detection using hyperspectral measurement. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 9(9), 4344–4351. https://doi.org/10.1109/JSTARS.2016.2575360 (2016).
Zhu, X. et al. LAD-Net: A novel light weight model for early apple leaf pests and diseases classification. IEEE/ACM Trans. Comput. Biol. Bioinf. 20(2), 1156–1169. https://doi.org/10.1109/TCBB.2022.3191854 (2023).
Schor, N. et al. Robotic disease detection in greenhouses: Combined detection of powdery mildew and tomato spotted wilt virus. IEEE Robot. Automat. Lett. 1(1), 354–360. https://doi.org/10.1109/LRA.2016.2518214 (2016).
Nie, X., Wang, L., Ding, H. & Xu, M. Strawberry verticillium wilt detection network based on multi-task learning and attention. IEEE Access 7, 170003–170011. https://doi.org/10.1109/ACCESS.2019.2954845 (2019).
Saini, R., Patle, K. S., Kumar, A., Surya, S. G. & Palaparthy, V. S. Attention-based multi-input multi-output neural network for plant disease prediction using multisensor system. IEEE Sens. J. 22(24), 24242–24252. https://doi.org/10.1109/JSEN.2022.3219601 (2022).
Moupojou, E., Tagne, A., Retraint, F. & Tadonk, A. FieldPlant: A dataset of field plant images for plant disease detection and classification with deep learning. IEEE Access 11, 35398–35410 (2023).
Saleem, M. H., Potgieter, J. & Arif, K. M. A performance-optimized deep learning-based plant disease detection approach for horticultural crops of New Zealand. IEEE Access 10, 89798–89822 (2022).
Sunil, C. K. & Jaidhar, C. D. Cardamom plant disease detection approach using EfficientNetV2. IEEE Access 10, 789–804 (2022).
Hosny, K. M., El-Hady, W. M., Samy, F. M., Vrochidou, E. & Papakostas, G. A. Multi-class classification of plant leaf diseases using feature fusion of deep convolutional neural network and local binary pattern. IEEE Access 11, 62307–62317 (2023).
Amin, H., Darwish, A., Hassanien, A. E. & Soliman, M. End-to-end deep learning model for corn leaf disease classification. IEEE Access 10, 31103–31115 (2022).
Vishnoi, V. K., Kumar, K., Kumar, B., Mohan, S. & Khan, A. A. Detection of apple plant diseases using leaf images through convolutional neural network. IEEE Access 11, 6594–6609 (2023).
Zhao, Y., Chen, Z., Gao, X., Song, W. & Xion, Q. Plant disease detection using generated leaves based on DoubleGAN. IEEE/ACM Trans. Comput. Biol. Bioinf. 19(3), 1817–1826 (2021).
Ahmad, M., Abdullah, M., Moon, H. & Han, D. Plant disease detection in imbalanced datasets using efficient convolutional neural networks with stepwise transfer learning. IEEE Access 9, 140565–140580 (2021).
Hassan, E., Saber, A., El-Sappagh, S. & El-Rashidy, N. Optimized ensemble deep learning approach for accurate breast cancer diagnosis using transfer learning and grey wolf optimization. Evol. Syst. 16(2), 59 (2025).
Hassan, E., Saber, A., El-Kenawy, E.-S. M., Bhatnagar, R. & Shams, M. Y. Early detection of black fungus using deep learning models for efficient medical diagnosis. In 2024 International Conference on Emerging Techniques in Computational Intelligence (ICETCI) 426–431 (IEEE, 2024).
Hassan, E., Saber, A., El-Hafeez, T. A., Medhat, T. & Shams, M. Y. Enhanced dysarthria detection in cerebral palsy and ALS patients using WaveNet and CNN-BiLSTM models: A comparative study with model interpretability. Biomed. Signal Process. Control 110, 108128 (2025).
Chouhan, S. S., Singh, U. P., Saxena, A. & Jain, S. Assessing the importance and need of artificial intelligence for precision agriculture. In Artificial Intelligence Techniques in Smart Agriculture 1–6 (Springer Nature Singapore, Singapore, 2024).
Chouhan, S. S., Kaul, A. & Singh, U. P. Image segmentation using fuzzy competitive learning based counter propagation network. Multimed. Tools Appl. 78(24), 35263–35287 (2019).
Sharma, A., Patel, R. K., Pranjal, P., Panchal, B. & Chouhan, S. S. Computer vision-based smart monitoring and control system for crop. In Applications of Computer Vision and Drone Technology in Agriculture 4.0 65–82 (Springer Nature Singapore, Singapore, 2024).
Chouhan, S. S., Singh, U. P. & Jain, S. Performance evaluation of different deep learning models used for the purpose of healthy and diseased leaves classification of Cherimoya (Annona Cherimola) plant. Neural Comput. Appl. 37(6), 4531–4544 (2025).
Mahmood, T. et al. A brief survey on breast cancer diagnostic with deep learning schemes using multi-image modalities. IEEE Access 8, 165779–165809 (2020).
Rehman, A., Mahmood, T. & Saba, T. Robust kidney carcinoma prognosis and characterization using Swin-ViT and DeepLabV3+ with multi-model transfer learning. Appl. Soft. Comput. 170, 112518 (2025).
Ali, A. M., Benjdira, B., Koubaa, A., Boulila, W. & El-Shafai, W. TESR: two-stage approach for enhancement and super-resolution of remote sensing images. Remote Sens. 15(9), 2346 (2023).
Mahmood, T., Saba, T., Rehman, A. & Alamri, F. S. Harnessing the power of radiomics and deep learning for improved breast cancer diagnosis with multiparametric breast mammography. Expert Syst. Appl. 249, 123747 (2024).
Mahmood, T., Rehman, A., Saba, T., Wang, Y. & Alamri, F. S. Alzheimer’s disease unveiled: Cutting-edge multi-modal neuroimaging and computational methods for enhanced diagnosis. Biomed. Signal Process. Control 97, 106721 (2024).
Mahmood, T., Saba, T. & Rehman, A. Breast cancer diagnosis with MFF-HistoNet: a multi-modal feature fusion network integrating CNNs and quantum tensor networks. J. Big Data 12(1), 60 (2025).
Mahmood, T., Saba, T., Al-Otaibi, S., Ayesha, N. & Almasoud, A. S. AI-Driven microscopy: Cutting-edge approach for breast tissue prognosis using microscopic images. Microsc. Res. Tech. 88(5), 1335–1359 (2025).
Abdollahzadeh, B., Gharehchopogh, F. S. & Mirjalili, S. African vultures optimization algorithm: A new nature-inspired metaheuristic algorithm for global optimization problems. Comput. Ind. Eng. 158, 107408 (2021).
Abasi, A. K. et al. Lemurs optimizer: A new metaheuristic algorithm for global optimization. Appl. Sci. 12(19), 10057 (2022).
Cai, L., Long, T., Dai, Y. & Huang, Y. Mask R-CNN-based detection and segmentation for pulmonary nodule 3D visualization diagnosis. IEEE Access 8, 44400–44409 (2020).
Takahashi, N. & Mitsufuji, Y. Multi-scale multi-band densenets for audio source separation. In 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (2017).
Bairwa, A. K., Joshi, S. & Singh, D. Dingo optimizer: A nature-inspired metaheuristic approach for engineering problems. Math. Probl. Eng. 2021, 2571863 (2021).
Bader, A. S., Jummar, W., Jasem, F. M. & Al-Yousif, M. Eurasian oystercatcher optimiser: New meta-heuristic algorithm. J. Intell. Syst. 31, 332–344 (2022).
Liang, J. Image classification based on RESNET. J. Phys.: Conf. Ser. 1634, 012110 (2020).
Tammina, S. Transfer learning using VGG-16 with deep convolutional neural network for classifying images. Int. J. Sci. Res. Publ. 9, 143–150 (2019).
Xu, G., Shen, X., Chen, S., Zong, Y. & Zhang, C. A deep transfer convolutional neural network framework for EEG signal classification. IEEE Access 7, 112767–112776 (2019).
Hasan, D. A. & Abdulazeez, A. M. A modified convolutional neural networks model for medical image segmentation. Test Eng. Manage. 20, 22 (2020).
Chang, B., Wang, Y., Zhao, X., Li, G. & Yuan, P. A general-purpose edge-feature guidance module to enhance vision transformers for plant disease identification. Expert Syst. Appl. 237, 121638 (2024).
KP, A. R. & Gowrishankar, S. Convnext-based mango leaf disease detection: Differentiating pathogens and pests for improved accuracy. Int. J. Adv. Comput. Sci. Appl. 14(6) (2023).
Kalpana, P., Anandan, R., Hussien, A. G., Migdady, H. & Abualigah, L. Plant disease recognition using residual convolutional enlightened Swin transformer networks. Sci. Rep. 14(1), 8660 (2024).
Ma, W. et al. Crop disease detection against complex background based on improved atrous spatial pyramid pooling. Electronics 12(1), 216 (2023).
Chen, S., Yang, W. & Chen, X. Automatic recognition of agriculture pests with balanced feature pyramid network. Appl. Eng. Agric. 40(5), 525–535 (2024).


Acknowledgements

I would like to express my sincere appreciation to the co-authors of this manuscript for their valuable and constructive suggestions during the planning and development of this research work.

Funding

Open access funding provided by Manipal University Jaipur. This research did not receive any specific funding.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Sharda University, Greater Noida, Uttar Pradesh, 201310, India
Kalicharan Sahu
Department of Computer Science & Engineering, Sharda University, Greater Noida, Uttar Pradesh, 201310, India
Shivam Tiwari
School of Computer and Systems Sciences, Jawaharlal Nehru University (JNU), New Delhi, 110067, India
Manoj Kumar Singh
Rustamji Institute of Technology, Tekanpur, BSF Academy, Tekanpur, Gwalior, Madhya Pradesh, 475005, India
Jankisharan Pahareeya
Department of Artificial Intelligence and Machine Learning, Manipal University Jaipur, Jaipur, Rajasthan, 303007, India
Harish Kumar Shakya
Department of IoT and Intelligent Systems, Manipal University Jaipur, Jaipur, Rajasthan, 303007, India
Gyanendra Kumar
School of Built Environment, Engineering and Computing, Leeds Beckett University, Leeds, LS6 3HF, UK
Shitharth Selvarajan

Contributions

Conceptualization: K.S. and S.T.; methodology: M.K.S. and S.S.; validation: J.S.P. and H.K.S.; formal analysis: H.K.S., S.S. and G.K.; investigation: H.K.S., S.S. and J.S.P.; resources: K.S. and M.K.S.; data curation: G.K. and J.S.P.; writing, original draft preparation: K.S., S.T. and M.K.S.; writing, review and editing: H.K.S. and G.K.; visualization: K.S. and J.P.S.; supervision: H.K.S. and J.P.S.; project administration: M.K.S. and S.T.

Corresponding authors

Correspondence to Kalicharan Sahu or Harish Kumar Shakya.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Sahu, K., Tiwari, S., Singh, M.K. et al. Plant disease detection using a hybrid dilated CNN with attention mechanisms and optimized mask RCNN segmentation. Sci Rep 15, 42008 (2025). https://doi.org/10.1038/s41598-025-26192-w


Received: 08 July 2025
Accepted: 27 October 2025
Published: 23 November 2025
Version of record: 25 November 2025
DOI: https://doi.org/10.1038/s41598-025-26192-w
