Abstract
Precise pest classification plays an essential role in smart agriculture. Crop yields are severely impacted by pest damage, which poses a critical challenge for agricultural production and the economy. Identifying pests is of utmost importance, but manual identification is both labor-intensive and time-consuming. Therefore, the field of pest identification and classification requires more advanced and effective techniques. The proposed work presents an innovative automatic approach that incorporates deep learning into smart farming for pest monitoring and classification to tackle this challenge. In this work, the IP102 dataset is used to identify and classify 82 classes of pests. An autoencoder is utilized to address the data imbalance issue by generating augmented images. Red-Green-Blue (RGB) colour codes and object detection techniques are employed to localize and segment pests from field images. Finally, the segmented pests are classified using convolutional neural networks. The average Intersection over Union (IoU) of the object detection used for pest segmentation is 80%. The proposed classification model achieved an accuracy of 84.95% with the balanced dataset, outperforming the existing model. Counting the pests in an image helps determine the extent of pest damage. The results showcase the potential of this approach to revolutionize traditional pest monitoring methods, offering a more proactive and precise strategy for pest control in agricultural settings. This research contributes to the advancement of smart farming practices through intelligent pest classification for pest control.
Introduction
Effective pest management is a cornerstone of sustainable agriculture, as the presence of pests significantly impacts crop productivity and quality. Traditional pest control methods often rely on manual labor for pest identification and monitoring, a process that is labor-intensive, time-consuming, and prone to human error. Assessing the density of pests present in the crops is necessary for pest forecasting decisions. As pest-related challenges become increasingly complex due to climate change and the globalization of agriculture, there is a growing need for automated, accurate, and scalable solutions. Modern machine learning and deep learning techniques, especially Convolutional Neural Networks (CNNs), have become a promising means of automating the detection, localization, and classification of pests in agriculture1. Recent advancements in deep learning, particularly CNNs, have significantly enhanced object detection and localization in natural environments, leading to transformative applications in agriculture1. Pest identification and classification are critical for effective pest management, providing insights into pest behavior, life cycles, and ecological roles, which are essential for designing targeted control measures2. Given the diversity of pest species, a systematic classification approach is necessary to manage pests effectively, categorizing them into hierarchical groups such as species, genus, and family3.
Earlier efforts in pest detection relied on traditional image processing techniques, focusing on enhancing image quality through noise removal, geometric correction, and other preprocessing steps. However, these methods faced limitations in scalability and adaptability4. The advent of CNNs revolutionized this field, enabling efficient image segmentation, classification, and localization of pests. For instance, Wang et al. used the Pest24 dataset to train AI models for real-time pest monitoring in agriculture, although challenges such as dense pest distributions and small object sizes persist5,6,7. Image segmentation plays a vital role in pest detection, dividing images into meaningful regions to identify pests based on unique visual features. Techniques such as entropy-based thresholding, color-based segmentation, and advanced models like Mask R-CNN have demonstrated significant improvements in pest localization8,9. Additionally, Mask2Former has emerged as a versatile segmentation architecture, excelling in panoptic, semantic, and instance segmentation tasks, though further analysis of its computational efficiency is needed10,11. Despite these advancements, there remains a need for comparative evaluations of traditional and modern segmentation approaches to address diverse pest scenarios12,13.
Balancing datasets is crucial for enhancing pest classification accuracy. Autoencoders have been employed to address imbalances, enabling CNNs to classify pests more effectively. This approach has achieved superior accuracy relative to cutting-edge models like DenseNet and ResNet14,15. Data augmentation techniques, such as GridMask and Progressive and Selective Instance-Switching (PSIS), have further improved model robustness, though challenges related to scalability and applicability to diverse datasets remain16,17. Deep learning frameworks have increasingly been adopted for pest classification. Methods such as DeepPest and SAFFPest leverage mobile vision-based approaches and self-attention feature fusion, respectively, to achieve significant accuracy improvements in pest detection18,19. Rui Li et al. employed data augmentation and image preprocessing techniques, including Watershed and Grabcut algorithms, achieving an impressive 94.61% classification accuracy20. Furthermore, metaheuristics combined with deep learning offer robust approaches for pest identification, though their performance in diverse agricultural environments requires further validation21.
Synthetic pesticides have traditionally been used for pest management but pose environmental and ecological risks. Research has explored alternatives, such as integrating deep learning with biological control methods, to develop sustainable pest management solutions22. Najwa Seddiqi Kallali et al. highlighted the efficacy of symbiotic bacteria and entomopathogenic nematodes (EPNs) in pest control, emphasizing the need for further exploration of their scalability23. Similarly, studies on pesticide impacts and nanotechnology-based mitigation strategies have paved the way for environmentally friendly solutions, although practical implementation challenges remain24,25. Recent advancements in data augmentation strategies, such as GridMask and PSIS, have proven effective in enhancing detection algorithms for pest classification tasks. These methods address challenges like dataset biases and high object similarity, enabling models to achieve better generalization across datasets26,27. Innovative segmentation techniques, including entropy-based and clustering methods, have also been employed to improve pest detection, although comprehensive performance evaluations are needed28,29.
Moreover, cutting-edge pest detection models such as SAFFPest and self-attention-based architectures have shown promise in addressing limitations of earlier models. However, their scalability to diverse datasets and computational efficiency requires further research30,31. Researchers have also explored integrating CNNs with metaheuristics and novel feature extraction techniques to enhance pest classification accuracy, achieving superior performance compared to traditional methods32,33,34. In addition, new technologies like the self-attention feature fusion model (SAFFPest) have accurately detected pests in rice crops, improving detection accuracy for pests like rice leaf caterpillars and rice leaf rollers. Despite its success, further research is required to explore the model’s scalability and adaptability to other pest types and agricultural environments35. Studies on the integration of nanotechnology have also explored innovative ways to mitigate pesticide usage in agriculture. Adsorption methods, utilizing various nanomaterials, have shown effectiveness in pesticide reduction, which could benefit pest management strategies in the long term36. Furthermore, research into entomopathogenic nematodes (EPNs) and their symbiotic bacteria for pest management has revealed their potential, although limitations in real-world applications persist37. Researchers have also examined the environmental impacts of synthetic pesticides, underscoring the need for more sustainable pest management practices in agriculture38.
The incorporation of CNNs in real-time pest detection has also been explored through systems integrating both image processing and temporal analysis. Md. Akkas Ali et al. proposed a system using CNN and Bidirectional Long Short-Term Memory (Bi-LSTM) techniques for pest detection, suggesting a hybrid approach that could further improve real-time agricultural monitoring39. Additionally, recent work has introduced a novel precision weeding tool especially designed for maize fields. This device, utilizing a spiral tendon-type cutter head, addresses the challenges of traditional mechanical weeding techniques and significantly enhances weed removal efficiency. The theoretical analysis and mechanical design in that study enable optimization of working parameters such as speed, depth, and thrust, ensuring high performance with a weed eradication rate greater than 95% and a wounding rate as low as 3%, showcasing its potential for integration into sustainable pest management practices40.
This paper addresses key challenges in pest monitoring and management, such as data imbalance and the accuracy of pest identification and classification. In contrast to conventional methods that often rely on generic, broad-spectrum solutions, our work proposes a refined approach that directly tackles these challenges through a combination of advanced techniques. By leveraging autoencoders for dataset balancing, YOLO for precise segmentation, and Convolutional Neural Networks (CNNs) for detailed pest classification, we aim to create a more robust and scalable model for pest detection. This approach improves the overall effectiveness and accuracy of pest monitoring systems by strengthening each component (dataset processing, image segmentation, and classification), going beyond conventional pest control techniques. The proposed solution addresses the limitations of existing approaches, providing a more focused and sustainable framework for pest management in agricultural environments, while mitigating the risks posed by ineffective or indiscriminate pest control methods.
Proposed framework
This section explains the methodology of the proposed approach for efficient pest classification, covering dataset balancing, segmentation, and accuracy improvement. The proposed method uses an autoencoder to balance the pest image dataset and then trains the pest classification module with a CNN on the balanced dataset. Crop images are captured in the field, and pests are segmented from these images with the help of RGB color codes and object detection. The segmented images are the input to the CNN, which detects and classifies the pests. Object segmentation also helps to count the pests present in the image; two different object segmentation techniques are experimented with to isolate the pests from the image. Figure 1 illustrates the entire workflow of the proposed methodology and Fig. 2 represents the flowchart of the proposed framework.
Data augmentation
IP102 is a large-scale dataset for pest classification with 102 pest varieties and 75,222 image instances26. Figure 3 displays the distribution of the classes across the 102 pest varieties in the IP102 dataset, illustrating the highly imbalanced nature of the class distribution. The unbalanced distribution of instances in the IP102 dataset can greatly influence the accuracy of machine learning models trained on it. Minority classes with fewer instances may be difficult for the models to classify effectively. When training and evaluating models, addressing imbalanced datasets is essential, and balancing such datasets is effectively achieved with data augmentation. Basic image augmentation methods apply exposure, rotation, and resolution changes. Neural-network-based methods are also available to expand the dataset. The autoencoder generates new images by combining features from multiple images of the same class to create augmented images.
Combining features
Autoencoders are a powerful tool for addressing sample imbalances. They leverage unsupervised learning and focus on feature learning to capture essential characteristics of both majority and minority classes. They also capture informative features and can generate synthetic samples for the minority class. Autoencoders are versatile and adaptable to imbalances, making them effective in resolving sample imbalances in scenarios with skewed class distributions.
The primary components of an autoencoder neural network are the encoder and the decoder. The encoder compresses the input into a smaller representation, while the decoder uses this compressed version to reconstruct the original input. The extracted features are used with the gradient descent algorithm to modify an input image, and the loss between the input and output images is calculated to reduce errors. The VGG19 architecture is a popular convolutional neural network model used for image classification. VGG19 has learned to extract useful features from images through pre-training on ImageNet. When using VGG19 for feature extraction, the lower layers of the network extract simple features like edges and textures, whereas the higher layers extract complex features like object parts. The VGG19 network extracts features from the following layers:
1. Block1_conv1
2. Block2_conv1
3. Block3_conv1
4. Block4_conv1
5. Block5_conv1
6. Block5_conv2
The values of the multiple intermediate feature maps represent the image’s content. Each layer extracts a different type of feature from the image. Figure 4 represents the architecture of the autoencoder used for augmentation.
The Autoencoder process for generating augmented images follows a mathematical framework comprising three key steps: encoding, decoding, and loss calculation. During the encoding step, for a given input image X, the encoder E transforms it into a latent representation Z as:

$$Z=f\left({W}_{e}X+{b}_{e}\right) \quad (1)$$

where \(W_e\) and \(b_e\) are the weights and biases of the encoder, and f is a non-linear activation function like ReLU. In the decoding step, the decoder D reconstructs the input image X′ from the latent space Z using

$$X'=g\left({W}_{d}Z+{b}_{d}\right) \quad (2)$$

where \(W_d\) and \(b_d\) are the weights and biases of the decoder, and g is another activation function. Finally, the loss function evaluates reconstruction accuracy, typically measured using the Mean Squared Error (MSE):

$$L_{MSE}=\frac{1}{N}\sum_{i=1}^{N}\left({X}_{i}-{X'}_{i}\right)^{2} \quad (3)$$

where N represents the total number of pixels in the image. This loss function ensures that the reconstructed image closely resembles the original, enabling the Autoencoder to generate high-quality augmented images.
The Autoencoder used for image augmentation processes resized RGB input images of size 224 × 224 × 3. The encoder consists of six convolutional layers, utilizing ReLU (Rectified Linear Unit) as the activation function, and extracts feature representations using a latent space dimension of 256. The decoder consists of six transposed convolutional layers, utilizing ReLU activations in the hidden layers and a sigmoid activation in the output layer. The model is trained with the Mean Squared Error (MSE) loss function and optimized with the Adam optimizer (learning rate: 0.001). The training process runs for 50 epochs with a batch size of 32. To enhance feature extraction, a pretrained VGG19 network is employed, leveraging gradient-based feature map optimization for augmentation. The dataset originally contained 28,840 images, which increased to 39,990 images after augmentation.
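A minimal Keras sketch of an autoencoder with this reported configuration (224 × 224 × 3 input, six convolutional and six transposed-convolutional layers, a 256-dimensional latent space, MSE loss, Adam with learning rate 0.001, 50 epochs, batch size 32) is shown below; the filter counts and strides are illustrative assumptions, not the authors' exact published architecture.

```python
# Hedged sketch of the described convolutional autoencoder. Filter counts and
# strides are assumptions chosen so the output matches the 224x224x3 input.
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_autoencoder(latent_dim=256):
    inp = layers.Input(shape=(224, 224, 3))
    x = inp
    # Encoder: six convolutional layers with ReLU; five of them downsample by 2.
    for filters, stride in [(32, 2), (64, 2), (64, 2), (128, 2), (128, 2), (256, 1)]:
        x = layers.Conv2D(filters, 3, strides=stride, padding="same", activation="relu")(x)
    h, w, c = x.shape[1], x.shape[2], x.shape[3]          # 7 x 7 x 256
    z = layers.Dense(latent_dim, activation="relu")(layers.Flatten()(x))
    # Decoder: six transposed convolutions mirroring the encoder, sigmoid output.
    y = layers.Dense(h * w * c, activation="relu")(z)
    y = layers.Reshape((h, w, c))(y)
    for filters, stride in [(256, 1), (128, 2), (128, 2), (64, 2), (64, 2), (32, 2)]:
        y = layers.Conv2DTranspose(filters, 3, strides=stride, padding="same", activation="relu")(y)
    out = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(y)
    model = Model(inp, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
    return model

# Typical usage with the training settings reported above:
# autoencoder = build_autoencoder()
# autoencoder.fit(images, images, epochs=50, batch_size=32)
```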
To generate augmented images, we first calculate the mean and correlation for each extracted feature map. The mean value is estimated by multiplying the feature vector with itself at every location and then averaging across all locations. Our objective is to transform the input image into the desired augmented image, which we achieve by using the gradient descent algorithm to generate the target image. A loss function determines how far the current image’s features are from the target image’s features. Finally, we obtain the gradient and apply it using an optimizer. Figure 5 illustrates a sample input and output of the image augmentation. The augmented image is created using Algorithm 1, as follows:
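Algorithm 1 is presented as a figure in the original article; the sketch below illustrates the same idea in Python, assuming TensorFlow and the six VGG19 layers listed earlier. The Gram-style statistic, step count, and learning rate are illustrative assumptions, and input preprocessing is omitted for brevity.

```python
# Hedged sketch of feature-based augmentation: VGG19 feature maps are summarized
# by a correlation (Gram-style) statistic, and the content image is updated by
# gradient descent to match the target image's feature statistics.
import tensorflow as tf

LAYERS = ["block1_conv1", "block2_conv1", "block3_conv1",
          "block4_conv1", "block5_conv1", "block5_conv2"]

vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
extractor = tf.keras.Model(vgg.input, [vgg.get_layer(n).output for n in LAYERS])

def gram(feature_map):
    # Correlation of the feature vector with itself, averaged over all locations.
    result = tf.einsum("bijc,bijd->bcd", feature_map, feature_map)
    locations = tf.cast(tf.shape(feature_map)[1] * tf.shape(feature_map)[2], tf.float32)
    return result / locations

def augment(content_img, target_img, steps=100, lr=0.02):
    """content_img, target_img: float tensors of shape (1, H, W, 3) in [0, 255]."""
    targets = [gram(f) for f in extractor(target_img)]
    image = tf.Variable(content_img, dtype=tf.float32)
    opt = tf.keras.optimizers.Adam(learning_rate=lr)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            feats = extractor(image)
            loss = tf.add_n([tf.reduce_mean((gram(f) - t) ** 2)
                             for f, t in zip(feats, targets)])
        grad = tape.gradient(loss, image)
        opt.apply_gradients([(grad, image)])
        image.assign(tf.clip_by_value(image, 0.0, 255.0))
    return image.numpy()
```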
Weighted loss calculation
Mean squared error estimates the loss between the input and the target image. Each image’s loss is multiplied by its corresponding weight, and the final loss is obtained by summing these weighted losses. Algorithm 2 is used to calculate the weighted loss:
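Algorithm 2 appears as a figure in the original article; a brief sketch of the weighted-loss idea, with placeholder weight values, is given below.

```python
# Hedged sketch of the weighted loss: per-image MSE losses are scaled by their
# weights and summed into the final loss. The weights here are placeholders.
import tensorflow as tf

def weighted_loss(generated, targets, weights):
    """generated: (H, W, 3) tensor; targets: list of (H, W, 3) tensors; weights: floats."""
    total = 0.0
    for target, w in zip(targets, weights):
        total += w * tf.reduce_mean(tf.square(generated - target))
    return total

# Example: blend losses against two images of the same class, weighted 0.6 / 0.4.
# loss = weighted_loss(candidate, [img_a, img_b], [0.6, 0.4])
```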
Feature selection
Different types of images can be obtained by changing the features extracted and merging them with another image. To fine-tune the feature selection process for an image, one can adjust the list of layer names used for feature extraction. The outputs obtained by selecting various feature maps are shown in Fig. 6.
Figure 6a extracts features from the first layer of the convolutional block, copying small details from another image, and then applies the colour of that image to the content image in Fig. 6b, so that different feature maps produce different types of images. By changing the layers used for feature extraction, this method can create multiple types of data, which helps balance and optimize the dataset for better classification. To achieve a balanced class distribution, we randomly select and augment two images from the same category until each class has between 500 and 1000 images. The augmented dataset is then used to train a convolutional neural network. Figure 7 illustrates the quantity of images in every class after balancing the dataset. Some classes gain images, whereas others undergo under-sampling due to their extensive data.
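A schematic of this class-balancing loop is sketched below, assuming an `augment_fn` routine such as the one outlined earlier and a per-class target of 500 images (the lower bound of the 500 to 1000 range stated above); the file handling details are assumptions.

```python
# Hedged sketch of class balancing: under-represented classes are grown by
# augmenting randomly chosen image pairs of the same class until the target
# count is reached. The target of 500 and augment_fn are assumptions.
import os
import random

def balance_class(class_dir, augment_fn, target=500):
    images = [os.path.join(class_dir, f) for f in os.listdir(class_dir)]
    next_id = len(images)
    while len(images) < target:
        a, b = random.sample(images, 2)       # two images of the same class
        new_img = augment_fn(a, b)            # combine features of a and b
        out_path = os.path.join(class_dir, f"aug_{next_id}.jpg")
        new_img.save(out_path)                # assumes a PIL.Image is returned
        images.append(out_path)
        next_id += 1
```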
Pest segmentation and overlap elimination
The image taken from the crop field may contain multiple pests. CNN classifiers cannot identify several classes of pests in a single pass; for classification with a CNN, an image should contain only one type of pest. The pests’ locations in the image are therefore needed to separate them from the rest of the scene. This section explains the process of pest segmentation and the elimination of overlap in the segmentation.
Segmentation using RGB color codes
We first use color-based object segmentation to segment the image and locate pests, as shown in Algorithm 3. RGB color-code segmentation offers simplicity, computational efficiency, and compatibility with existing tools, making it intuitive and easy to interpret. Standard image processing workflows align well with the use of natural color information in images, enabling efficient transfer learning with pre-trained models. We eliminate crops in the background by removing the green parts of the image. To create a segmented pest image, we first designate the background in the image, then use the remaining portion as the foreground and generate a mask. This mask is passed to the GrabCut algorithm to obtain the segmented pest image. We remove any segmented parts that occupy a small area and count the remaining pest segments to count the pests in the image. This method effectively segments the pest image and obtains the location of the pest.
The image captured from the field typically shows a predominantly green color due to the crops, so removing the crops from the background is a simple task in the RGB color space. To segment pests, we remove the green-colored crops from an image. To identify the background, we examine the RGB values of every pixel: a pixel whose green value is higher than its red and blue values is treated as background. This approach simplifies the task of differentiating pests from crops. Figure 8 shows the resulting mask, with the background in black and the foreground in red, derived from the original image shown in Fig. 9.
The segmented mask had unclear pixel values due to lighting, contrast, and posture changes, resulting in errors. To mitigate the impact of lighting variations, the proposed system incorporates several strategies, such as GrabCut refinement and threshold-area filtering. After initial segmentation using RGB thresholds, the GrabCut algorithm was applied to refine the segmented masks. GrabCut leverages a Gaussian mixture model to estimate the color distribution of the foreground (pests) and background (crops), improving robustness against minor lighting inconsistencies. This step helps reduce errors caused by uneven illumination and ensures more accurate pest localization. Threshold-area filtering eliminates small segments below a predefined threshold area during post-processing; this reduces noise introduced by lighting artifacts, such as isolated bright or dark spots that do not correspond to actual pests. We used the mask as input to the existing GrabCut algorithm, which employs a Gaussian mixture model to determine the color distribution of the target object and the background. Taking an image and its corresponding mask as inputs, we successfully segmented the pest using GrabCut; Fig. 10(c) displays the resulting mask. To obtain an accurate pest count, we considered only the segments that exceeded a specific threshold value and removed small-area segments, highlighting the final segmented pest from Fig. 10(a) with a red overlay, as shown in Fig. 10(b).
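An OpenCV sketch of this pipeline (green-dominant pixels marked as background, GrabCut refinement from the mask) is given below; the mask conventions and iteration count are assumptions rather than the paper's exact implementation, and the small-area filtering step is shown later in the pest severity section.

```python
# Hedged sketch of RGB-based pest segmentation refined with mask-initialized
# GrabCut. Mask labels and the number of GrabCut iterations are assumptions.
import cv2
import numpy as np

def segment_pests(image_bgr):
    """image_bgr: uint8 field image in BGR channel order (OpenCV convention)."""
    b = image_bgr[:, :, 0].astype(np.int32)
    g = image_bgr[:, :, 1].astype(np.int32)
    r = image_bgr[:, :, 2].astype(np.int32)
    # Green-dominant pixels (crop foliage) are treated as probable background.
    background = (g > r) & (g > b)
    mask = np.where(background, cv2.GC_PR_BGD, cv2.GC_PR_FGD).astype(np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    # Refine the colour-based mask with GrabCut (Gaussian mixture models).
    cv2.grabCut(image_bgr, mask, None, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_MASK)
    pest_mask = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0)
    return pest_mask.astype(np.uint8)
```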
Object detection
The RGB color space method effectively identifies distinct pests in an image that are not overlapping. Nevertheless, in some cases, the segment may contain overlapping pests, leading to multiple pests in a single segmented image.
Figure 11 shows two pests in a single image that were segmented separately. However, in Fig. 12, both pests overlap within a single segment. To address this issue, we use object detection, which allows us to locate multiple objects in a single image with bounding boxes and thus separate the overlapping pests. Developing an object detection module requires a dataset of images with bounding boxes around the objects of interest. We annotated 2,359 images with bounding boxes and trained the YOLOv3 object detection architecture on them, as shown in Fig. 13. YOLO was chosen primarily for its real-time detection capabilities, end-to-end architecture, and excellent balance between speed and accuracy, making it well suited for pest detection tasks in agricultural settings. While Faster R-CNN and SSD are also valid object detection models, YOLO’s efficiency, scalability, and practicality in field applications offer a distinct advantage for this research.
The model extracts significant features from the images using filters in the convolutional layers and assesses the probability of an object’s presence within a specific bounding box. The Leaky ReLU activation function solves the “dying ReLU” issue by applying a small negative slope, determined by a constant α, to negative input values. The function is expressed as \(f(x)=\alpha x\) for \(x<0\) and \(f(x)=x\) for \(x\ge 0\). Figure 14 illustrates the object detection model in more detail.
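As a small illustration (not taken from the paper's code), the Leaky ReLU activation can be written as:

```python
# Leaky ReLU: positive inputs pass unchanged, negative inputs are scaled by a
# small slope alpha, so units never have an exactly zero gradient.
import numpy as np

def leaky_relu(x, alpha=0.1):
    return np.where(x >= 0, x, alpha * x)
```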
Pest classification
Accurate identification of the specific type of pest present in a crop is crucial for preventing pest damage. This module is vital in determining the appropriate pesticide to eliminate the pest. The input to this module is the segmented pest image produced by the previous process. The Convolutional Neural Network identifies crucial features from the segmented pest image, resulting in accurate classification of pest images. CNNs automatically and independently detect high-level features, providing a significant advantage in pest image analysis.
A Convolutional Neural Network has input layers, hidden layers comprising convolutional layers, ReLU activation layers, pooling layers, fully connected layers and output layers. The convolutional layers execute the convolution process on the input data and pass it to the next layer. The pooling layers merge the outputs of neuron groups in the preceding layer into a single neuron in the subsequent layer. Fully connected layers create connections between neurons in one layer and every neuron in the next layer.
The ResNet-50 model contains five stages, each consisting of convolutional and identity blocks. Every convolutional block has three convolutional layers, and every identity block also has three. ResNet-50 is a powerful backbone model widely used in computer vision tasks, with over 23 million trainable parameters. ResNet incorporates “skip” connections that link the previous layer’s output to a later layer to address the vanishing gradient issue. ResNet also introduces the concept of a residual network, as illustrated in Fig. 15, to combat degradation by bypassing specific layers’ training and connecting the skip connection directly to the output. Figure 14 shows an example of the output of the object detection process. Note that the color segmentation procedure resulted in only a single segment for this particular image, as Fig. 12 shows. This example demonstrates how the object detection process can accurately separate overlapping pests in an image. The module captures and separates multiple pests from each drone or camera image, and the following module then identifies the class of each segmented pest. Including skip connections in the network architecture offers the advantage of bypassing any layer that may negatively influence the model’s performance, effectively regularizing the network. This method allows very deep neural networks to be trained without problems of vanishing or exploding gradients. The input images used for training have dimensions of 227 × 227 × 3, and 39,990 images were used to train the model.
After augmenting the image data, we add an average-pooling layer in the skip connection to smooth the image and preserve its contrast information. Adding residual blocks with convolutional layers increased the number of layers in the model to 152, compared to the original 50 layers. Due to these modifications, the number of trainable parameters increased to 66 million; Fig. 16 illustrates the intermediate results of the CNN. Consequently, the model’s accuracy improved compared to the previous version.
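A hedged Keras sketch of a residual block with an average-pooling layer placed on the skip connection is shown below; the bottleneck widths, pooling size, and projection convolution are illustrative assumptions rather than the exact published architecture.

```python
# Hedged sketch of the modified residual block: a three-convolution bottleneck
# with batch normalization, plus average pooling on the skip connection.
from tensorflow.keras import layers

def residual_block_with_avgpool(x, filters):
    # Skip connection smoothed by average pooling, projected if channels differ.
    shortcut = layers.AveragePooling2D(pool_size=3, strides=1, padding="same")(x)
    if shortcut.shape[-1] != 4 * filters:
        shortcut = layers.Conv2D(4 * filters, 1, padding="same")(shortcut)
    y = layers.Conv2D(filters, 1, activation="relu")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Conv2D(4 * filters, 1)(y)
    y = layers.BatchNormalization()(y)
    out = layers.Add()([y, shortcut])
    return layers.Activation("relu")(out)

# Usage inside a larger model: x = residual_block_with_avgpool(x, 64)
```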
Pest severity
Agricultural crop fields cover vast areas, and manual analysis of pest intensity or severity over such an area is impractical, but technology helps with easy identification. Pest image segmentation was described in detail in the second module. Counting the segments in images of pests on plants allows us to determine the number of pests and assess the extent of pest damage in the field, as illustrated in Fig. 17. Pests are segmented and counted using different methods. The quantity of pesticide required to manage the pests is determined from the extent of pest damage. Identifying the type of pest is crucial for determining the appropriate pesticide to control it, and the necessary amount of pesticide can be calculated from the severity of the pest damage.
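A minimal sketch of the counting step, assuming a binary pest mask such as the one produced by the segmentation sketch earlier and the 500-square-pixel area threshold reported in the Materials and methods:

```python
# Hedged sketch: estimate the pest count (a proxy for infestation severity) by
# counting connected components in the segmented mask whose area exceeds the
# 500-square-pixel threshold used in this work.
import cv2

def count_pests(pest_mask, min_area=500):
    n, _, stats, _ = cv2.connectedComponentsWithStats(pest_mask)
    # Label 0 is the background; keep only sufficiently large components.
    return sum(1 for i in range(1, n) if stats[i, cv2.CC_STAT_AREA] >= min_area)
```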
Based on their classification, we created a dataset recommending pesticides for pests. However, applying pesticides without taking into account the severity of the pest infestation can result in detrimental consequences, such as contamination and death of domestic animals, depletion of natural pest antagonists, development of pesticide resistance, decline in honeybee populations and pollination, damage to adjacent crops, loss of fishery and bird populations, and groundwater contamination. Limiting pesticide use to the amount required for pest control can promote healthier crop growth and maintain a more environmentally sustainable approach to farming.
We collected crop images from the internet and trained a neural network model similar to the one used for pest classification to create a crop detection module. The classified crop species are alfalfa, beet, citrus, corn, groundnut, mango, rice, vitis and wheat. We have identified the pests’ locations and the plant species affected by the pests. Table 1 displays the count of pest classes for each plant species. For instance, alfalfa plants are affected by the alfalfa plant bug, blister beetle, alfalfa weevil, etc. This can help cultivators of various crops gain a better understanding of the pests.
Materials and methods
Pest dataset
We used the dataset IP102, with over 39,990 insect pest images of 82 different types, thoroughly tested for identification26. We used 70% of the images to train the model and 30% to test it. We employed deep neural networks, specifically CNNs, and trained the model on a balanced and augmented dataset to classify the images. Table 2 provides the name of each pest category and the number of images used to classify them, and also compares the counts before and after augmentation.
Tools and platform used
OpenCV offers essential features like background removal, filters, pattern matching, and classification for image processing tasks. We utilized the Google Colab platform to develop and assess the neural network. We trained the CNN model using Darknet, a free and open-source neural network framework written in C and CUDA. The Google Colab platform provided GPUs to speed up the training process. We used the system specification listed below to train the model.
- 12 GB system memory.
- Intel(R) Xeon(R) CPU @ 2.20 GHz.
- NVIDIA-SMI 455.38, CUDA version 10.1, Tesla P4, 33 MHz.
We configured the threshold area for objects in pest segmentation to be 500 square pixels and set the learning rate for CNN object detection to 0.001.
Result and discussion
CNN training results
The ResNet model proved to be the most effective for pest classification after testing various types of CNNs. The dataset used for training consisted of 39,990 images. We trained the model in batches of 100 images due to the large dataset, with each model having varying input image sizes. We achieved an average loss of less than 0.3 during the training process and set a learning rate of 0.001. We saved the model at the end of every 100 epochs. Figures 18 and 19 show that each trained CNN model attained a training accuracy of 91.91%.
Classification performance metrics
The experimental model employed different evaluation metrics, namely accuracy, precision, recall, and F1-score, to measure the pest classification performance. These evaluation metrics are represented by Eqs. (4) to (7).
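For reference, the standard formulations of these four metrics, consistent with the TP, TN, FP, and FN definitions given below, are:

$$\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN} \quad (4)$$

$$\text{Precision}=\frac{TP}{TP+FP} \quad (5)$$

$$\text{Recall}=\frac{TP}{TP+FN} \quad (6)$$

$$\text{F1-score}=\frac{2\times \text{Precision}\times \text{Recall}}{\text{Precision}+\text{Recall}} \quad (7)$$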
True Positive (TP) denotes samples of the current pest class that are correctly classified. True Negative (TN) denotes samples of other categories correctly identified as not belonging to the current pest class. False Positive (FP) denotes samples incorrectly assigned to the current pest class. False Negative (FN) denotes samples of the current pest class that are incorrectly classified into another class. The precision metric shows how many of the predicted positive points are truly positive, while the recall metric indicates how many of the actual positive points are correctly identified. Table 3 presents the testing accuracy of each class before and after augmentation.
Figure 20 displays the classification accuracy results of the CNN on each subclass, while Figs. 21 and 22 depict the feature embedding visualization of the augmented dataset using t-SNE21.
Comparative study
To evaluate the effectiveness of the balanced dataset created with the autoencoder in the proposed work, we compared classification models trained with the augmented and unaugmented images. Table 4 compares CNN models on the augmented and unaugmented datasets, revealing that the balanced dataset led to an overall increase in accuracy. While the performance improvement of 22.95% may appear modest in isolation, it is highly significant when considering the practical applications of pest detection in agriculture. This improvement can lead to more reliable and efficient pest management systems, with potential benefits for both crop yield and environmental sustainability. The factors contributing to this gain, such as the novel combination of autoencoders, YOLO, and CNNs along with optimized dataset handling and segmentation techniques, work together to create a powerful tool for future pest control applications.
Figure 23 shows that some classes exhibit low accuracy, while half have accuracy above 80%. However, by reducing the total number of classes to 46 and training the same model, we increased the training accuracy to 99% and the testing accuracy to 85.48%. Table 5 compares the overall number of categories in the dataset and the corresponding pest classification accuracy, showing accuracy levels ranging from 76% to over 90%. Our results are comparable to those in the related literature, which typically uses far fewer classes (fewer than 16) than the dataset we utilized. For example, the greenhouse pest model20 achieved 91% accuracy with five classes, DenseNet20114 achieved 92.43% accuracy with ten classes, GAEnsemble29 achieved 95.16% accuracy with ten classes, the crop pest model1 achieved 98.91% accuracy with ten classes, and the mango pest model8 achieved 76% accuracy with sixteen classes.
Object detection results
We trained the YOLO model using a training set of 2,359 images and achieved a maximum Intersection over Union (IoU) of over 80% after 8,000 epochs. Figure 24 depicts the reduction in loss to around 0.39 as bounding boxes were used to segment pests for detection with YOLO. Intersection over Union compares the predicted bounding box with the ground truth of the pest image, where A and B denote the two regions, as given in Eq. (8).
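For reference, Eq. (8) is the standard Intersection over Union between a predicted region A and a ground-truth region B:

$$IoU=\frac{\text{area}(A\cap B)}{\text{area}(A\cup B)} \quad (8)$$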
Conclusion
The proposed work presents a significant advancement in pest monitoring and classification for field crops. The research encompasses techniques such as dataset augmentation, segmentation, and pest classification, which are essential tools for enhancing crop yield and supporting farmers. In this work, the IP102 dataset has been employed to identify and classify 82 classes of pests. The proposed work utilizes an augmentation technique based on autoencoders to balance the image dataset. RGB colour and object detection techniques are used for pest monitoring and classification. The proposed augmentation technique can generate diverse types of images by applying various texture and colour features, and it has broad applications for augmenting image datasets across multiple fields. The proposed system explores pest segmentation for classifying pests and identifying their severity in the crops. It can locate and segment the pests from the crops, and with the segmented pest image it can identify the class of the pest with the help of a CNN. The accuracy of the CNN is 84.95% after balancing the dataset with the proposed system, which surpasses the existing system by 22.95%. The average IoU of the object detection used for pest segmentation is 80%. Identifying the type of pest and quantifying their numbers on plants provides valuable information regarding pest infestations and damage severity in the field. This information can guide farmers in deciding which pesticides to use and the appropriate quantity to apply, resulting in increased yield and decreased production costs.
The model’s performance varied across pest classes, with Class 71 (Miridae) showing the lowest accuracy (42%), likely due to high intra-class variance and similarity with other pests. Class 25 (aphids) and Class 52 (Therioaphis maculata Buckton) also performed poorly (48% accuracy), possibly due to their small size and varied appearances. Factors such as visual similarities between classes, imbalanced data before augmentation, and environmental variations contributed to misclassifications. While overfitting and misclassification of pest categories posed challenges during training, we used strategies such as data augmentation, class balancing, regularization, and robust validation to address these issues. These efforts significantly improve the generalization capability of our model, decrease the impact of overfitting, and increase the accuracy of pest detection and classification across different species. A dropout layer is integrated into the CNN to randomly disable a portion of neurons during training, preventing the model from becoming overly dependent on specific features. Batch normalization helps stabilize and accelerate learning by normalizing activations across mini-batches, reducing internal covariate shift. Additionally, early stopping halts training once validation performance ceases to improve, preventing unnecessary overfitting. Future work on this model will focus on expanding the dataset to 102 pest classes, addressing environmental variability using domain adaptation, improving real-time deployment with YOLO-tiny and model compression techniques, and incorporating multimodal data sources. Additionally, investigating few-shot learning and integrating pest management strategies will enhance the overall impact of the research. These efforts will contribute to more efficient, scalable, and sustainable pest control practices in agriculture.
Data availability
The datasets generated and/or analysed during the current study are publicly available in the following link [https://github.com/xpwu95/IP102].
References
Ayan, E., Erbay, H. & Varçın, F. Crop pest classification with a genetic algorithm-based weighted ensemble of deep convolutional neural networks. Comput. Electron. Agric. 179, 105809 (2020).
Bodhe, T. S. & Mukherji, P. Selection of color space for image segmentation in pest detection. In 2013 International Conference on Advances in Technology and Engineering (ICATE), pp. 1–7 (IEEE, 2013).
Chen, P., Liu, S., Zhao, H. & Jia, J. GridMask data augmentation. Preprint at arXiv:2001.04086 (2020).
Ghiasi, G. et al. Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation. Preprint at arXiv:2012.07177 (2020).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In: Proc. IEEE conference on computer vision and pattern recognition. pp. 770–778, (2016).
Jiao, L., Dong, S., Zhang, S., Xie, C. & Wang, H. AF-RCNN: an anchor-free convolutional neural network for multi-categories agricultural pest detection. Comput. Electron. Agric. 174, 105522 (2020).
Haq, S. I. U., Raza, A., Lan, Y. & Wang, S. Identification of pest attack on corn crops using machine learning techniques. Eng. Proc. 56(1), 183. https://doi.org/10.3390/ASEC2023-15953 (2023).
Kusrini, K. et al. Data augmentation for automated pest classification in Mango farms. Comput. Electron. Agric. 179, 105842 (2020).
Li, R. et al. An effective data augmentation strategy for CNN-based pest localization and recognition in the field. IEEE Access. 7, 160274–160283 (2019).
Li, Y., Wang, H., Dang, L. M., Sadeghi-Niaraki, A. & Moon, H. Crop pest recognition in natural scenes using convolutional neural networks. Comput. Electron. Agric. 169, 105174 (2020).
Lin, T. Y. et al. Microsoft coco: Common objects in context, In European conference on computer vision. pp. 740–755. (Springer, 2014).
Liu, W. et al. Ssd: Single shot multibox detector In European conference on computer vision. pp. 21–37. (Springer, 2016).
Liu, W., Wu, G., Ren, F. & Kang, X. DFF-ResNet: an insect pest recognition model based on residual networks. Big Data Min. Analytics 3 (4), 300–310 (2020).
Nanni, L., Maguolo, G. & Pancino, F. Insect pest image detection and recognition based on bio-inspired methods. Ecol. Inf. 57, 101089 (2020).
Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. You only look once: Unified, real-time object detection. In: Proc. IEEE conference on computer vision and pattern recognition. pp. 779–788. (2016).
Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. Preprint at arXiv: 1409.1556, (2014).
Sriwastwa, A., Prakash, S., Swarit, K., Kumari & Sahu, S. S. Detection of pests using color based image segmentation. In 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), pp. 1393–1396 (IEEE, 2018).
Song, Y. et al. Identification of the Agricultural Pests Based on Deep Learning Models. In 2019 International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI). pp. 195–198. (IEEE, 2019).
Ren, F., Liu, W. & Wu, G. Feature reuse residual networks for insect pest recognition. IEEE Access. 7, 122758–122768 (2019).
Rustia, D. J. A., Chao, J. J., Chiu, L. Y., Wu, Y. F., Chung, J. Y., Hsu, J. C. & Lin, T. T. Automatic greenhouse insect pest detection and recognition based on a cascaded deep learning classification method. J. Appl. Entomol. (2020).
Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
Wang, F., Wang, R., Xie, C., Yang, P. & Liu, L. Fusing multi-scale context-aware information representation for automatic in-field pest detection and recognition. Comput. Electron. Agric. 169, 105222 (2020).
Wang, H., Wang, Q., Yang, F., Zhang, W. & Zuo, W. Data augmentation for object detection via progressive and selective instance-switching. Preprint at arXiv:1906.00358 (2019).
Wang, Q. J. et al. Pest24: A large-scale very small object data set of agricultural pests for multi-target detection. Comput. Electron. Agric. 175, 105585 (2020).
Wang, R. et al. AgriPest: A Large-Scale Domain-Specific benchmark dataset for practical agricultural pest detection in the wild. Sensors 21 (5), 1601 (2021).
Wu, X., Zhan, C., Lai, Y. K., Cheng, M. M. & Yang, J. Ip102: A large-scale benchmark dataset for insect pest recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition. pp. 8787–8796. (2019).
Yan, Y., Zhang, Y. & Su, N. A novel data augmentation method for detection of specific aircraft in remote sensing RGB images. IEEE Access 7, 56051–56061 (2019).
Zhou, S. Y. & Su, C. Y. Efficient convolutional neural network for pest recognition – ExquisiteNet. In 2020 IEEE Eurasia Conference on IOT, Communication and Engineering (ECICE), pp. 216–219 (IEEE, 2020).
Truong, T. D. et al. Optimal transport-based approach for unsupervised domain adaptation. In International Conference on Pattern Recognition (ICPR) (IEEE, 2022).
Li, S., Wang, H., Zhang, C. & Liu, J. A Self-Attention Feature Fusion Model for Rice Pest Detection. IEEE Access 10, 84063–84077 (2022).
Kathole, A. B., Katti, J., Lonare, S. & Dharmale, G. Identify and classify pests in the agricultural sector using metaheuristics deep learning approach. Frankl. Open 3, ISSN 2773-1863 (2023).
Cheng, B. et al. Masked-attention mask transformer for universal image segmentation. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, pp. 1280–1289. https://doi.org/10.1109/CVPR52688.2022.00135 (2022).
Aye, C. M., Pholdee, N., Yildiz, A. R., Bureerat, S. & Sait, S. M. Multi-surrogate-assisted metaheuristics for crashworthiness optimization. Int. J. Veh. Des. 80(2/3/4) (2019).
Haq, S. I. U., Tahir, M. N. & Lan, Y. Weed detection in wheat crops using image analysis and artificial intelligence (AI). Appl. Sci. 13 (15), 8840. https://doi.org/10.3390/app13158840 (2023).
Intisar, A. et al. Occurrence, toxic effects, and mitigation of pesticides as emerging environmental pollutants using robust nanomaterials – A review. Chemosphere 293 (2022).
Kallali, N. S. et al. From soil to host: Discovering the tripartite interactions between entomopathogenic nematodes, symbiotic bacteria and insect pests and related challenges. J. Nat. Pestic. Res. (2023).
Rajak, P. et al. Agricultural pesticides – friends or foes to biosphere? J. Hazard. Mater. Adv. 10 (2023).
Kotwal, J., Kashyap, R. & Pathan, S. Agricultural plant diseases identification: From traditional approach to deep learning. Mater. Today Proc. 80, Part 1 (2023).
Ali, M. A., Dhanaraj, R. K. & Nayyar, A. A high performance-oriented AI-enabled IoT-based pest detection system using sound analytics in large agricultural field. Microprocess. Microsyst. 103 (2023).
Hu, W. et al. Design and performance evaluation of a spiral bar precision weeding mechanism for corn fields. Sci. Rep. 14, 28186. https://doi.org/10.1038/s41598-024-76311-2 (2024).
Author information
Contributions
Stella Mary V: Conceptualization, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing. Jayashree P: Supervision, Proofreading, Review.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Venkateswara, S., Padmanabhan, J. Deep learning based agricultural pest monitoring and classification. Sci Rep 15, 8684 (2025). https://doi.org/10.1038/s41598-025-92659-5