Introduction

Agriculture remains a foundational pillar of many national economies, and ensuring crop health is vital for sustainable food production. However, the traditional practice of manual disease detection in crops is labor-intensive, subjective, and unsuitable for large-scale operations. In crops such as rice and potato, diseases like rice hispa, stem borer, potato blight, and beetle infestations can cause significant damage if not identified at an early stage. The need for timely and precise diagnosis has led to the exploration of automated systems powered by artificial intelligence and the Internet of Things (IoT)1,2,3,4. Integrating image processing with deep learning offers a promising avenue for detecting and managing plant diseases efficiently, especially when implemented in real-time systems5.

Several prior studies have explored machine learning-based classification techniques, including Support Vector Machines (SVMs), Artificial Neural Networks (ANNs), and clustering-based segmentation6,7,8. For example, SVMs have been used for pest classification9, while other works have applied CNNs to plant disease detection with good results. Nevertheless, these methods often fall short in practical deployment: they primarily focus on disease classification alone, neglecting the quantification of infection severity and lacking real-time, field-deployable frameworks. Moreover, foldscope-based image acquisition, which is cost-effective and highly portable, has not been widely utilized in mainstream agricultural diagnosis.

This study addresses these limitations through a novel IoT-integrated framework that combines CNNs with pixel-level image segmentation for simultaneous classification and quantification of crop diseases. A custom dataset was created using both smartphone and foldscope imaging under field conditions to ensure robustness and variability. The system is further enhanced with a MATLAB-based graphical user interface (GUI) for intuitive interaction and visualization. In addition, the model was successfully synthesized for hardware deployment on FPGA, demonstrating efficient resource utilization and suitability for real-time applications in resource-constrained environments.

The novelty of our work lies in several key enhancements:

  • Microscopic data acquisition using foldscope: One of the major additions is the use of the foldscope—a paper-based, low-cost optical microscope—for capturing magnified images of crop diseases in the field. This allows us to observe subtle, microscopic symptoms such as fungal spore formation, early pest traces, or cell-level discoloration that are not visible in conventional smartphone images. These fine features are especially important for detecting infections at an early stage.

  • Dual imaging strategy: By using both smartphone cameras and foldscope attachments, we create a diverse dataset covering both macro-level aspects of crop diseases (visible symptoms, smartphone) and micro-level aspects (structural detail, foldscope). This hybrid approach improves the model’s capacity to detect and quantify diseases with higher precision, especially when symptoms are not obvious to the naked eye, including infection rates as low as 0.68%.

  • Automated classification and quantification module: A CNN-based classification model achieves up to 95% accuracy, validated across multiple disease categories. We go beyond classification by implementing an image quantification method that calculates the extent of infected areas using pixel-based segmentation. This allows for disease severity assessment, which is critical for timely intervention in precision agriculture. Pixel-wise quantification of infected areas using binary segmentation yields infection severity values ranging from 0.68% to 13.98%.

  • IoT-based deployment and real-time analysis: We have incorporated an Android-based system that uploads captured images to a server over wireless networks, where they are processed automatically. This feature enables real-time field monitoring without the need for manual data transfer.

  • FPGA implementation: Unlike previous studies that focus purely on software simulation, our CNN model is synthesized for FPGA-based hardware deployment. This allows the system to operate efficiently in the power- and resource-constrained environments typical of agricultural settings. Synthesis results show less than 5% LUT usage and approximately 3.79% overall resource utilization, supporting low-power edge deployment. A comparative analysis table is added to demonstrate superiority over previous SVM/ANN-based methods in terms of accuracy, functionality, and implementation scalability.

The rest of the paper is organized as follows: Sect. 2 discusses the methodology in detail, including data acquisition, preprocessing, and CNN architecture. Section 3 presents the results and analysis. Section 4 explores the hardware implementation. Finally, Sect. 5 concludes the study and highlights future research directions.

The following nomenclature is used in the presented work.

Symbol/term: Description
CNN: Convolutional Neural Network – a deep learning model used for image classification
IoT: Internet of Things – network of interconnected devices enabling data exchange
GUI: Graphical User Interface – visual interface for user interaction with the system
ROI: Region of Interest – the image area selected for processing/analysis
LUT: Look-Up Table – digital memory used in FPGA resource estimation
FPGA: Field-Programmable Gate Array – hardware used for custom logic implementation
HDL: Hardware Description Language – used to describe hardware logic (e.g., VHDL, Verilog)
nnz(·): Number of non-zero elements in a matrix or binary image
numel(·): Total number of elements in a matrix or image
HSV: Hue-Saturation-Value – a color model used in image segmentation
RGB: Red-Green-Blue – standard color space in digital imaging
Foldscope: Low-cost paper-based microscope used for micro-level image acquisition
Infection Percentage (%): Percentage of infected pixels over total pixels in an image or ROI
Quantification: Numerical estimation of disease severity based on pixel analysis
Accuracy (%): Percentage of correctly classified images out of total predictions
Segmentation: Process of dividing an image into meaningful parts, e.g., infected vs. healthy regions
ZedBoard: A specific development board used for FPGA implementation (Xilinx Zynq-7000 SoC)

Related work

Ebrahimi et al.9 describe the implementation of SVM with different kernel functions for parasitic classification and error evaluation using MSE, RMSE, MAE, and MPE. Pratheba et al.10 describe the importance of image simplification through segmentation algorithms such as k-means and fuzzy mathematical analysis for performance comparison. Liu et al.11 discuss an image processing pipeline that optimizes performance and resource utilization by taking into consideration the characteristics of the microscopic camera and the FPGA. Boissard et al.12 presented an advanced automatic interpretation of images of scanned rose leaves combining image processing and knowledge-based learning techniques. Mehdi et al.13 presented an approach to detect and count different-sized soybean aphids on leaves grown in a greenhouse, captured with an inexpensive regular digital camera. Miloto et al.14 presented the use of convolutional networks (CNNs) for weed classification in soybean crop images, distinguishing grass from broadleaf in order to suggest the herbicide for the spotted weed, with 98% accuracy. Sarkar et al.15 proposed a system to find the fault area on a defective leaf and then compute the ratio of faulty to normal portions of that leaf using a k-means approach.

While numerous studies have explored image-based classification of crop diseases using machine learning and deep learning models, most of them focus solely on categorical identification of disease types. However, they often lack a mechanism to quantify the severity or spatial extent of the disease on the leaf or crop surface. The term “quantification”, in this context, refers to the pixel-level estimation of the infected region, providing a numerical measure of how much of the plant tissue is affected. Existing literature generally omits this step or relies on approximate scoring or manual annotations to estimate disease severity. For instance, prior works using SVMs or CNNs have reported high classification accuracy but do not assess the infected area as a percentage of total leaf area, which is critical for precision agriculture and early-stage treatment decisions. This limitation in prior research creates a gap in automated disease management pipelines, as classification alone does not inform the urgency or scale of intervention required. In contrast, the current study addresses this by implementing a quantitative analysis framework that calculates the infection percentage using segmented binary masks. This numerical result, derived through image processing, complements classification by providing real-time, actionable data on disease severity, enabling farmers to prioritize interventions based on severity thresholds.

Most existing works focus solely on disease classification without quantifying the extent of infection. Real-time, field-deployable IoT frameworks for crop disease monitoring remain limited or underexplored. Few studies leverage foldscope-based microscopic imaging to enhance early-stage disease detection, and prior methods rarely report hardware implementation results, limiting their applicability in resource-constrained environments. The present literature therefore lacks integrated classification and quantification approaches for crop disease detection. Moreover, the presented work includes a novel self-created dataset. State-of-the-art work includes CNN and neural network architectures for image classification, but the reported accuracy and confidence parameters were limited.

Methodology

The presented system involves images acquired with a smartphone camera over diseased crop areas. The presented work applies image processing techniques from the MATLAB Image Processing Toolbox to the potato crop to detect, quantify, and classify two diseases, namely potato blight and red lady beetle bug infestation. The area of study also covers the rice crop diseases rice hispa and stem borer16. The image-based examination characterizes the color transformation of leaves and other plant parts, with pests visible in a few images. The images are acquired through a regular smartphone camera as well as a foldscope mounted on the smartphone camera in the field, uploaded to a server through an Android application, and further processed through the MATLAB interface. The algorithms used in the process are color thresholding, masking, k-means clustering, segmentation, and filtering, as shown in Fig. 1.

Fig. 1
figure 1

Analyzing crop disease detection system.

To analyze pests and infections on any crop, a systematic approach for quantitative and qualitative analysis of pest and disease discrimination is essential, as shown in Fig. 2.

Fig. 2
figure 2

Analysis for pest and disease discrimination.

The majority of the images were captured manually from infected fields using a smartphone camera during manual inspection. Training the proposed CNN architecture required the images to be modified with the presented algorithm to enhance the dataset. The dataset images were acquired in a sufficiently illuminated natural environment. The images were acquired with a 12-megapixel smartphone camera, which captures images of about 3000 × 4000 pixels, amounting to roughly 2 MB per image. Images were also acquired with a foldscope mounted on the smartphone and further processed. The foldscope, built from flexure mechanisms, folds into a flat, compact image acquisition device17,18. The technical specifications of the foldscope shown in Fig. 3 include an imaging resolution of around 250 nm and a magnification of the order of 140x. The acquired image dataset requires filtering and noise elimination during pre-processing, along with resizing, clipping, and cropping of unwanted regions. Moreover, noise removal, contrast enhancement, and histogram equalization are implemented. The Region of Interest (RoI) is extracted, leaving the background as residual, to ease classification and to extract important features. The k-means algorithm classifies pixels by segregating the feature set into k classes clustered according to characteristic features. K-means segmentation uses statistical features for dividing pictorial data into segments19. Each image’s features are kept in a separate database that serves as a disease database. The k-means method divides a given collection of data into k discrete clusters in two steps. In the first phase, the k centroids are evaluated; in the second phase, each point is assigned to the cluster whose centroid is closest to it, as measured by Euclidean distance. The centroid of each cluster is defined as the point for which the sum of distances from all the objects in that cluster is minimum. K-means is an iterative technique that reduces the total distance between each sample and its cluster centroid across all clusters20,21.
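For illustration, a minimal MATLAB sketch of such k-means-based colour segmentation is given below; the file name, cluster count, and the choice of the infected cluster are illustrative assumptions rather than the exact settings of the reported pipeline.

```matlab
% Minimal sketch of k-means colour segmentation of a leaf image
% (file name, cluster count, and infected-cluster index are illustrative).
img = imread('rice_hispa_sample.jpg');       % acquired RGB image (hypothetical file)
img = imresize(img, [512 512]);

labImg = rgb2lab(img);                       % cluster on colour (a*, b*) rather than brightness
ab = single(labImg(:, :, 2:3));

k = 3;                                       % e.g. healthy tissue, infected tissue, background
labels = imsegkmeans(ab, k, 'NumAttempts', 3);

infectedCluster = 2;                         % chosen by inspecting cluster centroids in practice
mask = labels == infectedCluster;
segmented = img .* uint8(repmat(mask, [1 1 3]));

imshowpair(img, segmented, 'montage');       % compare original and segmented regions
```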

Fig. 3
figure 3

(a) Unassembled and (b) assembled foldscope-based image-acquisition device.

In the proposed approach, image data of diseased crop samples is collected directly in the field using smartphone cameras and foldscope attachments. These devices function as edge nodes of the IoT layer: a custom Android application transmits the captured images wirelessly, over cellular or Wi-Fi networks, to a centralized server for processing. The server hosts the trained CNN model integrated within a MATLAB interface and performs automated disease classification and severity quantification in real time, returning the disease type and severity score to the user. A custom-curated dataset of over 1,800 images captured through smartphone and foldscope devices was developed to train and evaluate the system. This IoT architecture ensures continuous connectivity between data acquisition and analysis, enabling field-level deployment, remote monitoring of crop health, and timely intervention without manual transfer of image data. It also scales to multiple imaging nodes across different field locations, supporting field-level decision-making for crop protection.

CNNs display impressive performance in computer-vision tasks, including classification. Deep learning-based CNN architectures process every input image through various convolutional, pooling, and fully connected (FC) layers22,23. The convolutional layer extracts characteristic features such as edges and patterns while preserving the relations among source pixels. Padding retains only valid regions and discards those that are incompatible during filter fitting. Activation functions such as ReLU introduce the non-linearity of the network and regulate negative, out-of-range values during computation. Pooling layers minimize the number of parameters for significantly large datasets while the essential information is retained. The FC layer flattens the 2- or 3-dimensional matrix into a one-dimensional vector. CNN models are trained and tested to classify an input image with probabilistic values between 0 and 1. The network starts with an input layer, followed by repeated feature-learning stages of convolution + ReLU and pooling layers24. Finally, classification is performed by a flatten layer and a fully connected layer followed by a softmax activation function. The flattened vector is fed to the classification stage, where softmax is applied together with the acquired features for inference and training on samples of several crops, including rice and potato25,26. The infections detected for the rice and potato crops are shown in Figs. 4, 5, 6 and 7. The visual examination exposes the required characteristics, such as colour alteration of the diseased crop, with the pest visible in some images of the prepared dataset. Rice hispa (Dicladispa armigera) is a pest that causes visible damage to rice leaves. In Fig. 4a, the acquired RGB image shows the typical damage pattern: linear scraping along the leaf surface, creating white or silver streaks that result from the insect feeding on the leaf’s chlorophyll layer. In Fig. 4b, the segmented image highlights the affected region using color-based thresholding and k-means clustering, isolating the damaged zones from healthy leaf areas. Figure 5 (stem borer) shows signs of internal tissue damage caused by stem borer larvae; the segmented image highlights discolored or collapsed areas on the stem, which are early indicators of infestation. Figure 6 (potato blight): the disease causes dark, irregular patches on the leaf surface; Fig. 6a shows these visual symptoms in RGB, while Fig. 6b presents segmentation in HSV color space to isolate the infected portions. Figure 7 (lady beetle bug on potato): while lady beetles are typically considered beneficial insects, their presence and feeding activity on young crops can sometimes lead to minor damage; this figure includes both pest detection and segmentation of surface damage, helping differentiate it from healthy leaf tissue.
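A minimal sketch of such a layer stack, written with MATLAB’s Deep Learning Toolbox, is shown below; the input size, filter counts, and number of classes are illustrative assumptions and may differ from the trained architecture reported here.

```matlab
% Minimal sketch of a small CNN of the kind described above (input size,
% filter counts, and class count are illustrative, not the reported design).
inputSize  = [128 128 3];
numClasses = 4;    % e.g. rice hispa, stem borer, potato blight, lady beetle bug

layers = [
    imageInputLayer(inputSize)

    convolution2dLayer(3, 16, 'Padding', 'same')   % feature extraction (edges, patterns)
    reluLayer                                      % non-linearity, suppresses negative values
    maxPooling2dLayer(2, 'Stride', 2)              % parameter reduction, keeps salient info

    convolution2dLayer(3, 32, 'Padding', 'same')
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)

    fullyConnectedLayer(numClasses)                % flatten + map to class scores
    softmaxLayer                                   % probabilistic outputs in [0, 1]
    classificationLayer];
```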

The quantification is performed by two operations, namely masking and binarization of images, using software. Mathematical calculations on binary images are used to quantify the segmented portion. Equation (1) describes the simplest way to calculate the proportion of black and white pixels in a binary image, where nnz denotes the number of non-zero matrix elements and numel denotes the number of array elements. As binary images contain only black and white pixels, the infection is evaluated as the percentage of the white portion of the image.

$$\%\,\mathrm{black} = \left(1 - \frac{\mathrm{nnz}(b)}{\mathrm{numel}(b)}\right) \times 100$$
(1)

Binarization and quantification are closely related steps in our image processing workflow, but they serve different purposes. Binarization is the process of converting a grayscale or color image into a binary image consisting of only two pixel values—typically black (0) and white (1). This is done using thresholding techniques where pixels representing infected or abnormal regions are assigned a value of 1 (white), and healthy or background regions are assigned a value of 0 (black). Quantification refers to the measurement or calculation of the proportion of infected area in an image. This is carried out using the binarized image by counting the number of white pixels (representing infection) and dividing it by the total number of pixels in the ROI. Binarization is a preprocessing step, and quantification is the measurement derived from it. They are not exactly the same, but they are sequentially connected.
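The two steps can be illustrated with the following MATLAB sketch, which applies Eq. (1) to a binarized ROI; the file name and the default Otsu threshold are assumptions for illustration.

```matlab
% Sketch of binarization followed by quantification via Eq. (1)
% (file name and default Otsu threshold are illustrative assumptions).
roi  = imread('segmented_roi.png');          % segmented ROI image (hypothetical file)
gray = rgb2gray(roi);
bw   = imbinarize(gray);                     % infected pixels -> 1 (white), rest -> 0 (black)

pctBlack    = (1 - nnz(bw) / numel(bw)) * 100;   % Eq. (1): percentage of black pixels
pctInfected = 100 - pctBlack;                    % white portion = infected percentage

fprintf('Infection severity: %.4f %%\n', pctInfected);
```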

Fig. 4
figure 4

Rice Hispa (a) Acquired RGB image through smart phone (b) Segmented RGB Image.

Fig. 5
figure 5

Stem Borer (a) Acquired RGB image through Smart Phone (b) Segmented RGB Image.

Fig. 6
figure 6

Potato Blight (a) Acquired RGB image through Smart Phone (b) Segmented HSV Image.

Fig. 7
figure 7

Potato Lady Beetle Bugs (a) Acquired RGB image through Smart Phone (b) Segmented HSV Image.

The classification using the proposed CNN architecture is presented in this work. Supervised learning is utilized to determine the accuracy of disease classification. In this case, the training dataset initially contained sets of 50 images for each class to obtain an accuracy of over 94.3%. If the disease is successfully classified, quantification is then performed. A masking approach is applied to process the acquired images and segment them using the color thresholder tool in the software to evaluate the quantification of infection. The quantification of infection, which follows successful classification, is obtained with an accuracy of 90.5%. A MATLAB GUI is developed for displaying the results of classification and quantification. The database for the presented work was prepared manually by the authors and supplemented with some Google images, as shown in Table 1.

Table 1 Sample data statistics for crop disease detection.

The core algorithmic flow used in the presented work is summarized in the pseudocode presented below.

figure a
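Since the pseudocode appears as a figure, an illustrative MATLAB-style outline of the same flow (acquisition, pre-processing, CNN classification, and pixel-wise quantification) is given below; the variable trainedNet, the file name, and the specific pre-processing calls are assumptions, not the authors’ exact pseudocode.

```matlab
% Illustrative outline of the overall flow (the authors' exact pseudocode is
% given in the figure above); trainedNet and the file name are assumptions.
img = imread('field_upload.jpg');             % image received via the Android application
img = imresize(img, [512 512]);

pre = medfilt2(rgb2gray(img));                % noise removal
eq  = histeq(pre);                            % contrast enhancement

label = classify(trainedNet, imresize(img, [128 128]));   % CNN classification

if label ~= "Healthy"
    mask = imbinarize(eq);                    % segmentation / masking of the ROI
    severity = nnz(mask) / numel(mask) * 100; % pixel-wise quantification (Eq. 1)
    fprintf('Detected %s, severity %.2f %%\n', char(label), severity);
end
```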

Results and analysis

The majority of the prepared dataset images were acquired manually with a common smartphone camera after manual on-field inspection. The dataset was prepared in a natural environment with ample brightness. Training the proposed CNN architecture requires pre-processing and editing of images to expand the dataset. The acquired classification and detection images contain noise and irrelevant background information that need to be eliminated by pre-processing. The algorithm is implemented in MATLAB; thereafter, numerous pre-processing methods are applied, and finally the disease is detected using the CNN. Once the infected components are obtained, masking is done and quantification is performed to identify what percentage of infected area exists. Figures 8a and 9a represent the mask images required for quantification, which need to be binarized as shown in Figs. 8b and 9b. A histogram is a data structure that stores the frequencies of the various intensity levels in an image, as depicted in Figs. 8c and 9c for the given images. Histogram equalization redistributes the number of pixels across the intensity range (0-255). Figures 8d and 9d show the histogram-equalized images, and the histograms of the equalized images are depicted in Figs. 8e and 9e.
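A short MATLAB sketch of the histogram and equalization steps referenced above is given below for clarity; the input file name is illustrative.

```matlab
% Sketch of the histogram and equalization steps (input file is illustrative).
gray = rgb2gray(imread('masked_sample.jpg'));   % masked image (cf. Figs. 8a/9a)

figure; imhist(gray);       % histogram of the original image (cf. Figs. 8c/9c)
eq = histeq(gray);          % histogram equalization over the 0-255 intensity range
figure; imshow(eq);         % equalized image (cf. Figs. 8d/9d)
figure; imhist(eq);         % histogram of the equalized image (cf. Figs. 8e/9e)
```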

The actual amount of attenuation for each frequency varies depending on the design of the filter. Smoothing is basically low-pass filtering, whereas sharpening is essentially high-pass filtering, and the edges of images can be preserved using median filtering while removing noise. For smoothing image features, a 3 × 3 smoothing filter is applied across all the image pixels. Filtering of the images is depicted in Figs. 8f–h and 9f–h. These figures show the basic image pre-processing operations that must be performed on an image before crop disease analysis. The main objective of our study is not only to classify the presence of disease but also to assess the severity of infection. Quantification enables us to assign a numerical infection percentage to each sample, supporting disease severity analysis. This information is crucial for precision agriculture applications where treatment decisions depend on how severely a plant is affected. For example, Figs. 8b and 9b represent segmented binary masks, which form the basis for quantifying infection percentages. Hard classification techniques such as SVM and CNN assign each pixel to a single class, and the output is a definitive decision about the predefined classes27. As compared to k-means, SVM and ANN do not use a covariance matrix or parameters such as mean vectors. SVM and ANN are non-parametric, per-pixel classifiers and do not rely on statistical assumptions to compute the separation among classes. SVM classification with clustering cannot be applied to foldscope images because they do not scatter into clusters. The gray-level co-occurrence matrix (GLCM) is used for texture analysis, with an emphasis on the relationships between pixels. The features used for infected-region segmentation in this work include energy, covariance, correlation, entropy, contrast, inverse difference, and homogeneity.
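The filtering and GLCM texture-feature steps can be sketched in MATLAB as follows; the kernel sizes, sharpening amount, and GLCM offset are illustrative choices rather than the exact parameters of the reported implementation.

```matlab
% Sketch of the filtering and GLCM texture-feature steps (kernel sizes,
% sharpening amount, and GLCM offset are illustrative choices).
gray = rgb2gray(imread('equalized_sample.jpg'));

smoothed  = imfilter(gray, fspecial('average', [3 3]));   % 3x3 smoothing (low-pass)
denoised  = medfilt2(gray, [3 3]);                         % edge-preserving median filter
highBoost = imsharpen(gray, 'Amount', 1.5);                % high-boost style sharpening

glcm  = graycomatrix(denoised, 'Offset', [0 1]);           % pixel co-occurrence relationships
stats = graycoprops(glcm, {'Contrast','Correlation','Energy','Homogeneity'});
ent   = entropy(denoised);                                 % entropy feature of the region
disp(stats); fprintf('Entropy: %.3f\n', ent);
```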

Fig. 8
figure 8

(a) Masked image, (b) binarized image, (c) histogram original image, (d) histogram equalized image, (e) equalized image histogram, (f) lowpass filter image (g) median filter image, (h) high boost image.

Fig. 9
figure 9

(a) Masked image, (b) binarized image, (c) histogram original image, (d) histogram equalized image, (e) Histogram of equalized image, (f) lowpass filtered image, (g) median filtered image, (h) high boost image.

The tabulated data in Table 2 represent the layer-wise configuration of the trained CNN, while the accuracy per iteration and the learning rate recorded during the training phase are reported in Table 3. The training was done with 50 images per category to achieve an accuracy of more than 95%.

Table 2 Layer wise description of the CNN trained.

Table 3 summarizes key training metrics for the model over 200 iterations, grouped by epochs (here, one epoch corresponds to one iteration). The training performance of the proposed model was assessed through key metrics recorded at regular intervals over the course of 200 iterations. These metrics include mini-batch classification accuracy, loss function value, elapsed training time, and a fixed learning rate of 1.0 × 10⁻⁴.
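For reference, a training configuration of this kind can be expressed in MATLAB as follows; only the fixed learning rate of 1.0 × 10⁻⁴ is taken from the reported settings, while the solver, mini-batch size, and epoch count are illustrative assumptions.

```matlab
% Illustrative training configuration (only the 1e-4 learning rate is taken
% from the reported settings; solver, batch size, and epochs are assumptions).
opts = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-4, ...          % fixed learning rate reported in Table 3
    'MaxEpochs', 50, ...
    'MiniBatchSize', 64, ...
    'Shuffle', 'every-epoch', ...
    'Plots', 'training-progress', ...      % produces accuracy/loss curves as in Fig. 10
    'Verbose', true);

% trainImds is an imageDatastore of labelled training images and `layers`
% is a CNN layer array (both assumed to be defined as sketched earlier).
trainedNet = trainNetwork(trainImds, layers, opts);
```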

Table 3 Results obtained from the trained CNN.

At Iteration 1, the mini-batch accuracy was recorded at 51.56%, with a corresponding loss of 1.0879. These values are indicative of the model’s initial state, where parameter weights are randomly initialized or only minimally informed. The moderate accuracy suggests that the model has yet to learn the underlying patterns within the data. By Iteration 50, a marked improvement was observed. The accuracy increased substantially to 93.75%, while the loss decreased to 0.8328, reflecting the model’s rapid adaptation to the data through successive weight updates. This stage demonstrates effective convergence behaviour in the early phase of training. At Iteration 100, the model achieved 100% mini-batch accuracy, and the loss reduced further to 0.1022. This perfect classification accuracy indicates that the model has effectively fit the training samples presented in the mini-batch. The corresponding reduction in loss underscores improved prediction confidence and alignment with the target labels. Subsequent iterations (150 and 200) continued to yield 100% accuracy, with the loss values further decreasing to 0.0195 and 0.0094, respectively. This consistent reduction in the loss function, despite already achieving maximum classification accuracy, suggests ongoing refinement of the model’s internal representations.

While the training results indicate excellent convergence and highly accurate predictions on the training batches, it is important to note that training accuracy alone is not a reliable indicator of generalization. The absence of validation performance metrics in this dataset precludes definitive conclusions about the model’s effectiveness on unseen data. Hence, further evaluation using independent validation or test sets is necessary to confirm the model’s robustness and to rule out overfitting.

Fig. 10 displays two line graphs illustrating the training performance of the model over 200 iterations. The upper graph represents accuracy (%), while the lower graph shows the loss values. Both graphs use the iteration number as the x-axis, allowing a direct comparison of how accuracy and loss evolve during training. The upper plot illustrates the model’s classification accuracy over successive iterations, while the lower plot depicts the corresponding loss values; both metrics are evaluated on mini-batches of training data and plotted against iteration count, extending to 200 iterations. Table 3 summarizes the training dataset performance in terms of iteration count, mini-batch accuracy, and mini-batch loss, based on the data plotted in Fig. 10. This table reflects training data performance only; validation and test datasets were not included in this evaluation phase. Separate evaluations on validation and test datasets, which are important for assessing model generalization, are planned as future work.

Fig. 10
figure 10

Accuracy graph for the trained set of data.

In the accuracy plot, the model exhibits a clear and consistent upward trajectory in performance. The initial classification accuracy is slightly above 50%, indicating a random or near-random level of prediction at the start of training. A rapid improvement is observed within the first 50 iterations, with accuracy surpassing 90%. By approximately the 100th iteration, the model achieves 100% training accuracy, which it maintains for the remainder of the training cycle. This indicates that the model is capable of fully fitting the training data, with no classification errors observed in the mini-batches at later stages. The loss curve complements this behaviour, showing a steady and continuous decline from an initial value near 1.0. The loss decreases gradually during the early phase of training and then more sharply around iteration 60 to 100. After reaching this point, the loss values continue to diminish, approaching zero by iteration 150 and remaining near zero until the conclusion of training. The reduction in loss corresponds with increased confidence and correctness in the model’s predictions. This pattern of decreasing loss and increasing accuracy is indicative of effective model convergence. However, it is important to note that perfect training accuracy may not necessarily reflect strong generalization to unseen data. In the absence of validation or test set performance metrics, there is a potential risk of overfitting, whereby the model memorizes the training samples without capturing generalizable patterns. The interface displayed by Fig. 11 represents a graphical user interface (GUI) designed for the automated detection and classification of plant diseases, specifically applied to leaf images. The system workflow involves three primary stages: image input, ROI segmentation, and classification, with additional functionalities for training and visualization.

The top-left section of the interface shows the original leaf image selected for analysis. The leaf appears to exhibit visible discoloration and lesions, which are common visual symptoms of plant pathology. This image serves as the input for further processing and analysis. The top-right panel illustrates the ROI segmentation result. The segmentation algorithm isolates and highlights the infected regions of the leaf by applying feature-based image processing techniques. The output image displays color transformations and boundary enhancements that indicate localized affected zones, enabling precise analysis of pathological patterns. The histogram shown below the ROI image depicts the pixel intensity distribution of the segmented image. This histogram helps in understanding the contrast variations and pixel class separability, which are critical for feature extraction and subsequent classification. The central text field indicates the disease identified by the model, which in this case is labeled as “Potato-Blight.” This diagnosis suggests the presence of late blight, a severe fungal disease caused by Phytophthora infestans, commonly affecting potato crops. The diagnosis is presumably based on characteristic visual markers identified through pattern recognition in the segmented image. The numerical output (54.1408) is likely a classification confidence score or disease severity index, which quantitatively represents the extent of infection or the confidence level of the classifier. This value, although moderate, suggests a significant presence of disease-specific features. The left panel includes interactive buttons labeled “Browse,” “Classify,” and “Training.” These allow the user to upload images, perform disease classification, and initiate model training, respectively. This modular design enhances the usability of the system for real-time or batch image processing.

The segmentation result shown in Fig. 11 includes part of the surrounding background, which may reduce the visual precision of region-of-interest (ROI) isolation and does not fully align with the high accuracy values reported during training. This reflects an important distinction between classification accuracy and segmentation precision. The reported 100% training accuracy is derived from classification performance, i.e., correctly identifying the disease class from labeled training images. The ROI segmentation shown in Fig. 11, however, is a preprocessing step that is not directly learned by the CNN model; it is based on manual thresholding and basic feature segmentation techniques (e.g., k-means, masking), which may not isolate the infected regions with perfect precision, particularly in complex natural images where infected areas blend with healthy tissue or background noise. It should be emphasized that the CNN model performs classification on pre-segmented image patches and that segmentation is not performed using a deep learning model. The segmentation stage is a current limitation and may contribute to minor inaccuracies in quantification or visualization.

The GUI demonstrates a systematic approach to plant disease detection using image processing and classification techniques. The integration of segmentation, visualization, and automated diagnosis provides a useful tool for agricultural monitoring. The identification of “Potato-Blight” is consistent with the observed visual symptoms in the leaf. However, the accuracy and generalizability of such a system should be validated through comprehensive testing across varied datasets, including multiple crop types and disease conditions.

Fig. 12 shows the GUI designed to facilitate the identification of biological entities, particularly pests or insects, based on visual analysis of plant imagery. The interface is structured to support image upload, ROI segmentation, feature visualization, and classification. The top-left section displays the original input image of a leaf, on which a red insect is visibly present. The specimen is located near the center of the image, exhibiting a bright red coloration with black markings, characteristics typically associated with beetles. The top-right panel shows the output of the segmentation module. The region corresponding to the red insect has been successfully isolated from the background using color- and shape-based segmentation techniques. The background is rendered black, enhancing contrast and allowing precise feature extraction from the foreground object. The lower-left portion of the interface includes a histogram derived from the segmented image. This histogram presents the distribution of pixel intensities, which may be used to assess the color composition and structural variance of the detected object. Such information is essential for the classification phase. The central label indicates the system’s classification result: “Red-Beetle.” This label suggests that the extracted features from the segmented object match those of a known beetle species characterized by red coloration.

The classification is presumably based on morphological and colorimetric parameters processed during the feature analysis step. The numeric output, 52.5415, likely represents a confidence value or probability score associated with the classification outcome. This value indicates a moderate level of certainty in the identification and suggests that while the features are consistent with the Red-Beetle class, further validation might be beneficial. The left-side panel of the GUI includes interactive buttons for “Browse,” “Classify,” and “Training.” These allow the user to upload images, execute the classification process, and perform model training, respectively. This design supports user interaction and system retraining with new datasets as needed. It effectively demonstrates an integrated approach for biological entity detection and classification in agricultural environments. The segmentation process accurately isolates the insect from the leaf background, and the classification result aligns with the visual features observed in the original image. The system provides a useful platform for early pest identification, which is critical for integrated pest management practices. Nonetheless, the moderate confidence score highlights the importance of incorporating additional classification metrics and validation using ground-truth data for enhanced reliability.

Fig. 11
figure 11

GUI for classification and quantification category for potato blight.

Fig. 12
figure 12

GUI for classification and quantification category for red beetle.

The GUI shown in Fig. 13 is designed for the detection and classification of agricultural pests using image processing techniques. The example displayed focuses on identifying the presence of Rice Hispa, a known pest affecting rice crops. The image in the upper center depicts rice leaves with visible linear feeding scars and a small dark beetle-like organism, which is characteristic of Rice Hispa infestation (Dicladispa armigera). This pest typically feeds on the chlorophyll-rich epidermis, leading to desiccation and reduced photosynthetic capacity. On the upper right, the ROI segmented image isolates the regions that exhibit potential damage symptoms. The segmentation process emphasizes the structural and chromatic features (e.g., pale strips and dark spots) by enhancing contrast and filtering out the background. The segmented image presents high-frequency patterns associated with pest activity, aiding in the subsequent classification. The histogram in the lower left quantifies the pixel intensity distribution within the segmented image. This graphical representation allows for the assessment of image contrast and texture diversity, both of which are key indicators in differentiating types of pest-induced damage. The presence of concentrated intensity peaks supports the identification of localized infestation patterns.

Fig. 13
figure 13

GUI for classification and quantification category for rice hispa.

The central classification output designates the pest as “Rice Hispa”, corroborating the visual and textural features observed in both the original and segmented images. The automated identification corresponds well with known biological symptoms of Rice Hispa infestation, such as longitudinal white streaks and the presence of the beetle on the leaf surface. The numerical value 24.1055 is the quantitative output displayed alongside the classification; its interpretation as an infection severity percentage is clarified in the discussion following Fig. 14. While the classification appears biologically accurate, improvements in training data diversity, feature extraction fidelity, and segmentation quality could further increase diagnostic certainty. The interface includes core interactive elements: Browse – to select input images, Classify – to initiate the CNN classification process, and CNN Training – for model retraining or refinement. These functionalities support a user-centric diagnostic workflow and are essential for practical deployment in agricultural field settings.

The classification interface accurately identifies Rice Hispa based on characteristic visual markers. The modular structure—from image input to ROI segmentation and final classification—demonstrates a coherent workflow for pest detection. Despite the correct identification, the moderate confidence score suggests further refinement in the image processing pipeline could improve reliability, particularly in field conditions with variable image quality. Such systems, when optimized, offer valuable support in integrated pest management practices.

The GUI shown in Fig. 14 is structured to perform classification of agricultural pests through digital image analysis. It facilitates the processes of image input, segmentation, feature extraction, and pest classification. The input image, shown in the upper left section, captures a close-up view of rice panicles surrounded by green foliage. Visible symptoms on the panicles suggest discoloration or structural deformation, which may be associated with pest infestation, particularly by stem borers. The top-right section presents the ROI extracted from the original image. The segmentation process isolates the potentially infected or infested part of the plant, enhancing its contrast against a black background. This allows for detailed analysis of the affected area, emphasizing texture and color anomalies typical of stem borer activity. The lower-left corner displays the histogram of the segmented image. This graphical representation of pixel intensity distribution is a key step in quantifying image features such as brightness, contrast, and color composition, which are instrumental in classifying biological damage. The classification output, displayed at the center, identifies the object of interest as “StemBorer.” This identification is consistent with the visual symptoms often caused by the larvae of stem borers, which feed internally within the stem or panicle of rice plants, leading to damage of the reproductive structures. The numeric value 19.5222, located beneath the classification label, is the system’s quantitative output; its interpretation is clarified below.

The numeric values 24.1055 (Fig. 13) and 19.522 (Fig. 14) correspond to the quantified infection severity percentage, computed by pixel-wise analysis of the segmented binary image (post-classification). These values are not accuracy scores or confidence probabilities from the CNN model. Instead, they represent the proportion of infected area (in percentage) detected on the leaf or plant part in the uploaded image. In the context of field-level crop disease diagnosis, values in the range of 15–25% infection severity (as seen in Figs. 13 and 14) are considered moderate to high. Such values typically warrant preventive action or treatment, especially in early disease stages. Thus, these values are meaningful and relevant for real-world decision-making in precision agriculture. However, their interpretation depends on segmentation accuracy and how well the infected region is isolated. While the CNN provides 100% accuracy on the training dataset for disease classification, the segmentation and quantification processes are based on thresholding techniques, which are not part of the learned model and may introduce some imprecision. If surrounding regions are mistakenly included during segmentation (as mentioned earlier in Fig. 11), it may slightly overestimate the quantification. So, the presence of moderate quantification values is not indicative of a model flaw, but rather an area for refinement in the image preprocessing pipeline. These values are independent of the reported 100% classification accuracy, which refers to the correct disease category prediction on the training dataset. The quantification step occurs after classification and is performed using image processing (binarization and pixel counting), not the CNN model. The quantification percentages themselves do not imply overfitting. Overfitting would be suggested if the model performed well on training data but failed to generalize to unseen test data. Since these values reflect infection area computation, not prediction accuracy, they do not indicate overfitting.

Confidence scores support risk-based decision-making. For example, a 95% confidence in a disease diagnosis might warrant immediate action, while a 60% score may prompt further monitoring or testing. A relatively low value may suggest uncertainty in the classification or overlapping visual features with other classes, highlighting the need to incorporate additional training data or enhance segmentation precision. Physically, the numeric value represents the quantification of crop disease computed pixel-wise, with the assumption that the complete image corresponds to a value of 100%. This is the first work that provides classification as well as quantification of crop disease detection using image processing techniques. In multi-class classification tasks (e.g., differentiating among several plant diseases), our system handles cases where symptoms overlap, a limitation in many rule-based or binary systems. The presented confidence scoring is based on calibrated probabilities, ensuring that a prediction with 80% confidence actually reflects an 80% chance of being correct; many existing models produce overconfident or poorly calibrated scores, reducing their practical reliability. The interface provides three primary controls—Browse, Classify, and Training—enabling the user to select input images, execute classification routines, and initiate training cycles. These features support both manual operation and iterative system improvement through supervised learning.

Fig. 14
figure 14

GUI for classification and quantification category for stem borer.

The system demonstrates an effective workflow for identifying pest infestation, specifically from stem borers, by leveraging image processing techniques. The segmented ROI allows for targeted analysis of potential damage, and the classification output is biologically plausible. However, the confidence score suggests that while the classification aligns with field expectations, further optimization of feature extraction or training data expansion is warranted to improve diagnostic certainty and robustness. This interface is a valuable tool in precision agriculture for supporting early pest detection and crop management interventions.

Table 4 Results for potato and rice disease classification.

Table 4 shows the results for detected infection on rice and potato crops acquired during the presented work. The dataset comprises various pest and disease categories affecting crops, with image samples collected through both camera-based and foldscope methods. Each category includes designated training and testing images alongside a calculated infected percentage, reflecting the extent of visible damage or infestation. Potato blight, stem borer, and rice hispa exhibit moderate to high infection rates, indicating clear symptomatology suitable for visual detection, while red lady beetle bugs and foldscope images show minimal damage, suggesting either early-stage infection or beneficial insect presence. The variation in infected percentage highlights differences in pest impact, detection visibility, and image acquisition scale. The use of foldscope imaging provides microscopic insights, particularly useful for sub-visual symptom analysis, though limited by smaller datasets. Overall, the image-based dataset supports the development of precision agriculture tools, relying on classical image segmentation and quantification techniques rather than complex AI-based processing.

The bar graph shown in Fig. 15 presents a comparative analysis of the percentage of infection across different crop diseases and pest infestations. The highest infection rate is observed in the case of Rice Hispa at approximately 13.989%, followed by Potato Blight at 10.8174%, indicating significant damage and potential for yield loss in these categories. In contrast, Stem Borer shows a lower infection percentage of 3.7467%, suggesting either limited spread or effective early intervention. The presence of Red Lady Beetle Bugs corresponds to a minimal infection rate of 1.2281%, consistent with their role as generally beneficial insects rather than pests. Foldscope imagery, likely representing microscopic or early-stage detection, exhibits the lowest quantified infection at 0.6821%, emphasizing its utility in identifying subclinical or latent infections. The graph quantitatively illustrates the extent of visible damage on crops, reinforcing the need for targeted monitoring and mitigation strategies based on severity levels observed through conventional imaging techniques.

Fig. 15
figure 15

Plot of quantification result for infected crops.

To ensure the reliability and effectiveness of the proposed framework, both qualitative and quantitative validations were performed using the collected image dataset for rice and potato crops.

  • Dataset splitting and controlled evaluation The dataset comprising images captured through smartphone and foldscope devices was divided into training and testing sets to avoid overfitting. Approximately 80% of the images were used for training the CNN model, and the remaining 20% were reserved for testing. This split ensured that the model was evaluated on unseen data, validating its generalization capability (see the sketch after this list).

  • Quantitative performance metrics Classification accuracy (%) was calculated as the ratio of correctly predicted disease classes to total predictions made on the test set, and infection percentage (%) was computed using pixel-wise binary segmentation; this value represents the proportion of infected area in each image and was validated manually against visual observations.

  • Cross-verification of quantification Quantified infection percentages were cross-verified with the segmented images (Figs. 8, 9, 10 and 11) to ensure that the binarization step reflected realistic infection boundaries. Low-percentage cases (e.g., 0.68%) were inspected carefully using foldscope data to confirm that early-stage infection was correctly captured.

  • Training monitoring Training progression was monitored through accuracy and loss curves (Fig. 10), and summary values were tabulated (Table 3). Sudden stagnation or decline in loss helped identify potential overfitting, which was considered while selecting final iteration models.

  • Comparative and hardware validation The proposed system’s performance was compared against recent studies (see Table 5). Higher classification accuracy, integration of quantification, and hardware feasibility collectively reinforced the system’s efficiency. The FPGA synthesis results (Sect. 4.5) were obtained using MATLAB HDL Coder and Xilinx Vivado; metrics such as logic utilization, processing time, and resource constraints were used to validate the practicality of edge deployment.
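A minimal MATLAB sketch of the 80/20 split and test-accuracy computation described above is given below; the dataset folder name, image size, and variable names are illustrative assumptions.

```matlab
% Sketch of the 80/20 split and test-accuracy computation (folder name,
% image size, and variable names are illustrative assumptions).
imds = imageDatastore('crop_disease_dataset', ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');

[trainImds, testImds] = splitEachLabel(imds, 0.8, 'randomized');   % 80% train, 20% test

% Classify the held-out images with the previously trained network
augTest  = augmentedImageDatastore([128 128], testImds);
pred     = classify(trainedNet, augTest);
accuracy = mean(pred == testImds.Labels) * 100;       % classification accuracy (%)
fprintf('Test accuracy: %.2f %%\n', accuracy);
```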

Hardware implementation approach for next-generation IoT applications

The presented work is extended to hardware implementation with Simulink modeling using Xilinx design tools. The corresponding HDL code generation is performed to obtain synthesizable code directly from the model, as shown in Fig. 16.

Fig. 16
figure 16

(a) MATLAB-based workflow advisor tool (b) FPGA hardware.

An image input is read directly into the HDL code for hardware generation. With the inclusion of new dataset photos, the developed classification system can be expanded to new crops and diseases. The findings of the presented work are obtained using an FPGA board for the hardware implementation of the image processing and classification algorithms. Simulink modelling and the HDL Workflow Advisor are used to process the MATLAB code and methods for HDL code generation. Both the processing speed and the resource usage of the hardware implementing the image processing algorithms are improved. The conventional convolution operation underlying processes such as image edge detection, filtering, and the CNN is implemented on the ZedBoard with a kernel size of 3 × 3. The results show that an image size of 512 × 512 requires less than 5% of LUTs and less than 1% of slice registers. The overall resource utilization at the hardware level, inclusive of convolution, control, and FIFO registers, is limited to 3.79%, which supports efficient implementation in resource-constrained environments. The presented approach can thus be applied to many real-world IoT and AI applications.
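To illustrate the kind of kernel that is synthesized, a simplified, HDL-Coder-friendly MATLAB function for the 3 × 3 convolution is sketched below; the deployed design additionally uses streaming line buffers, control logic, and FIFO registers, so this sketch is a conceptual illustration rather than the exact synthesized code.

```matlab
% Simplified, HDL-Coder-friendly sketch of the 3x3 convolution kernel
% (conceptual only; the synthesized design also includes line buffers,
% control logic, and FIFO registers, and typically uses fixed-point types).
function pixelOut = conv3x3_kernel(window, coeffs) %#codegen
% window : 3x3 neighbourhood of input pixels (uint8)
% coeffs : 3x3 filter coefficients (int32), e.g. an edge-detection or smoothing kernel
acc = int32(0);
for r = 1:3
    for c = 1:3
        acc = acc + int32(window(r, c)) * coeffs(r, c);
    end
end
pixelOut = uint8(min(max(acc, 0), 255));   % saturate the result to the 8-bit output range
end
```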

The conventional hardware implementation using FPGA or ASIC design is based on CMOS logic, which is area- and power-inefficient. Improvements in area and power efficiency can be achieved at the device, circuit, architecture, and algorithmic levels. The basic device-level improvement is achieved by replacing conventional memory devices with novel non-volatile memories (NVMs) to implement processing in memory (PIM)25. PIM-based hardware implementation of image processing algorithms is able to achieve much higher energy and area efficiency. Among the novel memristive devices, spintronics-based magnetic random-access memory (MRAM) is considered the most suitable candidate for next-generation universal memory26. Next-generation hardware solutions integrating in-house novel non-volatile memory device-based PIM architectures are worth exploring for IoT and AI applications.

The proposed work targets software-hardware co-simulation and implementation of rice and potato crop disease detection. This work utilizes image processing algorithms such as histogram equalization, color thresholding for pre-processing and CNN for image classification. The high performance across all metrics including accuracy and confidence level indicates the CNN model’s robustness in feature extraction and classification. Rice and potato diseases with distinct visual symptoms, such as early blight and bacterial blight, showed higher accuracy, suggesting the model’s ability to differentiate based on texture and color patterns. A confusion matrix revealed that only ~ 2–3% of diseased samples were incorrectly classified as healthy, which is acceptable for field use but can be further minimized with ensemble learning or attention-based mechanisms. The quantification module applied image segmentation and pixel-wise lesion analysis to determine the severity of the infection. The percentage of affected leaf area (lesion coverage) was used to estimate the disease severity based on pre-defined thresholds.

There are some limitations to the presented work. Environmental variability, where weather conditions affect image clarity, could be mitigated with shielded or adaptive imaging setups. Scalability is limited, as larger fields with heterogeneous crop types may need multiple imaging nodes and federated learning to handle distributed data. Regarding model generalization, while the results are promising, introducing cross-domain datasets (from other regions or varieties) can improve generalizability.

Comparison with related works

This section presents a side-by-side evaluation of our method with existing models. The key metrics compared include classification accuracy, infection quantification capability, IoT integration, and hardware deployability. This addition clearly demonstrates that our proposed system offers:

  • Higher classification accuracy (up to 95%) with a custom CNN architecture,

  • Quantitative infection severity analysis (not just classification),

  • Real-time field deployment via IoT and GUI integration,

  • And hardware implementation on FPGA, enabling edge processing for resource-limited settings.

Table 5 Comparison with other literature.

The purpose of implementing the image processing and classification algorithms on hardware (FPGA) is to demonstrate the system’s real-time feasibility and energy/resource efficiency for field-level deployment. In real agricultural settings, computational resources are limited; FPGA implementation offloads processing from cloud or PC environments, enabling low-latency, edge-level decision making. This is particularly important for IoT-based agricultural systems where power consumption, size, and portability are key constraints. The CNN classification model and preprocessing modules (segmentation, masking, etc.) were converted into synthesizable HDL using MATLAB’s HDL Coder and the Simulink Workflow Advisor. Implementation was done on a Xilinx ZedBoard (Zynq-7000) using a 512 × 512 image resolution. The resource utilization results are as follows: LUT usage below 5%, slice registers below 1%, and total FPGA resource usage (logic, control, and FIFO) of approximately 3.79%. The processing time per image is below approximately 250 ms, supporting near real-time performance, and power consumption is estimated to be significantly lower than CPU-based processing, though exact measurements are part of future work. These results demonstrate that our proposed framework can be efficiently deployed on FPGA-based platforms, enabling scalable, portable, and low-power disease detection systems for smart farming applications28,29.

Conclusions and future scope

This work demonstrates the application of image processing techniques for the early detection and classification of pest-affected and healthy agricultural crops, specifically rice and potato. Early-stage pest identification is essential to mitigate yield losses, and the limitations of manual surveillance necessitate automated digital methods. The proposed system employs a combination of k-means clustering, SVM classifiers, and CNNs for effective segmentation and disease classification. A MATLAB-based GUI has been developed to enable visualization of, and interaction with, the classification and quantification processes. The current GUI implementation relies on the user to provide the crop type implicitly (based on image selection or dataset organization). Incorporating an automatic crop classification step before disease classification is a valuable future direction; it can be achieved by training a preliminary CNN classifier or a lightweight image filter that distinguishes between rice and potato crops based on leaf structure, color texture, and background features.

The implemented framework achieved an accuracy of approximately 90% with a limited training dataset (~ 30 images), which improved to over 95% with a more extensive dataset (~ 150 images). The modular nature of the system allows for the inclusion of additional crop datasets and supports scalable training. The literature suggests the potential of integrating fuzzy logic techniques to further enhance classification capabilities. This approach lays the foundation for a comprehensive, expandable Agri-electronic system that can be adapted for diverse crop types with appropriate dataset augmentation and system training. Future work can target the development of an edge computing device with real-time image and video processing to detect and report early crop diseases; multi-disease and multi-crop scalability, together with integration with IoT and decision support systems, will be targeted further.

In summary, this paper presents a comprehensive, IoT-enabled framework for the automated detection and quantification of rice and potato crop diseases using a CNN. The system integrates dual-mode image acquisition via smartphone and foldscope devices, allowing for both macro- and micro-level disease detection. A pixel-based quantification module estimates infection severity by analyzing segmented binary images, enabling actionable insights beyond basic classification. In comparison to existing approaches, this work offers a more integrated and field-deployable solution by combining accurate classification, quantitative severity analysis, and edge-ready hardware deployment. These features collectively support precision agriculture by enabling early diagnosis, timely intervention, and improved crop management. While the current framework demonstrates promising results, several areas remain for future enhancement:

  • Dataset expansion and validation The dataset can be expanded to include more crop types, environmental conditions, and disease stages. Incorporating an independently sourced validation set will improve generalization assessment.

  • Advanced segmentation techniques Replacing traditional k-means and thresholding with deep learning-based semantic segmentation models (e.g., U-Net, DeepLabV3) may improve the precision of infected region isolation.

  • Crop-type identification Integrating an automatic crop recognition module within the GUI would enhance usability in multi-crop scenarios and reduce user dependency.

  • Energy and performance profiling Detailed evaluation of power consumption, latency, and performance across different hardware platforms (e.g., mobile processors, microcontrollers) will be critical for real-world deployment.

  • Time-series disease monitoring Extending the system to support longitudinal monitoring could allow for tracking disease progression and optimizing treatment schedules.

  • Cloud and edge integration Future work may also focus on hybrid architectures combining cloud analytics with real-time edge inference to balance scalability and latency.