Introduction

Precision agriculture has become increasingly significant in addressing challenges related to global food security, environmental sustainability, and economic efficiency1,2,3. It offers timely and accurate diagnosis of plant diseases, which can significantly reduce crop losses and enhance yield quality4,5. Cotton (Gossypium) is an essential cash crop and a cornerstone of the textile industry, providing raw materials for clothing and fabric production6,7. It is particularly important in countries like Pakistan, Bangladesh, and India, where it serves as a major economic driver8,9. In Pakistan, cotton contributes nearly 10% to the GDP and accounts for 55% of foreign exchange earnings, with approximately 1.5 million people engaged in its value chain8. Similarly, India cultivates 24% of the world's cotton-growing land, generating substantial revenue from the crop9. Unlike synthetic fibres such as polyester and nylon, which are less environmentally friendly, cotton is biodegradable and can improve soil health when managed sustainably9. However, the crop is highly susceptible to various biotic and abiotic stresses, including bacterial, viral, and pest-induced diseases, which can cause severe economic losses7. The speed, cost, and reliability of detecting and managing these stresses strongly influence crop yield and quality9,10. Recent advancements in artificial intelligence (AI) and deep learning (DL) have transformed the agricultural sector, leading to the development of automated systems for recognizing plant diseases10,11,12,13,14. Among these advancements, the You Only Look Once (YOLO) architecture has become particularly well-known for its speed and accuracy in object detection and classification tasks15,16. The latest YOLOv8 model features improved capabilities for precise and efficient classification, making it an excellent choice for diagnosing cotton leaf diseases across various environmental conditions15. Automated systems that utilize DL enable real-time monitoring and data analytics, allowing farmers and researchers to identify issues early and take corrective actions17,18. These systems analyse spectral signatures to evaluate and classify cotton plants, offering insights into crop diseases, pests, and environmental stressors, ultimately improving crop management and optimizing production9.

Several DL models are in use for real-time disease detection in cotton plants, as summarized in Table 1. The CDDLite-YOLO model, for example, achieves an average precision of 90.6% while remaining easy to deploy on resource-constrained devices. These advancements ensure timely disease detection and intervention, which are crucial for maintaining cotton yield and quality10. Additionally, techniques such as model pruning minimize computational overhead, allowing deployment on mobile devices without sacrificing accuracy. Together, these advancements enable farmers to proactively tackle crop issues, leading to improved yield optimization9.

This study presents a systematic workflow for identifying and classifying cotton leaf diseases using the YOLOv8m classification model. The dataset used in this study is the high-resolution "SAR-CLD-2024" image dataset. It consists of seven categories of leaf images: healthy, herbicide growth damage, leaf hopper jassids, bacterial blight, red leaf, curl virus, and variegated leaves. Preprocessing is performed before k-fold cross-validation, ensuring higher reliability and robustness of the model under diverse conditions. The following objectives are identified for this study:

1. To identify the area of research that includes the AI-based diagnosis of cotton leaf diseases.

2. To utilize the YOLOv8 deep learning architecture to accurately classify multiple cotton leaf diseases using real-field images.

3. To implement a k-fold cross-validation approach to reduce overfitting, improve robustness, and ensure the model performs consistently across diverse subsets of data.

4. To achieve high model performance, ensuring reliable and balanced disease classification, which minimizes false predictions.

By utilizing advanced DL techniques, the proposed system has significant potential to improve crop management practices and alleviate the negative impacts of cotton diseases on crop performance19. Furthermore, this study thoroughly assesses the performance of the model, establishing a foundation for future innovations in automated plant disease detection systems.

Recent developments in DL-assisted disease detection in plants

Recent advancements in the detection of cotton leaf disease and its classification with machine vision, summarized in Table 1, have been significant. The search query ("Cotton" AND "Deep Learning") was used to extract relevant studies from the Frontiers, Web of Science, ScienceDirect, IEEE Xplore, and SpringerLink databases. The initial findings showed that only a limited number of publications focus specifically on disease detection in cotton leaves. Table 1 summarizes the studies, highlighting the authors, publication years, study objectives, datasets used, results, and identified limitations.

Table 1 Summary of the reviewed studies in terms of author and year of publication (reference), objectives, dataset, results, and limitations.

The review shows that most existing approaches identify and classify at most four classes. While these models achieved good accuracy, their applicability is limited by the small number of classes in their datasets. Additionally, most studies relied on a single DL model. To the best of our knowledge, no prior study has applied k-fold cross-validation with YOLO-based architectures, particularly YOLOv8, for multi-class cotton leaf disease classification using field images. Our approach addresses these limitations and produces a robust, high-accuracy model.

Materials and methods

The proposed work follows the workflow shown in Fig. 1. It starts with collecting data from the "SAR-CLD-2024" dataset32 (https://data.mendeley.com/datasets/b3jy2p6k8w/2), which contains images categorized into seven classes of diseased and healthy leaves. During the pre-processing stage, the dataset is resized and organized into a standardized format suitable for classification. The workflow uses k-fold cross-validation, which divides the dataset into multiple folds to ensure robust training and evaluation. The YOLOv8m-cls architecture is employed for neural network training to classify the images. Finally, the process includes a validation phase, where the predictions are assessed for performance.

Fig. 1

Schematic workflow of the research work.

Dataset and preprocessing

The dataset was sourced from SAR-CLD-2024, which contains 2,137 high-quality images of cotton leaves collected at the National Cotton Research Institute (NCRI), Gazipur. The images were taken with a smartphone (Redmi Note 11S). This robust dataset covers seven classes, including both biotic and abiotic stresses.

The leaves from all 7 classes are illustrated in Fig. 2, and the names of the cotton diseases and their corresponding images are shown in Table 2.

Fig. 2

Individual images of the seven classes. (A) Healthy Leaf, (B) Bacterial Blight, (C) Curl Virus, (D) Leaf Variegation, (E) Jassids by Leaf Hopper, (F) Red Leaf and (G) Herbicide Growth Damage.

Table 2 Number of images in individual class.

To apply the YOLO classification model to the obtained dataset, the data must be organized into three folders: “train”, “val”, and “test”. Each folder contains seven subfolders, each named after one of the seven classes, with the corresponding images for that class. Figure 3 illustrates the format used by YOLO to classify the dataset.

Fig. 3

Format of dataset for YOLO classification.

Several preprocessing steps are applied to ensure consistency in the model's learning efficiency. All images were resized to 640 × 640 pixels, matching the input size used for the YOLOv8 architecture. The dataset was split using Python, with the data randomly divided into training (69%), validation (12%), and testing (19%) sets. A substantial share of images was allocated to testing to assess the accuracy of the trained model. Table 3 outlines the distribution of the dataset into training, validation, and testing subsets for each class.

Table 3 Distribution of images in train, validation, and test.
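For illustration, the split described above can be reproduced with a short Python script. The following is a minimal sketch, not the authors' exact implementation: the folder names and the 69/12/19 ratios follow the text, while the source path, the file extension, and the helper name `split_dataset` are assumptions for the example.

```python
import random
import shutil
from pathlib import Path

def split_dataset(src="SAR-CLD-2024", dst="dataset", seed=0,
                  ratios=(0.69, 0.12, 0.19)):
    """Randomly split per-class image folders into train/val/test (YOLO layout)."""
    random.seed(seed)
    for cls_dir in Path(src).iterdir():
        if not cls_dir.is_dir():
            continue
        images = sorted(cls_dir.glob("*.jpg"))   # assumed extension
        random.shuffle(images)
        n_train = int(len(images) * ratios[0])
        n_val = int(len(images) * ratios[1])
        splits = {"train": images[:n_train],
                  "val": images[n_train:n_train + n_val],
                  "test": images[n_train + n_val:]}
        for split, files in splits.items():
            out = Path(dst) / split / cls_dir.name   # e.g. dataset/train/Red Leaf/
            out.mkdir(parents=True, exist_ok=True)
            for f in files:
                shutil.copy(f, out / f.name)

split_dataset()
```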

K-fold cross-validation

This technique divides the dataset into 'k' parts, called 'folds', to obtain a more reliable estimate of model performance. Each fold provides data for both training and validation. Applying k-fold cross-validation to classification scenarios adds robustness to the DL model, making it generalize well across different data splits. Cross-validation is particularly important in agriculture, as environmental conditions may vary, causing the appearance of leaves and disease symptoms to differ from those seen in the training set33,34. K-fold cross-validation, combined with DL architectures such as CNNs and ResNet-152V2, has been shown to improve the predictive capabilities of models for classifying and diagnosing cotton plant diseases, thus enhancing their effectiveness in real-world applications35. Models trained on a fixed training set frequently suffer from overfitting, i.e., reduced performance on new, unseen images. K-fold cross-validation addresses this problem by evaluating the performance of the model across different data partitions, ensuring that the model does not simply memorize the training data but instead learns to generalize36. To further enhance the model's robustness, a highly diversified dataset is used for cross-validation34,37. We employed the k-fold technique to create ten distinct training, validation, and test folds. Each fold was split randomly to ensure variability in the dataset, with the random splitting and fold formation implemented in Python. The k-fold process is illustrated in Fig. 4: for each of the ten folds, the dataset was divided into the three subsets, with each fold containing a unique partition of the images.

Fig. 4

K-fold splitting of the dataset.
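Since each of the ten folds is an independent random split, the fold generation can be sketched by repeating the split with different seeds, reusing the `split_dataset()` helper sketched in the previous section; the `folds/fold_i` layout is an assumed naming convention, not the authors' exact implementation.

```python
# Build ten independent random train/val/test splits ("folds"),
# one directory tree per fold, each seeded differently.
for fold in range(10):
    split_dataset(src="SAR-CLD-2024", dst=f"folds/fold_{fold}", seed=fold)
```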

Experimental setup

The output of an image classifier consists of a single class label and a confidence score. Image classification is particularly useful when the goal is to identify the class to which an image belongs without needing to pinpoint the exact location or shape of the objects within it. The YOLOv8 classification models, specifically the yolov8m-cls.pt variant (Fig. 5), are designed for efficient image classification: the model assigns a class label and a confidence score to an entire image. This approach is especially valuable in applications where knowing the class of an image is sufficient, rather than requiring detailed information about the location or shape of the objects it contains.

Fig. 5

Detailed architecture of the YOLOv8 classification model.
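As a hedged illustration of this single-label output, the sketch below uses the ultralytics Python package to classify one image and print the Top_1 class with its confidence. The image path is a placeholder; in practice the trained checkpoint (the `best.pt` produced by training) would be loaded rather than the pretrained `yolov8m-cls.pt` starting point.

```python
from ultralytics import YOLO

# Load a classification checkpoint and classify a single image.
model = YOLO("yolov8m-cls.pt")
result = model("leaf.jpg")[0]           # inference returns one Results object per image
top1 = result.probs.top1                # index of the most confident class
print(result.names[top1], float(result.probs.top1conf))
```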

The YOLOv8m-cls model contains 141 layers, 15,781,303 parameters, 15,781,303 gradients, and 41.9 GFLOPs; after fusing for inference, it uses 103 layers, 15,771,623 parameters, 0 gradients, and 41.6 GFLOPs. An NVIDIA GeForce RTX 3050 Ti Laptop GPU (4096 MiB) and a 12th-generation Intel Core i7 processor were used to perform the experiments. The initial hyperparameters {lr0 = 0.01, lrf = 0.01, momentum = 0.937, weight_decay = 0.0005, warmup_epochs = 3.0, warmup_momentum = 0.8, warmup_bias_lr = 0.1, box = 7.5, cls = 0.5, dfl = 1.5, pose = 12.0, kobj = 1.0, label_smoothing = 0.0, and nbs = 64} were used.

Figure 6 illustrates the augmentation strategies of the YOLOv8 model. The default parameters {hsv_h = 0.015, hsv_s = 0.7, hsv_v = 0.4, degrees = 0.0, translate = 0.1, scale = 0.5, shear = 0.0, perspective = 0.0, flipud = 0.0, fliplr = 0.5, bgr = 0.0, mosaic = 1.0, mixup = 0.0, copy_paste = 0.0, auto_augment = randaugment, erasing = 0.4, and crop_fraction = 1.0} have been used.

These augmentation techniques address the class imbalance present in the SAR-CLD-2024 dataset. They increase the representation of minority classes and help the model learn balanced features, which improves generalization and reduces class-wise prediction bias. Additionally, the use of 10-fold cross-validation ensured that all classes were fairly represented across the training and validation splits.
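For concreteness, the following is a minimal training sketch assuming the ultralytics Python API and the fold layout sketched earlier. The optimizer and augmentation arguments shown simply restate the defaults listed above, so passing them explicitly is optional; only the data path and weight file vary in practice.

```python
from ultralytics import YOLO

# Train one fold of the cotton leaf classifier.
model = YOLO("yolov8m-cls.pt")
model.train(
    data="folds/fold_0",                 # root folder containing train/ val/ test/
    epochs=100,
    imgsz=640,
    # optimizer settings (identical to the listed defaults):
    lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005,
    warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1,
    # augmentation settings (identical to the listed defaults):
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, fliplr=0.5, erasing=0.4,
)
```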

Fig. 6

Augmentations of the YOLO model: the different types of augmentation used internally by the YOLOv8m classification model. The default settings are {hsv_h = 0.015, hsv_s = 0.7, hsv_v = 0.4, degrees = 0.0, translate = 0.1, scale = 0.5, shear = 0.0, perspective = 0.0, flipud = 0.0, fliplr = 0.5, bgr = 0.0, mosaic = 1.0, mixup = 0.0, copy_paste = 0.0, auto_augment = randaugment, erasing = 0.4, and crop_fraction = 1.0}.

Results and validation

The model has been thoroughly tested and evaluated using a variety of metrics. The main metrics used are accuracy, precision, recall, F1 score, and mean average precision (mAP). They are computed from four fundamental outcome counts: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).

  • Accuracy is evaluated by calculating the percentage of correct predictions as a ratio of total predictions.

    $$Accuracy=\frac{TP+TN}{TP+FP+TN+FN}\times 100$$
    (1)
  • Precision is evaluated by calculating the percentage of correct positive predictions as a ratio of all positive predictions.

    $$Precision=\frac{TP}{TP+FP}\times 100$$
    (2)
  • Recall is evaluated by calculating the percentage of true positives as a ratio of all real positives.

    $$Recall=\frac{TP}{TP+FN}\times 100$$
    (3)
  • F1 Score is evaluated by calculating the Harmonic Mean of precision and recall.

    $$F_{1}=\frac{2\times Precision\times Recall}{Precision+Recall}$$
    (4)
  • Mean average precision (mAP): mAP is the mean of the Average Precision (AP) across all classes, where AP is the area under the precision-recall curve.

    $$mAP=\frac{1}{n}\sum_{k=1}^{n}AP_{k}$$
    (5)

    mAP is typically evaluated at different IoU thresholds, such as 50% (mAP50) and between 50% and 95% (mAP50-95).

  • mAP50 (B): the mAP calculated specifically for bounding-box detection at an IoU threshold of 50%.
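The following small Python sketch implements Eqs. (1)-(4) directly from the four counts; the example counts are placeholders, not values from the study.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Classification metrics in percent, following Eqs. (1)-(4)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn) * 100
    precision = tp / (tp + fp) * 100
    recall = tp / (tp + fn) * 100
    # precision and recall are already percentages, so the
    # harmonic mean needs no further scaling
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for one class, for illustration only:
print(metrics_from_counts(tp=72, tn=190, fp=1, fn=0))
```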

Evaluation of YOLOv8

Figure 7 illustrates the training and validation losses and the Top_1 and Top_5 accuracy over 100 epochs. The losses decreased and stabilized at about 0.1 and 1.2 for training and validation, respectively, demonstrating effective learning with minimal overfitting. The Top_1 accuracy rises rapidly from approximately 80% to around 99%, demonstrating the strong ability of the model to predict the correct class on the first attempt. The Top_5 accuracy remains consistently at 1.0, signifying that the model always includes the correct label within its top five predictions. Table 4 reports the metrics of all ten trials, and Table 5 reports the best trial.

Fig. 7

Best trial validation results: four graphs, comprising two accuracy metrics and two losses. All four show excellent results, with both losses converging to low values and Top_1 and Top_5 accuracies of 99.60% and 100%, respectively.

The 100% Top-5 accuracy achieved by the model is expected in this case due to the limited number of classes (seven) and the strong performance of the trained model. Since Top-5 accuracy only checks whether the correct label appears among the top five predictions, such results are common when the model learns well-separated features. However, Top-1 accuracy remains the primary indicator of model effectiveness, as it reflects the model's ability to correctly predict the disease in a single attempt.
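As a brief sketch of how Top-k accuracy is computed from class probabilities, the function below counts how often the true label is among the k most confident predictions; the probability and label arrays here are random placeholders, not the model's outputs.

```python
import numpy as np

def topk_accuracy(probs, labels, k=5):
    """probs: (N, C) class scores; labels: (N,) true class indices; returns percent."""
    topk = np.argsort(probs, axis=1)[:, -k:]        # indices of k most confident classes
    hits = (topk == labels[:, None]).any(axis=1)    # true label among top k?
    return hits.mean() * 100

rng = np.random.default_rng(0)
probs = rng.random((263, 7))                        # dummy scores for 7 classes
labels = rng.integers(0, 7, 263)                    # dummy ground truth
print(topk_accuracy(probs, labels, k=1), topk_accuracy(probs, labels, k=5))
```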

Consistently high performance, with 98.41% accuracy, 98.39% precision, 98.53% recall, and an F1 score of 98.42%, was achieved across the ten trials, as illustrated in Table 4. The second trial yielded the best results, with the highest accuracy of 99.60%, while the other trials also performed strongly. This suggests that the model is robust (Fig. 10), with the minor variations in the results likely due to differences in the random fold splits (Figs. 11, 12, 13, 14).

Despite achieving high accuracy values (Top-1: 99.60%, Top-5: 100%), the proposed model does not suffer from overfitting. This conclusion is supported by several observations drawn from model behaviour and data characteristics. First, the model was evaluated using 10-fold cross-validation, ensuring that each subset of data is used for both training and validation; the performance remained consistent across all folds, which reflects the robustness and generalizability of the model. Second, the confusion matrix generated during validation shows minimal misclassifications, confirming that the model maintains its classification ability on unseen data. Also, the SAR-CLD-2024 dataset of unique real-world images was used to train the model without any offline augmentation; no synthetic data or repetition was used during training, which ensures that the model has learnt diverse and realistic field conditions.

Principal Component Analysis (PCA) was performed on the deep feature vectors extracted from the final layer of the YOLOv8 classifier. As shown in Fig. 8, the embeddings from different classes form distinct and well-separated clusters in the plot. This confirms that the model has effectively learnt discriminative features and is not merely memorizing the training data.

Fig. 8

Principal Component Analysis (PCA) on the deep feature vectors extracted from the final layer of the YOLOv8 classifier.
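A hedged sketch of the PCA projection described above, assuming the deep feature vectors and their labels have already been extracted into NumPy arrays; random placeholders (and an assumed embedding width of 1280) stand in for the real embeddings.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = rng.random((263, 1280))    # placeholder for (N, D) deep feature vectors
labels = rng.integers(0, 7, 263)      # placeholder class indices

# Project the embeddings onto the first two principal components.
proj = PCA(n_components=2).fit_transform(features)
for c in range(7):
    plt.scatter(*proj[labels == c].T, s=8, label=f"class {c}")
plt.xlabel("PC 1"); plt.ylabel("PC 2"); plt.legend()
plt.show()
```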

Comparative evaluation of YOLOv8 with YOLOv11

To further evaluate the effectiveness of the proposed YOLOv8 model, a comparative analysis was conducted with YOLOv11, a recently released version of the YOLO architecture. Both models were trained and validated on the same dataset using identical parameters, including batch size, epochs, and input resolution.

As shown in Fig. 9, YOLOv8 consistently outperformed YOLOv11 on the key performance metrics, including train_loss, val_loss, top1 accuracy, and top5 accuracy. This suggests that although YOLOv11 is a newer version in the YOLO series, it may not yet be fully optimised for image classification tasks, particularly in the context of fine-grained agricultural disease detection.

Fig. 9

Comparative analysis between YOLOv8 and YOLOv11.

Comparison of the classification metrics between YOLOv8 and YOLOv11 on the cotton leaf disease classes shows that YOLOv8 demonstrates smoother convergence, supporting its use in the proposed method. Our experiments revealed that YOLOv11 struggled to achieve stable convergence, as shown in Fig. 9, with fluctuating loss curves and lower accuracy. In contrast, YOLOv8 offers a well-balanced architecture and consistent results across multiple datasets, especially in our use case, making it more suitable for deployment in real-world agricultural scenarios. It is also important to note that YOLOv9 and YOLOv10 do not support image classification tasks, which further supports the selection of YOLOv8 for our study. These findings justify the selection of YOLOv8 over newer yet less stable alternatives like YOLOv11.
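As a sketch of how such a like-for-like comparison can be run with the ultralytics API, the loop below trains both models with identical settings and reports validation Top-1/Top-5 accuracy. The `yolo11m-cls.pt` filename follows the ultralytics naming convention for YOLOv11 classification weights and is an assumption here, as is the fold path.

```python
from ultralytics import YOLO

# Train YOLOv8 and YOLOv11 classifiers under identical settings
# and compare their validation accuracy on the same fold.
for weights in ("yolov8m-cls.pt", "yolo11m-cls.pt"):
    model = YOLO(weights)
    model.train(data="folds/fold_0", epochs=100, imgsz=640, seed=0)
    metrics = model.val()
    print(weights, metrics.top1, metrics.top5)
```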

Table 4 Metrics of all ten trials.
Table 5 Metrics of best trial.

Table 5 highlights the peak performance of the DL model during its most successful trial in diagnosing cotton diseases. In this best trial, the model achieved a Top_1 accuracy of 99.60%, indicating that it correctly identified the disease as its top prediction nearly every time. The Top_5 accuracy remained at 100%, ensuring that the correct diagnosis was always included within the top five predictions. The recall was 99.55%, demonstrating the exceptional ability of the model to identify nearly all actual disease cases and minimizing the likelihood of missed diagnoses. With a precision of 99.53%, nearly all of the model's positive predictions were accurate, effectively reducing the number of false positives. An F1 score of 99.60% balanced the precision and recall results, proving the efficiency of the model in cotton disease detection. This best trial underscores the superior performance of the model and its potential as a highly reliable tool for precision agriculture (Figs. 10, 11, 12, 13 and 14).

Fig. 10

Accuracy vs. epochs of each trial: accuracy versus epochs for each trial; the dark blue line shows the average of all ten trials.

Fig. 11

Graph of F1-score for each trial: bar chart of the F1-score in each trial. The best trial was trial two.

Fig. 12

Graph of precision for each trial: bar graph of the precision of each trial. Trial two performed best among all the trials.

Fig. 13

Graph of recall for each trial: bar chart of the recall of each trial. Trial two showed the best result, 99.55%, among the ten trials.

Fig. 14

Graph of accuracy for each trial: bar chart of the accuracy of each trial. The best trial was trial two, with an outstanding accuracy of 99.60%.

Table 6 Average of the five metrics across the ten trials. The average accuracy over the ten trials is 98.41%.

Table 6 reveals the strength and high accuracy of the proposed model in diagnosing cotton diseases. The Top_1 accuracy of the model was 98.41%, meaning that in almost all cases the disease was predicted correctly as the top prediction. The Top_5 accuracy remained at 100%, ensuring the correct disease was always present among the first five predictions and emphasizing the reliability of the model. The model also performed well on the test set: a recall of 98.53% means it effectively identified almost all cases of disease, minimizing missed diagnoses, and a precision of 98.39% means most of its positive predictions were correct, avoiding false positives. The resulting F1-score of 98.42% indicates that the model is highly effective and consistent over the ten separate trials. These indicators support the ability of the model to accurately and reliably diagnose cotton diseases, making it useful in precision agriculture.

Figure 15 is a confusion matrix presenting the validation performance of the classification model on the different leaf classes. True labels are mapped on the x-axis and predicted labels on the y-axis; correct classifications appear in the diagonal cells, and the off-diagonal cells show misclassifications. The model performs extremely well, identifying "Curl Virus" in 53 out of 53 samples, "Leaf Redding" in 72 out of 72, and "Herbicide Growth Damage" in 34 out of 34. Of the 263 samples, the model misclassified just one, confusing "Healthy Leaf" with "Bacterial Blight". Figure 16 illustrates the normalised confusion matrix of the best validation.

Fig. 15

Confusion matrix of the best trial on the validation dataset. Out of 263 images, only one was predicted incorrectly; all other predictions match the true labels. The accuracy of the matrix is 99.60%.

Fig. 16

Normalized confusion matrix of the best trial. All but one image are classified correctly.
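A minimal sketch of how a normalized confusion matrix like Fig. 16 can be produced with scikit-learn; the `y_true` and `y_pred` lists here are placeholders illustrating a single Healthy Leaf/Bacterial Blight confusion, not the study's predictions.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# True and predicted class indices collected over the validation images
# (placeholders shown; indices 0-6 correspond to the seven classes).
y_true = [0, 1, 2, 3, 4, 5, 6, 0]
y_pred = [0, 1, 2, 3, 4, 5, 6, 1]   # one "Healthy Leaf" -> "Bacterial Blight" error

cm = confusion_matrix(y_true, y_pred, normalize="true")  # row-normalized
ConfusionMatrixDisplay(cm).plot(cmap="Blues")
plt.show()
```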

Figure 17 shows the predictions on the validation set of the best trial, obtained with an accuracy of 99.60%; all 16 leaves shown are correctly classified. The output of the proposed approach displays the class of each cotton leaf in the left corner of each image.

Fig. 17

Prediction of the best trial: predictions on the validation set of the best trial (accuracy 99.60%); all 16 leaves are correctly classified, with the class of each leaf shown in the left corner of each image.

Discussion

The methodology presented in this work employs a robust approach to classify cotton diseases using a DL model based on YOLOv8. Our results show that DL, using the YOLOv8 classification model with a 10-fold cross-validation technique, can diagnose diseases in cotton leaves for precision agriculture. An advantage of using SAR-CLD-2024, sourced from NCRI, Gazipur, is robust testing on diverse leaf images captured under real field conditions. This comprehensive coverage of conditions in the dataset is essential for creating a model capable of distinguishing between various diseases and stress factors. Additionally, the dataset is thoughtfully organized into training, validation, and testing sets, ensuring that the model undergoes a thorough evaluation, which is crucial for developing a reliable disease classification tool. Cross-validation helps counter overfitting and enhances the generalizability of the model by exposing it to different subsets of the data for both training and testing. For comparison, Elaraby et al.20 obtained an accuracy of 98.83% for multi-crop disease classification using the PlantVillage dataset, and the CDDLite-YOLO model of Pan et al.21 (2024) achieved a mAP of 90.6%. Additionally, Ahmed22 (2021) and Gao et al.23 (2024) employed transfer learning and YOLOv8 to further improve cotton disease detection, reaching around 94% accuracy in both cotton pest and cotton disease detection. A key aspect of the present study is the use of k-fold cross-validation, which divides the dataset into multiple folds. This technique is essential for ensuring that the model performs robustly across various data subsets. It is particularly important in agricultural applications, where environmental variations can significantly alter the appearance of cotton leaves. By utilizing k-fold cross-validation, the model is exposed to a wide range of disease symptoms and ecological conditions, which enhances its ability to generalize and reduces the risk of overfitting. When combined with advanced DL architectures like YOLOv8, this method ensures that the model can perform effectively in real-world scenarios.

The YOLOv8m-cls model used for image classification in this study demonstrated high effectiveness. Both a confidence score and a class label are produced for each image, so the certainty of the classification is evaluated along with the predicted class. This is particularly beneficial in precision agriculture, where knowing the class of an image is often sufficient for decision-making without the need to localize individual objects within it. The YOLOv8 architecture, with its 141 layers and millions of parameters, enables fast and accurate classification, making it well-suited for large-scale deployment in field conditions. Shahid et al.31 (2024) used GoogLeNet, achieving 93.40% accuracy and a 95% F1 score, while AlexNet achieved 93.40% accuracy and InceptionV3 achieved 91.80%. Rai and Pahuja29 (2023) used a DCNN to achieve 97.98% accuracy. Li et al.25 (2024) used CFNet-VoV-GCSP-LSKNet-YOLOv8s, achieving 89.9% precision. Nazeer et al.26 (2024) identified curl disease with 99% accuracy, but their study dealt only with detecting Cotton Leaf Curl Disease. Many current datasets, such as those used by Kolachi et al.27 (2023) and Latif et al.8 (2021), are limited in the number of classes or the environmental conditions they capture. While those models were effective for specific applications, the model proposed in this study surpasses their results by achieving a higher degree of accuracy on a more complex task, with seven classes in the dataset.

The experimental results highlight the effectiveness of the proposed approach. The model achieved Top_1 and Top_5 accuracies of 99.60% and 100%, respectively; Top_1 accuracy reflects correct detection on the first attempt, while Top_5 accuracy confirms the correct class is always among the five most confident predictions. A precision of 99.53% minimized false positives, and a recall of 99.55% minimized missed detections. These metrics, along with an F1 score of 99.60%, underscore the exceptional performance and robustness of the model. A key aspect of the study is the use of 10-fold cross-validation, which offers more robust performance estimates than a single train-test split. By rotating through the dataset so that every sample is used for training and validation, the model avoided overfitting, a common challenge for DL models in agriculture due to limited or skewed datasets. The k-fold approach, as shown by the consistent average Top_1 accuracy of 98.41% and recall of 98.53% across all trials, provided more robust and generalizable results.

Table 6 illustrates consistently high Top_1 and Top_5 accuracies of 98.41% and 100%, respectively, across the ten trials. The values for precision, recall, and F1-score further support the reliability of the model, making it a promising tool for diagnosing cotton diseases in practical applications. The confusion matrix (Fig. 16) also highlights the excellent classification ability of the model, with very few misclassifications. This indicates that the model can reliably diagnose diseases such as "curl virus," "leaf redding," and "herbicide growth damage" with minimal error.

This study achieved an accurate and reliable model for cotton disease detection, outperforming the majority of contemporary models when tested on a diverse range of metrics. The integration of 10-fold cross-validation ensured the robustness of the model for real-time usage.

Conclusion

This study proposed an efficient YOLOv8 classification model integrated with 10-fold cross-validation to improve the robustness and scalability of the model. The method achieved 99.60% Top_1 accuracy and 100% Top_5 accuracy. It also exhibited high precision, recall, and F1-score, demonstrating an accurate and robust approach to diagnosing multiple diseases on cotton leaves. The model used k-fold cross-validation effectively to minimize overfitting, performing consistently well over different data subsets, a feature critical for practical agricultural systems. The proposed model exceeded the benchmark accuracies and addresses several limitations noted in the available literature: few classes, controlled datasets, and poor adaptability to field conditions. This demonstrates its potential as an important tool in precision agriculture, enabling timely disease detection with high accuracy, thereby reducing crop losses and improving cotton yield. Future work will include collecting more field data in real time and incorporating environmental variables that might affect detection.