Abstract
This work investigates the YOLOv5 object detection algorithm for classifying commercial crops such as tomato, chilli, and cotton. The datasets comprise 707 images of green chilli, 200 images of tomato crops, and 130 images of weeds from Ponnandagoundanoor farms in the western agro-climatic zone (WAZ) of Tamil Nadu. The objective of this research is to detect the weeds present in the crops; the machine learning (ML) algorithms deployed are evaluated in terms of F1 score, detection time, and mAP. As a result, the tomato dataset yields an F1 score of 98%, a mAP of 0.995, and a detection time of 190 ms; the cotton dataset an F1 score of 91% and a mAP of 0.947; and the chilli dataset an F1 score of 78% and a mAP of 0.811. A further investigation was carried out for the same crops: YOLOv5 accuracy was improved by adding adaptively spatial feature fusion (ASFF) blocks to its architecture head. The enhanced YOLOv5 algorithm using ASFF modules on the same datasets achieved an F1 score of 99.7% on the tomato dataset and 79.4% on the chilli dataset, a 1.14% improvement in the F1 score. With an F1 score of 93.53%, a 2% improvement was obtained on the cotton dataset. The extended YOLOv5 also increased the mAP by about 0.5% and led to a slight drop in the number of computations carried out, rendering the model more compact.
Introduction
The horticulture and plantation sectors play a significant role in the state of Tamil Nadu, contributing to its agricultural economy and providing employment opportunities to a large number of people1. The cultivation of fruits, vegetables, and ornamental plants is known as horticulture, while the large-scale cultivation of crops such as tea, coffee, and spices is known as plantation. The diversified climate, fertile soil, and ample water supplies of Tamil Nadu2 make it ideal for horticultural practices. The state contains a diverse range of agro-climatic zones, from coastal regions to hill stations, which allows for the production of a variety of horticultural crops3.
Weeds can have a considerable influence on horticulture crop output. These unwanted plants that grow along with the crops compete with them for nutrients, reduce sunlight penetration, and serve as hosts for pests, insects, and, in some cases, diseases. They also make harvesting difficult by getting tangled with the crops4. Some weed species exhibit the phenomenon of allelopathy5, where they release chemicals into the soil that inhibit the growth of nearby plants. To mitigate the negative effects of weeds on horticulture crop yield, weed management practices are essential. These can include cultural practices like crop rotation, mulching, and regular cultivation to suppress weed growth6.
In horticulture, traditional weed control methods often utilize a combination of human, mechanical, and chemical procedures. Some commonly used methods include hand plucking, hoeing (using a hoe or similar tool to cut and uproot weeds), mulching, and spraying weedicides7. However, these methods are labor-intensive, lack precision and accuracy, and have high manual costs. Automating the weed identification process is necessary to expedite the weed removal process, thereby reducing manpower and expenditure and significantly enhancing production8.
Information and computing technologies (ICT) play a crucial role in modern weed control practices. ICT enables data-driven decision-making through the collection, storage, and analysis of large amounts of data related to weed populations, crop growth, environmental conditions, and management practices. Computing technologies such as satellite imagery, remote sensing, and geographic information systems (GIS) allow for precise mapping and monitoring of weed distribution and dynamics9. Machine learning and artificial intelligence algorithms can assist in automated weed detection and identification10. These technologies can be integrated to create an automatic weeding robot for variable, site-specific weedicide spraying.
Image normalization, plant segmentation, annotation, and similar procedures are commonly carried out in weed detection systems. Machine learning-based algorithms increase the accuracy of weed identification but tend to lack robustness, whereas deep learning approaches have shown that they can handle such complicated issues with ease11.
When weeds grow in the same area as crops, they deprive the crops of the necessary nutrients, leading to lower yields, increased production costs, difficulty in harvesting, and a reduction in the overall quality of the product12. Weeds, many of which act as hosts for various plant diseases, also hinder the flow of irrigation water and make it more difficult to water crops. The amount of herbicide applied to a field, and its cost, also harm crop production and soil quality. As a result, it is vital to automate the weed removal process and limit the use of herbicides13.
This research work aims to develop a crop-weed detection mechanism that enhances accuracy and speed of detection, with the potential to upgrade to an autonomous weeding system that sprays a limited amount of herbicide on weeds and removes them early. This work develops a CNN-based classification algorithm for crops and weeds.
The structure of this research paper is presented below. Section II offers a thorough overview of the studies conducted in this field. Section III delves into the description of the dataset, the methods that were used, and the proposed technique along with its block diagram. Section IV delves into the experimental data and its interpretation. Section V concludes and discusses future work.
Literature survey
In14, the authors proposed convolutional neural networks to segment vegetables and classify the remaining green objects as weeds to enhance weed identification. A deep learning algorithm was developed for robotic weed removal to detect vegetable bounding boxes and perform weed segmentation based on colour features outside the boxes. They employed the CenterNet algorithm13 for bok choy detection, which generates bounding boxes based on class probability. Colour index-based segmentation was then used to identify weeds. Images of bok choy from Nanjing, China, were augmented to 11,500 and manually annotated using the LabelImg software. The trained CenterNet achieved impressive precision (95.6%), recall (95.0%), and F1 score (0.953) by treating the remaining green objects as weeds. Future work involves weed identification in in-situ videos, optimizing the deep learning model, and evaluating accuracy in vegetable detection.
In15, the authors developed a decision-based predictive algorithm for early weed detection in a green-on-green setting, validated on testing data. A DJI Phantom 3 Professional drone was used to acquire two datasets from a weed-infested sugarcane field. The datasets consisted of high-resolution images captured at a fixed altitude of 70 m, resulting in 261 images with a pixel size of 2.13 cm. The authors performed UAV data preprocessing steps using Pix4D software, including mosaicking, RGB band stacking, and spatial subset segmentation, and they marked and masked polygon-shaped regions of interest to effectively encapsulate weed patch structures. Colour-based features were extracted from the training imagery for classification, achieving 89% accuracy in detecting “early weed” patches (marked in red) and “crop” patches (marked in green). The algorithm demonstrated the potential for localized and accurate early-stage weed detection.
The authors in16 investigated trade-offs between semantic segmentation quality and imaging sensor complexity. They used a generative adversarial neural network to generate a fake infrared channel for alignment. OpenDroneMap software aligned unsynchronized frame sequences, and a greedy region-based procedure constructed a paired dataset for training. They manually labelled the data using RGB orthophotos and proposed a single, upgraded RGB camera for cost-effectiveness. The evaluation based on mIoU scores shows 95% and 90% for old and young crops, respectively. Inconsistencies in manual labelling are the main source of error. The proposed approach can be applied to other agricultural applications, such as disease and insect detection.
In17, the authors classified weeds and maize using a snapshot mosaic hyperspectral imaging sensor, processing the images to obtain the spectral reflectance of regions of interest. The objective of the work was to explore the feasibility of a NIR snapshot mosaic hyperspectral camera for classification, determine relevant wavelengths and features, and provide optimal parameters for a random forest (RF) model. Seeds of C. arvensis, Z. mays, Rumex, and C. arvense were sown in sandy soil-filled pots and recorded with the snapshot mosaic hyperspectral camera in the plant laboratory of ILVO in Belgium. The authors assessed and compared the performance of RF and k-nearest neighbour (KNN) models using the McNemar test, tested RF classifiers with different spectral feature combinations, and selected 30 significant features through a feature-reduction procedure based on importance scores. The crop (Z. mays) exhibited exceptional precision (94%) and recall (100%), while precision values for C. arvensis, Rumex, and C. arvense were 95.9%, 70.3%, and 65.9%, respectively. These findings support the potential application of the camera in implementing site-specific weed management (SSWM) strategies.
In18, the authors proposed YOLOv4-tiny for detecting sesame crops and weeds, using a labelled dataset of 1300 photos obtained from Kaggle. The dataset includes 512 × 512 colour images of sesame crops and various weed types. They trained the model using 1000 images for training and 300 for testing, with an input size of 416 × 416, and evaluated the object detection model using mean average precision (mAP), taking into account bounding box coordinates, class probabilities, the probability score threshold, and intersection over union (IoU). The model was trained on a Tesla K80 GPU, the weights were converted to TensorFlow Lite (TFLite) format, and the model was deployed on a Raspberry Pi 4 Model B, achieving 4 frames per second (FPS). Future work includes performance testing on the Nvidia Jetson Nano and Jetson TX2, leveraging the TensorRT library for accelerated inference speed.
The authors in19 introduced a promising technique for crop and weed detection that addresses the challenges posed by morphological similarities. They used a combination of contour masks and filtered LBP operators20. The filtered LBP operators allow for better feature extraction by capturing local texture information, while the contour masks aid in separating plants based on their shape and structure. The experimental results demonstrated a classification accuracy of 98.63% with four classes on the “bccr-segset” dataset. Further research could integrate the contour masks and filtered LBP operators with other advanced techniques such as machine learning algorithms, deep learning models, or computer vision methods.
In21, the authors focused on enhancing the classification of sugar beet and weeds using the VGG-Beet CNN, a modified VGG16 model22 with 11 weight layers. They utilized three diverse datasets: a local dataset captured with the Agrocam sensor, a public dataset captured with the Parrot Sequoia camera, and another publicly available dataset captured with the RedEdge camera sensor. These datasets vary in lighting conditions, geographical locations, and soil conditions. Crop and weed patches were manually separated during training to exclude mixed-class patches, and the VGG-Beet CNN was trained using cross-validation data, stopping at the optimal weights. The maximum overall testing accuracy for the Agrocam and RedEdge datasets is 86.3% and 92.4%, respectively. Future work will develop methods for classifying mixed-class patch images, reducing mixed-class patches, improving weed detection accuracy, and minimizing the misclassification of crops as weeds.
In23, the authors investigated the use of convolutional neural networks (CNN) with hyperspectral imaging, patch-based classification methods, and RGB images for weed mapping. The authors employed data augmentation methods such as rotation and random cropping to enhance training data for CNN models, thereby improving recognition accuracy. The study includes uniform band selection and analyses the impact of resolution on weed classification, emphasizing the relationship between patch size and sensitivity. The results showed CNN’s superiority over traditional feature extraction methods and highlighted the advantages of hyperspectral data over RGB images. The sensitivity of CNN performance to patch size underscores its correlation with spatial resolution in image processing.
The authors of24 created a machine learning-based crop-weed identification system capable of performing site-specific detection. The authors utilized an SVM-based classification and detection system in real-time to extract features from crops and weeds. The SVM-based classifier reached a real-time detection speed of 5–6 frames per second and a 96% accuracy. Future work will also include determining the spray amount for each plant based on its size and adjusting the spray flow rate.
The authors in25 focused on developing a real-time monitoring system for detecting broken corn during harvesting. The system includes a camera, a hardware platform, and a designed corn detection method. The camera captured images of both broken and non-broken corn before sending them to the peeling device. Synthetic images of broken corn with different backgrounds were generated to train a corn detection network, which was then fine-tuned using real images of broken corn. The authors resized the input image to 416 × 416 pixels and trained the network using YOLOv3-tiny, which predicts bounding boxes, objectness scores, and class probabilities for corn and non-corn objects at multiple scales. The proposed corn detection method was implemented on an NVIDIA TX2, achieving a speed of up to 10 fps and enabling nearly real-time detection.
In26, the authors proposed YOLOv4-tiny for detecting sesame crops and weeds, using a labelled dataset of 1300 photos obtained from Kaggle. The dataset includes 512 × 512 colour images of sesame crops and various weed types. The model was trained using 1000 images for training and 300 images for testing, with an input size of 416 × 416, and was evaluated using mean average precision (mAP), taking into account bounding box coordinates, class probabilities, the probability score threshold, and intersection over union (IoU). They trained the model on a Tesla K80 GPU, converted the weights to TensorFlow Lite (TFLite) format, and deployed it on a Raspberry Pi 4 Model B, achieving 4 frames per second (FPS). Future work includes performance testing on the Nvidia Jetson Nano and Jetson TX2, leveraging the TensorRT library for accelerated inference speed.
Table 1 shows the comparison of the existing literature with the proposed methodology.
Proposed methodology
Consider a crop-weed identification system with Ct, Cc, and Ch images of tomato, cotton, and chilli crops, and Wt, Wc, and Wh corresponding numbers of weed images. Let N × N be the dimension of the input image and n × n that of the pre-processed image. Let B represent the backbone network and N represent the neck network of YOLOv5. Let (Fhin, Fwin) be the dimensions of the feature map obtained from the BottleneckCSP architecture and (Fho, Fwo) the output dimensions of the pooled feature maps, with the pooling window size denoted by k and the stride by s. Let Fbi stand for the feature maps with dimensions (Hf, Wf) derived from the backbone network, where i ranges from 1 to 3. Let (hin, win) represent a position in the Fbi feature map and (hout, wout) the corresponding position in the upsampled feature map, and let w1, w2, w3, and w4 represent the interpolation weights of its four nearest neighbours.
The variables bx, by, bw, and bh denote the x and y coordinates, width, and height of the predicted bounding box, while tbx, tby, tbw, and tbh represent the corresponding raw network outputs. The variables abw and abh represent the dimensions of the anchor box. Let lx and ly denote the Cartesian coordinates of the upper-left corner of the grid cell that contains the object. The symbol σ represents the logistic activation function.
This research work is analysed in terms of the following parameters: precision (λ), recall (µ), F1 score (Ψ), and confidence score27,28,29. Let tp stand for true positive, indicating the correct classification of a crop image as a crop, and tn for true negative, indicating the correct classification of a weed image as a weed. Let fn stand for false negative, the incorrect classification of a crop as a weed, and fp for false positive, the incorrect classification of a weed as a crop.
System architecture
Figure 1 shows the system architecture of the proposed YOLOv5. A custom dataset of green chilli and open-access datasets of tomato and cotton are used to train and test the CNN model. The raw data are then pre-processed to improve their quality and reliability. The pre-processed images are fed into two detection and classification mechanisms, YOLOv530 and Adaptable YOLOv5, to make accurate predictions and classify crops and weeds. The performance of both mechanisms is validated to evaluate the robustness of the model.
Data acquisition
This study examined three different crop types: tomato, chilli, and cotton. The chilli dataset was generated using custom data-gathering equipment with high-resolution sensors. These photographs were taken in January 2023 on fields near Ponnandagoundanoor, Coimbatore. Seven hundred and seven pictures show the plants and weeds of green chilli. The pictures were taken two weeks after the first seedling planting, from a height of approximately 2 feet, at a temperature of approximately 29 degrees Celsius. None of the images in the collection were captured under an overcast sky or in dim lighting. Table 2 describes the chilli dataset.
Conventional dataset
The “Early crop weed dataset” provided by the Agricultural University of Athens contains images of early-stage tomato, cotton, velvetleaf, and black nightshade in the field31. The photographs were taken with a Nikon D700 (2272 × 1704) on three separate farms in Greece between May and June 2019 under identical lighting conditions. Table 3 summarizes the Early Crop Weed dataset.
Pre-processing
Figure 2 shows the steps involved in the proposed algorithm. Image pre-processing prior to training a model is essential for increasing data quality, enhancing model performance, and ensuring training efficiency. The input image of size N × N is first resized to Nr × Nr to match the network’s input size. To standardize the input, the pixel values of the image are normalized to a specific range, for example by dividing them by 255. Data augmentation techniques, such as flipping, rotation, brightness adjustment, and blurring, are then applied to enhance the model’s generalization capability. Anchor boxes of various sizes are created to serve as reference points for predicting bounding boxes, and the ground-truth bounding box coordinates are converted to YOLO format to obtain the pre-processed image of size n × n.
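A minimal sketch of these steps is given below, assuming OpenCV and NumPy are available; the 640-pixel resize target, the [0, 1] normalization range, and the helper names are illustrative assumptions rather than the exact pipeline used in this work.

```python
import cv2
import numpy as np

def preprocess_image(path, target_size=640):
    """Resize an N x N image to the network input size and normalize pixels to [0, 1]."""
    image = cv2.imread(path)                      # BGR image of size N x N
    image = cv2.resize(image, (target_size, target_size))
    image = image.astype(np.float32) / 255.0      # normalize by dividing by 255
    return image

def to_yolo_format(box, img_w, img_h):
    """Convert a ground-truth box (x_min, y_min, x_max, y_max) in pixels
    to YOLO format (x_center, y_center, width, height), normalized to [0, 1]."""
    x_min, y_min, x_max, y_max = box
    x_c = (x_min + x_max) / 2.0 / img_w
    y_c = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return x_c, y_c, w, h

# Example: a 100 x 200 pixel weed box inside a 1280 x 960 image
print(to_yolo_format((300, 400, 400, 600), img_w=1280, img_h=960))
```

Flip, rotation, brightness, and blur augmentations would typically be applied on top of this, for instance with an augmentation library, before the YOLO-format labels are written out.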
Detection and classification algorithm
YOLOv532
Figure 3 illustrates the architecture of the YOLOv5 algorithm.
The backbone network B of YOLOv5 receives an n × n input image. The image first passes through the BottleneckCSP27 architecture, which extracts deep semantic information from the input images. The Spatial Pyramid Pooling (SPP) layer33 extracts feature maps of dimension (Fhin, Fwin) and passes them on, helping to remove the fixed-size constraint of the neural network. The SPP layer divides the input feature map into discrete spatial bins and performs pooling within each bin. Every spatial bin undergoes an independent pooling operation, usually max pooling, which computes the highest value in each bin.
The output dimensions of the pooled feature map are given by

\(\:{F}_{ho}=\text{floor}\left(\frac{{F}_{hin}-k}{s}\right)+1,\:\:{F}_{wo}=\text{floor}\left(\frac{{F}_{win}-k}{s}\right)+1\)

where floor is a mathematical operation that rounds a given number down to the nearest integer. The three feature maps Fb1, Fb2, and Fb3 obtained from the backbone network have dimensions 80 × 80 × 256, 40 × 40 × 512, and 20 × 20 × 1024, respectively.
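The relationship between the pooling window k, stride s, and pooled output size can be checked with a short PyTorch snippet; the kernel and stride values below are arbitrary examples, not the actual SPP configuration of the model.

```python
import torch
import torch.nn as nn

F_hin, F_win, k, s = 20, 20, 5, 1             # example feature-map size, pooling window, stride
pool = nn.MaxPool2d(kernel_size=k, stride=s)  # max pooling over each spatial bin

x = torch.randn(1, 1024, F_hin, F_win)        # dummy deepest backbone feature map
F_ho = (F_hin - k) // s + 1                   # floor((F_hin - k)/s) + 1
F_wo = (F_win - k) // s + 1
assert pool(x).shape[-2:] == (F_ho, F_wo)     # PyTorch pooling follows the same floor formula
print(F_ho, F_wo)                             # 16 16
```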
The feature maps from B are transferred to the PANet34 neck N, where 1 × 1 convolutions are performed to reduce computation and the number of feature channels. Upsampling is then performed to convert the low-resolution features to higher resolutions, and the output is fused with the output of the bottleneck layer of B.
As mentioned, Hfi and Wfi represent the input height and width, and Hfo and Wfo the output height and width, of the feature map from B. In the upsampled feature map, the position (hout, wout) corresponds to the position (hin, win) in the input feature map.
The corresponding position in the input feature map is computed by
where SFw and SFh are the scaling factors in the two dimensions, which are given by
The four nearest neighbours surrounding the position (hin, win) (from Eqs. 3 and 4) are (a1, b1), (a1, b2), (a2, b1), and (a2, b2), where
The weights are calculated based on the distance between (hin, win) and its neighbours.
The pixel value is interpolated using the four nearest neighbour values and their corresponding weights:
The upsampling process is achieved by repeating this procedure for each pixel in the feature map. The concatenated feature Z from Eq. 17 is represented as
The axis parameter indicates the specific axis along which the concatenation operation is executed.
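A condensed sketch of this upsample-and-concatenate step is shown below. It uses PyTorch's built-in bilinear interpolation (which applies the same four-neighbour weighting described above) rather than an explicit per-pixel loop, takes the channel counts from the feature-map sizes listed earlier, and assumes the concatenation axis is the channel dimension, as in common YOLOv5 implementations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Deepest feature map (20 x 20 x 1024) and the mid-level backbone output (40 x 40 x 512)
low_res = torch.randn(1, 1024, 20, 20)
mid_res = torch.randn(1, 512, 40, 40)

reduce_ch = nn.Conv2d(1024, 512, kernel_size=1)   # 1 x 1 convolution to cut channel count
x = reduce_ch(low_res)

# Bilinear upsampling: each output pixel is a weighted sum of its four nearest
# input neighbours, with weights derived from the distances (w1..w4 in the text)
x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)

# Fuse with the corresponding bottleneck output of B along the channel axis
z = torch.cat([x, mid_res], dim=1)
print(z.shape)  # torch.Size([1, 1024, 40, 40])
```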
This output undergoes a similar process before being fused with the output of the first bottleneck layer of B. The result is then passed through similar layers, with 3 × 3 convolutions performed in between to extract spatial information. The fused feature maps Fn1, Fn2, and Fn3 are then passed on to the head network H, where 1 × 1 convolutions reduce the number of channels and H predicts the bounding box coordinates bx, by, bw, bh and the class of the object.
The logistic activation function, denoted by σ, restricts the output values to the range of 0 to 1. It is applied to tbx and tby to ensure that the centroid of the bounding box lies within the grid cell. As per the equations provided above, only the bounding boxes that correspond to anchors falling within the range of 0.5 to 2 times the anchor size are taken into account. The final output is an no × no image with bounding boxes, class labels, and confidence scores.
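A minimal sketch of this decoding step is shown below, using the variables tbx, tby, tbw, tbh, abw, abh, lx, and ly defined earlier; the specific constants (the ×2 − 0.5 centre offset and the squared width/height scaling) follow the public YOLOv5 reference implementation and are assumed here rather than stated in this paper.

```python
import torch

def decode_box(t, anchor_wh, grid_xy, stride):
    """Decode raw predictions t = (tbx, tby, tbw, tbh) into (bx, by, bw, bh).

    anchor_wh : (abw, abh), dimensions of the matched anchor box
    grid_xy   : (lx, ly), upper-left corner of the grid cell containing the object
    stride    : downsampling factor of the detection layer
    """
    sigma = torch.sigmoid(t)                              # logistic activation keeps values in (0, 1)
    bx = (sigma[0] * 2.0 - 0.5 + grid_xy[0]) * stride     # box centre stays near its grid cell
    by = (sigma[1] * 2.0 - 0.5 + grid_xy[1]) * stride
    bw = anchor_wh[0] * (sigma[2] * 2.0) ** 2             # width/height scale the anchor dimensions
    bh = anchor_wh[1] * (sigma[3] * 2.0) ** 2
    return bx, by, bw, bh

# Example: raw outputs for one prediction in grid cell (7, 5) of the stride-16 layer
print(decode_box(torch.tensor([0.2, -0.1, 0.3, 0.4]),
                 anchor_wh=(62.0, 45.0), grid_xy=(7, 5), stride=16))
```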
Adaptable YOLOv5
The adaptable YOLOv5 model introduces adaptively spatial feature fusion (ASFF) blocks into the head layer, as shown in Fig. 4. This allows the head network to adaptively learn the spatial weights necessary for feature fusion at various scales, thereby increasing object detection accuracy. ASFF balances the conflicts between features at different levels and improves the effectiveness of the feature pyramid, increasing the model’s average detection accuracy35.
The feature maps Fb1, Fb2, and Fb3 are derived from the backbone network, while the feature maps Fn1, Fn2, and Fn3 are obtained from the neck network. To generate x1→2, Fn1 undergoes convolution to match the number of channels in Fn2, and the result is upsampled to a feature map with the same dimensions as Fn2. Fn3 undergoes convolution and downsampling operations to modify its channel count and dimensions, yielding x3→2. The Fn2 feature map is adjusted by modifying its number of channels through a convolution operation, resulting in x2→2. The three feature maps are processed with the SoftMax function, which yields the weight coefficients α, β, and γ for x1→2, x2→2, and x3→2, respectively. The ASFF fusion is then calculated as

\(\:{y}_{ij}^{l}={\alpha}_{ij}^{l}\cdot{x}_{ij}^{1\to l}+{\beta}_{ij}^{l}\cdot{x}_{ij}^{2\to l}+{\gamma}_{ij}^{l}\cdot{x}_{ij}^{3\to l}\)
where \(\:{y}_{ij}^{l}\) is the fused feature map obtained from the ASFF module.
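A compact PyTorch sketch of this fusion step is given below, assuming all three inputs have already been resized and channel-matched to the level-2 resolution as described above. Producing the per-pixel weights α, β, and γ with a 1 × 1 convolution followed by a softmax is one common way to realise ASFF; the layer configuration and channel count here are assumptions, not the exact design of this work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASFFFusion(nn.Module):
    """Adaptively spatial feature fusion for one pyramid level (level 2 here)."""

    def __init__(self, channels=512):
        super().__init__()
        # One scalar weight map per input feature, learned from the features themselves
        self.weight_conv = nn.Conv2d(3 * channels, 3, kernel_size=1)

    def forward(self, x1_to_2, x2_to_2, x3_to_2):
        # Predict per-pixel fusion weights and normalise them with softmax
        # so that alpha + beta + gamma = 1 at every spatial position.
        w = self.weight_conv(torch.cat([x1_to_2, x2_to_2, x3_to_2], dim=1))
        w = F.softmax(w, dim=1)
        alpha, beta, gamma = w[:, 0:1], w[:, 1:2], w[:, 2:3]
        # y_ij = alpha * x1->2 + beta * x2->2 + gamma * x3->2
        return alpha * x1_to_2 + beta * x2_to_2 + gamma * x3_to_2

# Example: three 40 x 40 x 512 maps already aligned to the level-2 scale
f = [torch.randn(1, 512, 40, 40) for _ in range(3)]
print(ASFFFusion(512)(*f).shape)  # torch.Size([1, 512, 40, 40])
```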
Results and discussion
This section analyses the obtained simulation results using graphs, elucidating the evaluation metrics to provide precise details and discussing the conclusions drawn from the outcomes of both models.
Simulation parameters
Table 4 shows the simulation parameters for the proposed research.
Performance metrics
The variable tp stands for true positive, signifying cases where a crop image is correctly identified as a crop. Similarly, tn stands for true negative, referring to instances where a weed image is correctly identified as a weed. The notation fn signifies a false negative, the incorrect classification of a crop as a weed, and fp signifies a false positive, the incorrect classification of a weed as a crop36.
Precision: precision refers to the proportion of images labelled as crop that are truly crop images, \(\:\lambda=\frac{tp}{tp+fp}\).
Recall: the recall measures the proportion of correctly labelled crop images to the total number of crop images, \(\:\mu=\frac{tp}{tp+fn}\).
The F1 score is a statistical measure that represents the harmonic mean of a model’s precision and recall values, \(\:\varPsi=\frac{2\lambda\mu}{\lambda+\mu}\). Its value falls within the interval of 0 to 1.
The confidence score represents the probability of the algorithm detecting an image accurately. We compute the mean average precision at various intersection over union (IoU) thresholds to determine the scores.
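The definitions above can be made concrete with a short sketch; the detection confidences and match labels below are invented toy values, used only to show how the confidence threshold that maximises the F1 score (as plotted in the F1-score-versus-confidence curves later) would be located.

```python
import numpy as np

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1), all in [0, 1]."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Toy detections: (confidence score, 1 if the detection matches a ground-truth crop box)
detections = [(0.95, 1), (0.90, 1), (0.78, 1), (0.60, 0), (0.48, 1), (0.30, 0)]
num_ground_truth = 5

best = (0.0, 0.0)  # (f1, threshold)
for threshold in np.linspace(0.0, 1.0, 101):
    kept = [d for d in detections if d[0] >= threshold]   # keep detections above the threshold
    tp = sum(match for _, match in kept)
    fp = len(kept) - tp
    fn = num_ground_truth - tp
    _, _, f1 = precision_recall_f1(tp, fp, fn)
    best = max(best, (f1, threshold))

print("best F1 = %.3f at confidence threshold %.2f" % best)
```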
Crop–weed identification
Outcome of crop–weed identification using YOLOv5
The resulting confidence score is obtained by multiplying the conditional class probabilities by the confidence score per bounding box. Figures 5, 6, and 7 display scores of 0.9, 0.78, and 0.48, respectively. These scores represent the probability of the occurrence of a crop or weed in the box.
Outcome of crop–weed identification using adaptable YOLOv5
During the testing process, one of the outputs provided is the confidence score per bounding box. This confidence score is multiplied by the probability of classifying an image as a crop or weed given the presence of an object within the bounding box, and the resulting numerical value, as shown in Figs. 8, 9 and 10, is the final confidence score. These scores are 0.9, 0.8, 0.7, and 0.4.
Performance analysis
From Table 5, it is evident that the developed crop-weed identification mechanism has enhanced the model’s accuracy and precision-recall values by around 1.5–3.5%. Adaptable YOLOv5 has outperformed the existing YOLOv5 model in terms of metrics such as F1 score, mAP, precision, and recall, which denote the quality of the classifier.
Simulation results
F1 score vs confidence
Tomato dataset – YOLOv5
Tomato dataset – Adaptable YOLOv5
The confidence score that optimizes precision and recall is 0.782, which corresponds to the maximum F1 value of 1.00 for Improved YOLOv5. Since higher confidence and F1 values are desirable, it is evident from Figs. 11 and 12 that the improved YOLOv5 has better accuracy.
Chilli dataset – YOLOv5
The confidence score that optimizes precision and recall is 0.393, which corresponds to the maximum F1 value of 0.79 for Improved YOLOv5. Since higher confidence and F1 values are desirable, it is evident from Figs. 13 and 14 that the improved YOLOv5 has better accuracy.
Chilli dataset – Adaptable YOLOv5
Cotton dataset – YOLOv5
The confidence score that optimizes precision and recall is 0.561, which corresponds to the maximum F1 value of 0.92 for Improved YOLOv5. Since higher confidence and F1 values are desirable, it is evident from Figs. 15 and 16 that the improved YOLOv5 has better accuracy.
Cotton dataset – Adaptable YOLOv5
Figures 17, 18, and 19 indicate a decrease in the percentage of false predictions when transitioning from YOLOv5 to Adaptable YOLOv5. For the tomato dataset, the percentage of false predictions fell from 47% to 33%; for the chilli dataset, the values dropped from 34.67% to 26%; and for the cotton dataset, the percentage of false predictions fell from 33.56% to 28%. Thus, the adaptable YOLOv5 model has reduced false predictions by approximately 5–10%.
Conclusion and future work
This research explored the YOLOv5 object detection algorithm for classifying crops and weeds in tomato-, chilli-, and cotton-cultivated farms. Three sets of data were used to train and test the model: a custom dataset with 707 images of green chilli crops and weeds from Ponnandagoundanoor farms in Coimbatore district, Tamil Nadu, India; and, from the open-access “Early Crop Weed” dataset of Borja Espejo-Garcia et al.’s “Towards weeds identification assistance through transfer learning”, a tomato dataset (the tomato crop and its weed, black nightshade) and a cotton dataset (the cotton plant and its weed, velvetleaf). There were 200 images of tomato crops and 130 images of weeds in the dataset; after augmentation, its size increased to 736. Similarly, a total of 707 images was obtained for the chilli dataset. PyTorch serves as the foundation for the algorithm, and the training epoch was set at 200. The F1 score, detection time, and mAP of each machine learning algorithm were computed, resulting in a tomato dataset with an F1 score of 98%, a mAP of 0.995, and a detection time of 190 ms; a cotton dataset with an F1 score of 91% and a mAP of 0.947; and a chilli dataset with an F1 score of 78% and a mAP of 0.811. YOLOv5’s accuracy was improved by adding adaptively spatial feature fusion blocks to the head of its architecture.
Three ASFF blocks were added to YOLOv5. These learn to resolve conflicts across feature scales by spatially filtering features that are inconsistent between scales, thereby improving scale-feature invariance. The improved YOLOv5 algorithm with ASFF modules was tested on the same datasets and achieved an F1 score of 99.7% on the tomato dataset and 79.4% on the chilli dataset, a 1.14% increase in the F1 score. An F1 score of 93.53% was achieved on the cotton dataset, a 2% increase. The improved YOLOv5 also improved the mAP by about 0.5% and led to a slight decrease in the number of computations performed, thereby making the model more lightweight. In summary, a new method was developed that combines adaptively spatial feature fusion with the YOLOv5 algorithm. The new YOLOv5 is better suited for real-time detection on an embedded platform and can be built into a robot that removes weeds in farms. The crop-weed identification system can be further developed into a weed-removal robot with image sensors, which can monitor fields and perform site-specific weedicide spraying, thereby reducing the need for human labour, and the robot can be further enhanced to achieve better results.
Future research may employ adaptive tokenisation transformers37 to enable dynamic representation learning for accurate weed-crop differentiation, reducing computational costs while improving precision in complex field environments. Simultaneously, enhancing reinforcement learning via hierarchical game playing with state relay presents a promising framework for autonomous weeding robots38, enabling decision-making across various levels and facilitating adaptation to diverse farming systems. The integration of these methods can establish a perception-action loop, facilitating the advancement of fully automated and sustainable weed management systems.
Data availability
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
References
Kesavalu, K., Asokan, R. & Abdul Raheem, A. Horticulture scenario in Tamil Nadu: progress and constraints. Shanlax Int. J. Econ. 9 (3), 29–35 (2021).
K, V. K. et al. CNN based identification of weeds in tomato farm. In 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, 1–6 (2023). https://doi.org/10.1109/ICCCNT56998.2023.10307829
Gharde, Y., Singh, P. K., Dubey, R. P. & Gupta, P. K. Assessment of yield and economic losses in agriculture due to weeds in India. Crop Prot. 107, 12–18 (2018).
Cheng, F. & Cheng, Z. Research progress on the use of plant allelopathy in agriculture and the physiological and ecological mechanisms of allelopathy. Front. Plant Sci. 6 (2015).
Oerke, E. C. et al. Crop production and crop protection: estimated losses in major food and cash crops. Nat. Plants. 2 (3), 1–7 (2016).
Rodríguez-Sinobas, L. et al. Spatial optimization of weed management in cereal fields using geostatistics and decision support systems. Comput. Electron. Agric. 147, 38–45 (2018).
Van Evert, F. K. et al. Big data for weed control and crop protection. Weed Res. 57, 218–233 (2017).
Gonzalez-de-Santos, P. et al. Fleets of robots for environmentally-safe pest control in agriculture. Precision Agric. 18, 574–614 (2017).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Vacher, C. et al. Assessing the impact of weed pressure on crop yield variability: A modeling approach applied to maize. Weed Res. 58 (5), 321–331 (2018).
Ghorbani, R. et al. Mechanical weed control in organic farming: A review. Agron. Sustain. Dev. 32 (2), 401–419 (2012).
Jin, X., Che, J. & Chen, Y. Weed identification using deep learning and image processing in vegetable plantation. IEEE Access 9, 10940–10950 (2021).
Duan, K. et al. CenterNet: keypoint triplets for object detection. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), 6568–6577 (2019).
Singh, V. & Singh, D. Development of an approach for early weed detection with UAV imagery. In IGARSS 2022 – IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 4879–4882 (2022).
Koshelev, I., Savinov, M., Menshchikov, A. & Somov, A. Drone-Aided detection of weeds: transfer learning for embedded image processing. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens. 16, 102–111 (2023).
Gao, J., Nuyttens, D., Lootens, P., He, Y. & Pieters, J. G. Recognising weeds in a maize crop using a random forest machine-learning algorithm and near-infrared snapshot mosaic hyperspectral imagery. Biosyst. Eng. 170, 39–50 (2018).
Gurubaran, K. et al. Machine learning approach for soil nutrient prediction. In 2023 IEEE Silchar Subsection Conference (SILCON), Silchar, India, 1–6 (2023). https://doi.org/10.1109/SILCON59133.2023.10405095
Le, V. N. T. et al. A novel method for detecting morphologically similar crops and weeds based on the combination of contour masks and filtered local binary pattern operators. GigaScience 9 (3), giaa017 (2020). https://doi.org/10.1093/gigascience/giaa017
Hadid, A. The local binary pattern approach and its applications to face analysis. In 2008 First Workshops on Image Processing Theory, Tools and Applications, Sousse, 1–9 (2008).
Moazzam, S. I. et al. A patch-image based classification approach for detection of weeds in sugar beet crop. IEEE Access 9, 121698–121715 (2021).
Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 (2015).
Farooq, A., Hu, J. & Jia, X. Analysis of spectral bands and spatial resolutions for weed classification via deep convolutional neural network. IEEE Geosci. Remote Sens. Lett. 16 (2), 183–187 (2019).
Tufail, M. et al. Identification of tobacco crop based on machine learning for a precision agricultural sprayer. IEEE Access 9, 23814–23825 (2021).
Liu, Z. & Wang, S. Broken corn detection based on an adjusted YOLO with focal loss. IEEE Access 7, 68281–68289 (2019).
Chen, J., Wang, H., Zhang, H., Luo, T., Wei, D., Long, T. & Wang, Z. Weed detection in sesame fields using a YOLO model with an enhanced attention mechanism and feature fusion. Comput. Electron. Agric. 202 (2022).
Espejo-Garcia, B., Mylonas, N., Athanasakos, L., Fountas, S. & Vasilakoglou, I. Towards weeds identification assistance through transfer learning. Comput. Electron. Agric. 171, 105306 (2020).
Bochkovskiy, A., Wang, C.-Y. & Liao, H.-Y. M. YOLOv4: optimal speed and accuracy of object detection. arXiv:2004.10934 (2020).
Wang, C.-Y., Liao, H.-Y. M., Wu, Y.-H., Chen, P.-Y., Hsieh, J.-W. & Yeh, I.-H. CSPNet: a new backbone that can enhance learning capability of CNN. arXiv:1911.11929 (2019).
He, K., Zhang, X., Ren, S. & Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. In Computer Vision – ECCV 2014, LNCS 8691, 346–361 (Springer, 2014).
Liu, S., Qi, L., Qin, H., Shi, J. & Jia, J. Path aggregation network for instance segmentation. arXiv:1803.01534 (2018).
R, S. K. et al. Secured IoT framework for soil moisture detection. In 2023 IEEE Silchar Subsection Conference (SILCON), Silchar, India, 1–6 (2023). https://doi.org/10.1109/SILCON59133.2023.10404465
Rezatofighi, H. et al. Generalized intersection over union: a metric and a loss for bounding box regression. arXiv:1902.09630 (2019).
Wang, Y., Tao, K., Wang, Z. & Sun, J. Memristor-Based GFMM neural network circuit of biology with multiobjective decision and its application in industrial autonomous firefighting. IEEE Trans. Industr. Inf. 21 (7), 5777–5786. https://doi.org/10.1109/TII.2025.3558347 (July 2025).
V, S. B. et al. Intelligent mobility prediction leveraging machine learning approaches. In International Conference on Communication, Computing, Smart Materials and Devices (ICCCSMD), Chennai, India, 1–8 (2024). https://doi.org/10.1109/ICCCSMD63546.2024.11015144
Kunduracioglu, I. & Pacal, I. Advancements in deep learning for accurate classification of grape leaves and diagnosis of grape diseases. J. Plant. Dis. Prot. 131, 1061–1080. https://doi.org/10.1007/s41348-024-00896-z (2024).
Sun, J., Gao, P., Liu, P. & Wang, Y. Memristor-Based feature recall neural network circuit with Temporal differentiation of emotion and its application in parts inspection. IEEE Trans. Industr. Inf. 21 (7), 5633–5643. https://doi.org/10.1109/TII.2025.3556069 (July 2025).
Zhu, E., Wang, S., Liu, C. & Wang, J. Adaptive tokenization transformer: enhancing irregularly sampled multivariate time series analysis. IEEE Internet Things J. (2025). https://doi.org/10.1109/JIOT.2025.3554249
Liu, C. et al. Boosting reinforcement learning via hierarchical game playing with state relay. IEEE Trans. Neural Networks Learn. Syst. 36 (4), 7077–7089. https://doi.org/10.1109/TNNLS.2024.3386717 (April 2025).
Author information
Contributions
G. Prabakaran, Bino J, and Shabana Parveen M carried out the literature survey, investigation, and methodology. Parameswaran Ramesh and N. Vidhya wrote and reviewed the original draft. P. T. V. Bhuvaneswari supervised the research work.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ramesh, P., Prabakaran, G., Nagavel, V. et al. Detection of commercial crop weeds using machine learning algorithms. Sci Rep 15, 38791 (2025). https://doi.org/10.1038/s41598-025-22676-x