Abstract
Accurate localization of plant growth points is essential for precision agriculture applications, including electro-weeding and laser weeding. While crop and weed detection has been extensively studied, existing methods focus primarily on object-level recognition and often neglect fine-grained growth point localization. To address this limitation, we propose a novel training strategy, epoch-based prior annealing (EPA), which incorporates the excess green minus excess red (ExG-ExR) index as prior knowledge and introduces a schedule factor and a gain factor to effectively steer keypoint regression. The experimental results show that incorporating EPA improves keypoint localization performance, with mAP50 increasing by 0.024 and mAP50:95 by 0.011, while maintaining bounding box detection performance. The parameter sensitivity experiments confirmed that both excessively strong and excessively weak guidance can hinder training. Furthermore, analysis of parameters and computational cost shows that the additional overhead introduced by the EPA strategy accounts for less than 0.5% of the total and can be considered negligible. In summary, the proposed EPA strategy significantly improves the accuracy, robustness, and generalizability of plant and growth point detection models, offering a practical and scalable solution for precision agricultural applications.
Introduction
Weed management is a critical aspect of agricultural production, as weeds compete with crops for essential resources—such as sunlight, water, and nutrients—thereby hindering crop growth and reducing overall food yield.
Currently, herbicide application remains the most prevalent method for global weed control; however, it raises substantial concerns regarding environmental sustainability and food safety. Site-specific weed management (SSWM) was developed to mitigate these challenges by enabling the precise application of herbicides according to the spatial distribution of weed density1. SSWM reduces herbicide waste and improves weeding efficiency by leveraging real-time weed density sensing, which necessitates accurate discrimination between crops and weeds. Mechanical weeding is considered a valuable complement to chemical weed control, but it carries an inherent risk of crop damage, potentially leading to partial or even complete plant loss2. Given the risk of potential crop damage, precise guidance of the mechanical weeding tools is essential. These tools should be used as close to the crop plants as possible to maximize the treated field area. For both SSWM and mechanical weeding, accurately identifying and localizing crops and weeds remains a critical requirement3,4. Crop and weed detection has emerged as a research hotspot5,6,7,8, yielding promising results.
In recent years, innovative weeding techniques such as flame weeding9, electro-weeding4, and laser weeding3 have been developed. While these methods have garnered considerable attention from both industry and researchers for their environmentally friendly nature, they impose greater demands on field data acquisition and processing, as they extend beyond simple crop and weed detection to the localization of plant growth points. The accurate detection and localization of plant growth points has gradually become a key objective for researchers. With the rapid advancement of computational technologies, deep learning has emerged as a powerful feature representation framework and has achieved remarkable success across various application domains10,11,12. In the field of precision agriculture, keypoint localization is primarily approached through either deep learning and point cloud-based methods or deep learning and image processing-based methods13,14,15. However, the point cloud-based model construction and post-processing work are computationally intensive, affecting the need for real-time application16. Consequently, most recent research has focused on leveraging deep learning and image processing techniques for keypoint localization. Among deep learning based object detection models, the YOLO series has been widely adopted due to its outstanding performance17,18.
Güldenring et al.19 developed a model to extract fine-grained phenotypic information, including leaf, stem, and vein instances; however, the mAP50:95 for stem detection is only 0.366, indicating room for further improvement. Lac et al.20 proposed an algorithm capable of detecting, locating, and tracking the stem positions of maize and bean crops in images, which is suitable for precision operations in vegetable fields such as mechanical hoeing within crop rows. However, the algorithm does not consider the detection of weed growth points, which limits its broader applicability. Xiang et al.21 proposed a two-stage detection method that combines YOLOv5 and EffiStemNet to specifically locate the stem positions of small weeds. Although it achieved a high detection rate on a custom dataset, the two-stage mechanism introduces practical inconveniences for its application. Li et al.22 compared a two-stage, heatmap-based architecture (YOLO-HRNet) with a single-stage, regression-based one (YOLO-Pose) for joint-stem and weed detection in grassland images. The results show that YOLO-Pose achieved an mAP50 of 0.501 and an mAP50:95 of 0.261 for joint-stem detection, making it an ideal candidate for implementation in autonomous weeding robots requiring precise localization of weeds’ joint stems. Although considerable research has been conducted in the field of plant growth point detection, limitations such as low detection accuracy, complex algorithms, and narrow applicability persist. Therefore, there is a continuing need for in-depth studies aimed at achieving higher accuracy, broader generalizability, and more compact models.
In recent years, multimodal fusion has shown strong potential in enhancing visual perception tasks23,24. Unlike depth or thermal infrared data, the excess green minus excess red (ExG-ExR) index does not require additional sensors and offers the advantages of low computational cost and affordability. It effectively highlights green vegetation against the soil background and plays a crucial role in real-time crop and weed detection, particularly for plant-soil segmentation25,26,27. Zou et al.28 successfully employed the ExG-ExR index combined with the minimum error threshold segmentation method to segment green plants from bare soil, achieving a segmentation accuracy of 93.5%. The incorporation of this important information as a fourth input channel in deep learning models is expected to improve the accuracy of both bounding box and keypoint regression, offering significant application potential. Curriculum learning29 is a training strategy inspired by human learning, in which a model is first exposed to simple tasks and data before gradually progressing to more complex ones. Learning rate annealing30 refers to a strategy in which a model uses a relatively high learning rate at the early stages of training to converge quickly, followed by a gradual reduction in the learning rate to achieve stable convergence toward a local optimum.
The ExG-ExR index highlights plant pixels, as illustrated in Fig. 1, with plant pixels exhibiting higher values than the soil background, making this feature more easily captured by the deep learning model. Moreover, plant growth points, being part of the plant, are clearly distinguishable from the soil background. Therefore, during model training, leveraging this feature to constrain the localization of plant growth points within plant pixels is highly beneficial for rapid and accurate detection. In this study, we incorporated the ExG-ExR index as a fourth input channel to the model, and the four-channel input simultaneously improved the model’s accuracy in detecting both plants and their growth points.
Comparison between the original RGB image and its ExG-ExR index. (a) RGB image, (b) ExG-ExR index.
To fully exploit the informative features embedded in the ExG-ExR space, we propose an epoch-based prior annealing (EPA) strategy for plant growth point localization, which integrates the core principles of curriculum learning and learning rate annealing. This strategy encourages the model to follow a “simple-to-complex” learning paradigm by leveraging the ExG-ExR index during the early training phase. Specifically, keypoint pixels with lower ExG-ExR values—typically corresponding to soil background—are generally farther from the plant growth points and are therefore assigned higher loss weights. In contrast, pixels with higher ExG-ExR values, corresponding to plant regions and usually closer to the growth points, are assigned lower loss weights. Therefore, it is necessary to establish a bridge linking the loss function with the prior guidance provided by the ExG-ExR index. When prior knowledge is used to guide the model throughout the entire training process, it can help the model focus more quickly on plant regions during the early stages. However, in later stages, it may misdirect growth point localization toward the points with the highest ExG-ExR values, which do not necessarily correspond to true growth points. Therefore, a dynamic loss adjustment mechanism conditioned on the training epoch is required to progressively reduce the reliance on ExG-ExR priors and prevent such misguidance. The annealing mechanism dynamically adjusts the weight of prior guidance based on the training epoch. During the early stage of training, higher weights are assigned to the prior knowledge to guide keypoint localization toward plant regions. Throughout training, the weights are smoothly adjusted to implement the annealing of prior guidance, minimizing abrupt changes that could disrupt learning. In the later stage, the prior knowledge is gradually withdrawn, allowing the model to autonomously learn keypoint features and fine-tune the localization of growth points. 
The EPA strategy requires balancing prior guidance and annealing based on training epochs, which necessitates careful and precise parameter tuning. With appropriate parameter settings, the EPA training strategy provides the model with the opportunity to achieve more precise localization of growth points.
The EPA strategy incorporates the ExG-ExR index as prior knowledge to guide the training of the growth point localization model. It establishes a link between the ExG-ExR index and the loss weights through a prior factor, and applies a schedule factor to dynamically adjust the strength of this prior during training. With appropriate parameter tuning, the model follows an “easy-to-hard” learning trajectory in growth point localization, ultimately improving keypoint accuracy without compromising bounding box detection performance.
The contributions of this study are summarized as follows:
(1) The ExG-ExR index was introduced as a fourth input channel in plant and growth point detection tasks, enabling plant features to be more effectively captured by the model and thereby improving the accuracy of plant growth point detection.
(2) An EPA training strategy for plant growth point localization was proposed. The model is guided to follow a “from easy to hard” learning principle during growth point localization, enhancing the overall learning effectiveness.
(3) The effects of different prior weights, prior guidance durations, and annealing slopes on model performance were systematically evaluated, and the optimal parameter combination was determined. This approach allows the growth point detection accuracy to be improved without compromising the performance of plant bounding box detection.
The paper is organized as follows: Section Introduction provides an overview of keypoint detection methods in precision agriculture, introduces the motivation behind the proposed EPA strategy, and summarizes the main contributions and innovations of this work. Section Materials and Methods presents the principles of the proposed EPA method in detail and describes the datasets, evaluation metrics, and experimental settings used in this study. Section Experiments and Results validates the effectiveness of the proposed EPA strategy through extensive experiments and provides an analysis of its parameter sensitivity. Section Discussion discusses the experimental results and highlights the key characteristics of the proposed EPA strategy. Section Future Work outlines directions for future work, and Section Conclusion concludes the study.
Materials and methods
Prior factor
Figure 1 presents the plant image in both the RGB and ExG-ExR channels. It was observed that in the ExG-ExR channel, plant pixels stood out distinctly from the soil background, making them easier to identify. This represented a highly valuable piece of information worth further exploration—it not only served as the fourth input channel for plant detection models to assist in learning plant features, but also potentially acted as prior knowledge to guide the localization of plant growth points, thereby enabling the construction of a prior factor.
In crop and weed detection tasks in agricultural fields, the ExG-ExR index was commonly employed to segment plant pixels from the soil background. It was defined by the following Eqs. (1)-(4):
where r, g, and b denote the normalized channel proportions of the red, green, and blue channels, respectively.
After obtaining the ExG-ExR index, a gray-level stretching operation was applied to map the index values into the range of 0-255, facilitating its concatenation with the RGB image as the fourth input channel. Since plant pixels exhibited larger ExG-ExR index values but corresponded to smaller loss weights, the prior factor (P) was computed based on the ExG-ExR index, as shown in Eq. (5). In contrast to plant pixels, soil background pixels had relatively large P values, corresponding to larger losses. This property was exploited in conjunction with the keypoint loss function to guide the plant growth point localization task through prior knowledge.
The prior factor establishes a bridge between the prior knowledge and weights of the growth point localization loss, quantifying the respective influences of soil-background pixels and plant pixels on the loss function.
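The computation of the ExG-ExR channel and the prior factor can be sketched as follows. The normalized-channel and index definitions follow the standard formulation commonly used in the literature (corresponding to Eqs. (1)-(4)); since the exact form of Eq. (5) is not reproduced here, taking P as the complement of the stretched index is an assumption, chosen so that soil pixels (low ExG-ExR) receive large P values:

```python
import numpy as np

def exgr_and_prior(rgb):
    """Compute the ExG-ExR index and a prior factor P from an RGB image.

    Standard definitions assumed: r = R/(R+G+B), g = G/(R+G+B),
    b = B/(R+G+B); ExG = 2g - r - b; ExR = 1.4r - g; ExGR = ExG - ExR.
    The form of P is an assumption: after gray-level stretching of
    ExGR to [0, 255], P is its complement, so soil background pixels
    (low ExGR) obtain large P values and hence larger loss weights.
    """
    rgb = rgb.astype(np.float64)
    total = rgb.sum(axis=-1) + 1e-8  # avoid division by zero on black pixels
    r = rgb[..., 0] / total
    g = rgb[..., 1] / total
    b = rgb[..., 2] / total
    exg = 2.0 * g - r - b
    exr = 1.4 * r - g
    exgr = exg - exr
    # Gray-level stretching to [0, 255] for use as the fourth input channel.
    stretched = 255.0 * (exgr - exgr.min()) / (exgr.max() - exgr.min() + 1e-8)
    # Prior factor: soil background (low ExGR) -> large P.
    prior = 1.0 - stretched / 255.0
    return stretched, prior
```

The stretched index can be concatenated with the RGB channels to form the four-channel input, while the prior map is sampled at predicted keypoint locations during loss computation.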
Schedule factor
Beyond prior guidance, another core component of the EPA strategy is the annealing mechanism. Although the ExG-ExR index effectively constrains the search region for growth points within plant pixels, it does not provide a strict one-to-one correspondence with the actual growth point locations. Therefore, the model requires increased training flexibility in the later stages to accurately locate growth points. The schedule factor, a function of the training epoch, enables dynamic adjustment of the prior knowledge throughout training.
At the outset of training, the model could place the predicted growth point at any location in the image. Predictions located in soil areas with low ExG-ExR indices were clearly erroneous and therefore incurred a larger loss penalty. Conversely, predictions within plant regions—where ExG-ExR indices are relatively high—received a smaller loss, reflecting proximity to the ground truth and requiring only minor spatial adjustments. Motivated by this intuition, P (Eq. (5)) was introduced to incorporate ExG-ExR information into the keypoint regression process, effectively guiding the model toward plant regions during the early learning phase. However, the ExG-ExR index served as a weak feature, providing only a soft constraint: it could prevent predicted points from falling on soil background pixels but could not precisely guide the exact position of the growth point. Therefore, P was applied only during the early training phase and gradually annealed out in later stages, allowing keypoints to converge accurately to their true locations. To facilitate the dynamic modulation of P throughout training, a schedule factor (S) was incorporated, calculated as shown in Eq. (6).
where T denoted the epoch threshold that controlled the involvement of prior knowledge in the keypoint regression process. When the number of epochs exceeded T during training, S approached zero, indicating that the ExG-ExR prior had been effectively annealed out of the training pipeline. \(\beta\) controlled the slope of the sigmoid-shaped schedule. During the initial three epochs, the model underwent a warm-up phase, during which the prior guidance was intentionally disabled. The evolution of the schedule curve with respect to training epochs was illustrated in Fig. 2.
Decay of the schedule factor as training progresses.
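A minimal sketch of the schedule factor described above; since Eq. (6) is not reproduced here, the exact sigmoid expression is an assumption, chosen to match the stated behavior (near 1 early in training, smooth decay around the threshold T with slope \(\beta\), approaching 0 beyond T, and disabled during the three warm-up epochs):

```python
import math

WARMUP_EPOCHS = 3  # prior guidance disabled during warm-up, per the text

def schedule_factor(epoch, T=70, beta=0.2):
    """Sigmoid-shaped annealing schedule (sketch of Eq. (6), assumed form).

    Returns a weight near 1 in early epochs that decays smoothly
    around the epoch threshold T and approaches 0 once epoch >> T.
    """
    if epoch < WARMUP_EPOCHS:
        return 0.0  # warm-up phase: no prior guidance
    return 1.0 / (1.0 + math.exp(beta * (epoch - T)))
```

With the default T = 70 and \(\beta\) = 0.2, the factor is essentially 1 through the early guidance phase, 0.5 exactly at epoch 70, and negligible by the end of a 150-epoch run.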
Model structure
YOLOv8-Pose, developed by Ultralytics, is a keypoint detection model that extends the original object detection architecture to support the localization of multiple keypoints for each object instance. The model employs an anchor-free design and a decoupled detection head, enabling simultaneous prediction of object bounding boxes and keypoints for efficient end-to-end pose estimation. It is well-suited for various keypoint regression tasks, and its architecture is illustrated in Fig. 3.
YOLOv8-pose architecture diagram.
Given that plant growth points can also be regarded as keypoints with spatial structural characteristics, YOLOv8-Pose provides a foundational general framework for their automatic detection. By defining growth points in plant images analogously to “skeleton joints” in human pose estimation and applying appropriate adaptations, YOLOv8-Pose can be generalized to the task of localizing plant growth points in agricultural images. Due to hardware constraints, the nano version of the YOLOv8 model was selected as the baseline model for experiments.
The contribution of the EPA training strategy to the model’s overall parameter and computational cost originates solely from the first convolutional layer connected to the input. The parameters \(paras\) and computational cost \(FLOPS\) of a standard convolutional layer can be calculated using Eqs. (7) and (8).
where k denotes the convolution kernel size, \({C_{in}}\) and \({C_{out}}\) represent the number of channels in the input and output feature maps, respectively, and W and H denote the width and height of the output feature map. The EPA training strategy adds the ExG-ExR index as the fourth channel of input. Calculations show that the resulting increase in model parameters and computational cost accounts for only 0.0047% and 0.35% of the total, respectively, and can therefore be considered negligible.
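The overhead calculation can be reproduced directly from Eqs. (7) and (8). The first-layer shape used below (3 × 3 kernel, 16 output channels, 640 × 640 input downsampled by stride 2 to a 320 × 320 output map) is an assumption based on the YOLOv8 nano architecture; biases are ignored:

```python
def conv_params(k, c_in, c_out):
    # Eq. (7): parameter count of a standard (bias-free) convolution layer.
    return k * k * c_in * c_out

def conv_flops(k, c_in, c_out, w, h):
    # Eq. (8): multiply-accumulate cost over the output feature map.
    return k * k * c_in * c_out * w * h

# Assumed first layer of YOLOv8n: k=3, C_out=16, 320x320 output map.
extra_params = conv_params(3, 4, 16) - conv_params(3, 3, 16)
extra_flops = conv_flops(3, 4, 16, 320, 320) - conv_flops(3, 3, 16, 320, 320)
```

The extra fourth channel adds only 144 parameters and roughly 1.5 × 10⁷ multiply-accumulates, which is consistent with the sub-0.5% overhead reported above.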
Loss
The loss function of the YOLOv8-Pose model consists of five components in total, as shown in Eq. (9), with the keypoint loss specifically defined in Eq. (10).
where \(\sigma\) was set to 0.107, a value derived from pose keypoint detection; \(area\) represented the area of the target bounding box; and d denoted the Euclidean distance between the predicted point and the ground-truth point. \(los{s_{box}}\), \(los{s_{cls}}\), \(los{s_{dfl}}\), and \(los{s_{kobj}}\) denoted the other loss functions involved in the bounding box regression, classification, and keypoint regression tasks, which were not modified in this study. This function accurately characterized the discrepancy between predicted and ground-truth points. When d decreases, \(los{s_{pose}}\) correspondingly becomes smaller, enabling effective guidance of keypoint regression.
As a loss function originally developed for human pose estimation, \(los{s_{pose}}\) includes both the weights of keypoints and the Euclidean distance between predicted and ground-truth points, making it applicable to all keypoint detection tasks. For the specific task of plant growth point detection, since growth points are part of the plant and typically exhibit higher ExG-ExR values than the soil background, this characteristic was leveraged to improve the loss function for this task.
In the EPA mechanism, two meaningful parameters, P and S, jointly formed the guidance weight C (Eq. (11)). By multiplying the keypoint regression loss \(los{s_{pose}}\) by C, the epoch-based prior annealing mechanism was integrated into the plant growth point regression loss function \(los{s_{gp}}\) (Eq. (12)).
where \(\alpha\) represented the gain factor, which served to globally scale the prior-guided weight, mitigating the influence of \(los{s_{gp}}\) on other components of the loss function. This ensured that the magnitude of the guidance term remained within a reasonable range, thereby preserving the balance among multiple loss terms during training.
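The combination of the prior factor, schedule factor, and gain factor can be sketched as follows; since Eqs. (11) and (12) are not reproduced here, the additive form with a base weight of 1 is an assumption, chosen so that C decays to 1 (plain \(los{s_{pose}}\)) once the schedule anneals to zero:

```python
def guidance_weight(prior, schedule, alpha=0.06):
    """Guidance weight C (sketch of Eq. (11), assumed combination).

    prior: prior factor P at the predicted keypoint (large on soil);
    schedule: annealing factor S for the current epoch;
    alpha: gain factor globally scaling the prior-guided term.
    """
    return 1.0 + alpha * prior * schedule

def loss_gp(pose_loss, prior, schedule, alpha=0.06):
    # Eq. (12) sketch: EPA-weighted growth point regression loss.
    return guidance_weight(prior, schedule, alpha) * pose_loss
```

With the default \(\alpha\) = 0.06, the guidance term perturbs the keypoint loss by at most 6%, keeping it balanced against the bounding box, classification, and DFL terms.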
Datasets
This research utilized the CropAndWeed dataset31, which includes a diverse range of crops: bean (Vicia faba L.), maize (Zea mays L.), pea (Pisum sativum L.), potato (Solanum tuberosum L.), pumpkin (Cucurbita pepo L.), soy (Glycine max (L.) Merr.), sugar beet (Beta vulgaris L.), and sunflower (Helianthus annuus L.). This dataset provided extensive information on the early growth stages of various crops and weeds, with image dimensions of 1920 × 1088 pixels.
A total of 7,705 images were collected using a semi-professional single-lens reflex (SLR) camera equipped with a full-frame sensor. To ensure robustness across different environmental conditions, the dataset was collected over four years (March to July), capturing variations in lighting, weather, and soil types. All images were taken manually in auto-exposure mode with a constant 50 mm focal length from a top-down perspective, approximately 1.1 m above ground level. All images were split into training and validation sets at a ratio of 8:2. In total, the dataset comprised 20,765 crop instances and 57,523 weed instances, with the weed category encompassing a broad range of species that exhibited substantial variability in morphology and growth stages.
The Weed Stem Detection (WSD) dataset was also used in this study32. Standard RGB images were collected by a custom-built autonomous vehicle equipped with a Teledyne FLIR BFS-U3-123S6C-C high-resolution imaging sensor. Each image has a resolution of 2048 × 2048 pixels. The sensor is mounted on the autonomous vehicle at a relatively fixed height above the surface, approximately one meter for the prototype vehicle. Bounding boxes of crops and weeds, together with weed stem locations, were annotated. In this work, only the weed boxes and stem locations were used.
The WSD dataset contains 511 images and was first split into training and validation sets at an 8:2 ratio. Data augmentation techniques were then applied to enlarge the dataset and mitigate the risk of overfitting33. By reducing brightness by 50% and applying horizontal mirroring, vertical mirroring, and 180° rotation, the WSD dataset was expanded to five times its original size.
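The four augmentations above can be sketched with plain array operations (image arrays in H × W × C layout); this is an illustration of the transforms only, and in practice the keypoint and bounding box annotations must be transformed consistently with each mirrored or rotated image:

```python
import numpy as np

def augment_5x(image):
    """Expand one image into five, per the augmentations described in the
    text: the original, 50% brightness reduction, horizontal mirror,
    vertical mirror, and 180-degree rotation."""
    darker = np.clip(image.astype(np.float64) * 0.5, 0, 255).astype(image.dtype)
    h_flip = image[:, ::-1]        # horizontal mirror
    v_flip = image[::-1, :]        # vertical mirror
    rot180 = image[::-1, ::-1]     # 180-degree rotation
    return [image, darker, h_flip, v_flip, rot180]
```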
Evaluation metrics
mAP50 and mAP50:95 were employed to evaluate model performance on the CropAndWeed dataset. Specifically, mAP50 denoted the mean average precision over all categories at an intersection over union (IoU) threshold of 0.5, whereas mAP50:95 denoted the mean average precision over IoU thresholds ranging from 0.5 to 0.95 with a step size of 0.05, making it a more stringent evaluation metric than mAP50. Since plant and growth point detection constituted a multi-task object detection task, both mAP50 and mAP50:95 were reported for bounding box detection and keypoint localization to ensure that improvements in keypoint accuracy did not come at the expense of bounding box precision.
Experimental settings
This experiment was conducted on a personal computer. The detailed computer software, hardware configuration, and training environment settings were presented in Table 1.
The batch size was set to 32, epochs to 150, lr0 = 0.01, lrf = 0.01, and the input images were resized to 640 × 640 pixels.
Experiments and results
Experiments were designed along four dimensions to thoroughly evaluate the effectiveness of the proposed EPA training strategy. The ablation experiment aimed to assess the contribution of each component of the strategy. The second dimension experimentally analyzed the sensitivity of three parameters—prior-guidance duration, prior-guidance strength, and scheduling-factor slope—and determined the optimal parameter configuration. The third dimension consisted of cross-version evaluations, in which the strategy was applied to three YOLO-based models to verify its generalizability across models. The fourth dimension evaluated the effectiveness of the EPA strategy on an additional dataset and under varying environmental conditions.
Module ablation experiments
In this experiment, the contribution of each component of the strategy was evaluated under the default settings of \(\alpha\)= 0.06, T = 70, and \(\beta\)= 0.2. The results are shown in Table 2.
Experiments 1 and 2 demonstrated that incorporating the ExG-ExR index as the fourth input channel resulted in minor fluctuations in bounding box mAP50, while keypoint mAP50 and mAP50:95 increased by 0.007 and 0.008, respectively. Experiment 4 used only the standard RGB three-channel input, isolating the prior and annealing mechanisms. Compared to Experiment 4, Experiment 5, which added the fourth ExG-ExR channel, increased the keypoint localization metrics, with mAP50 and mAP50:95 improving by 0.010 and 0.006, respectively. These results highlight the essential positive contribution of the ExG-ExR channel to plant growth point detection.
Compared with experiment 2, experiment 3 showed a decline in keypoint detection accuracy. This decrease was primarily attributed to the continuous involvement of prior knowledge throughout training, which resulted in erroneous guidance during the later stages. Consequently, the model lacked sufficient flexibility to fine-tune the localization of growth points within plant pixels.
Compared with experiment 3, experiment 5 achieved a comparable performance in bounding box detection (mAP50 and mAP50:95), but demonstrated a substantial improvement in growth point detection accuracy, with mAP50 increasing by 0.024. This improvement was attributed to the guidance provided by prior knowledge, which enabled the model to rapidly focus its learning on plant pixels during the early training stages. Furthermore, the prior annealing mechanism allowed the model to progressively refine growth point localization within plant regions during the later stages, ultimately resulting in a significantly higher keypoint detection accuracy.
The detection of crops and weeds, as well as the localization of growth points before and after applying the EPA training strategy, is illustrated in Fig. 4.
Visualization of plant growth point detection results. (a–g) correspond to different crop or weed instances.
Parameter sensitivity
Effect of T on model performance
In this experiment, the guidance duration parameter T was varied to investigate its impact on model performance. The parameter \(\alpha\) was fixed at 0.06, and \(\beta\) was fixed at 0.2. The results on the validation set are shown in Table 3.
The experimental results indicated that, when T was set to any of the five tested values, the model consistently outperformed the baseline, demonstrating that the prior guidance duration was within a reasonable range and effectively directed the model to learn features from plant regions. Optimal performance was achieved at T= 70. When T was decreased, keypoint detection accuracy deteriorated correspondingly, suggesting that a shorter guidance period weakened the model’s ability to learn keypoint features. Conversely, increasing T led to greater fluctuations in performance, reflecting instability in keypoint feature learning. These findings indicated that both excessively short and excessively long guidance durations were detrimental to model training. Figure 5 provides an intuitive illustration of how the mAP50 metric varies with the parameter T, and compares the EPA enhanced model with the baseline. The trends observed in the figure are consistent with the results discussed above.
Effect of \(\:T\) on mAP50.
Effect of \(\alpha\) on model performance
In this experiment, the guidance strength \(\alpha\) was varied to investigate its impact on model training outcomes, while T was consistently set to 70, and \(\beta\) to 0.2. The model’s performance on the validation set was summarized in Table 4.
The experimental results showed that the model achieved the highest keypoint mAP50 when \(\alpha\)= 0.06, and the highest keypoint mAP50:95 when \(\alpha\)= 0.03. Across the seven tested \(\alpha\) values, model performance exhibited some fluctuations, reflecting instability in learning plant features. Nevertheless, all configurations outperformed the baseline, demonstrating the effectiveness of the proposed training strategy.
Figure 6 illustrates the variation of the mAP50 metric under different values of the parameter \(\alpha\). As \(\alpha\) increases beyond \(\alpha\)= 0.06, the mAP50 score decreases, suggesting that excessively large prior weights may mislead the model in localizing growth points. Conversely, when \(\alpha\) decreases, the mAP50 value initially declines and then rises, reflecting the sensitivity of the EPA training strategy to the choice of \(\alpha\). The baseline model achieves an mAP50 of 0.677, and it is notable that the mAP50 values obtained under all \(\alpha\) settings remain consistently above this baseline, further demonstrating the effectiveness of the proposed EPA strategy.
Effect of \(\:\alpha\:\) on mAP50.
Effect of \(\beta\) on model performance
\(\beta\) controls the slope of the schedule factor: smaller values result in a slower decay, while larger values lead to a faster decline. In this experiment, \(\alpha\) and T were set to 0.06 and 70, respectively, in order to investigate the effect of varying \(\beta\) on the overall model performance. The results are presented in Table 5.
The experimental results indicate that when \(\beta =0.20\), the model achieves its highest mAP50 score of 0.701 for keypoint localization. However, when \(\beta =0.22\), the model attains its highest mAP50:95 score of 0.475. All five test results surpass the baseline model, further demonstrating the effectiveness of the EPA training strategy. Figure 7 provides an intuitive illustration of how the mAP50 metric for the growth point localization task varies with \(\beta\), indicating that the EPA strategy is also sensitive to this parameter.
Effect of \(\:\beta\:\) on mAP50.
Cross-model experiments
The EPA training strategy did not impose any structural or parameter requirements on the model, as it solely involved modifications to the model input and loss function. Consequently, this strategy can, in principle, be applied to any deep learning model for plant and growth point detection. To validate this property, the strategy was preliminarily applied to the nano versions of YOLOv11-Pose and YOLOv12-Pose. Experimental settings were kept consistent with the ablation experiments. The results were summarized in Table 6.
The results demonstrated that, for all three YOLO-based models, detection accuracy was consistently improved after increasing the number of input channels and applying the EPA training strategy. Specifically, improvements in YOLOv8-Pose and YOLOv11-Pose were primarily observed in keypoint detection, whereas YOLOv12-Pose exhibited enhancements in both bounding box and keypoint detection tasks. This behavior can be attributed to the optimization of the overall loss during training, in which the \(los{s_{box}}\) and \(los{s_{gp}}\) terms interact dynamically. The consistent performance gains across multiple models validated the generalizability of the proposed strategy.
Evaluation of model performance across different datasets and environments
The WSD dataset was used to evaluate the effectiveness of the EPA training strategy. As illustrated in Fig. 8, both the mAP50 and mAP50:95 metrics for growth point localization exhibit substantial improvements without compromising the bounding box detection performance, thereby confirming the effectiveness of the proposed EPA strategy.
Experimental results on the WSD dataset before and after applying the EPA strategy.
In real-world applications of crop and weed detection and growth point localization, it is difficult to eliminate the influence of environmental factors such as varying illumination, shadowing, soil moisture, and soil type. To further assess the robustness of the model trained with the EPA strategy under different environmental conditions, this study uses the environmental annotations provided in the CropAndWeed dataset to conduct validation experiments. The validation set of the CropAndWeed dataset was divided by lighting condition and soil moisture level, yielding new validation subsets. The performance of the YOLOv8-Pose models trained with and without the EPA strategy was then compared under these conditions. The mAP50 and mAP50:95 results for the growth point localization task are presented in Tables 7 and 8.
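The environment-based splitting just described can be sketched as a simple grouping of validation samples by a metadata field. The field names below ("lighting", "moisture") are hypothetical illustrations; the actual annotation keys in the CropAndWeed dataset may differ.

```python
from collections import defaultdict

def split_by_environment(samples, key):
    """Group validation samples by an environment annotation field.

    `samples` is a list of per-image metadata dicts; `key` names the
    environment attribute to split on (e.g. a hypothetical "lighting"
    or "moisture" field). Returns a dict mapping each attribute value
    to its subset of samples, so each subset can be evaluated separately.
    """
    subsets = defaultdict(list)
    for sample in samples:
        subsets[sample[key]].append(sample)
    return dict(subsets)
```

Evaluating the same trained model on each subset in isolation then yields the per-condition mAP comparisons reported in Tables 7 and 8.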
Table 7 shows that, under both sunny and diffuse lighting conditions, the model trained with the EPA strategy exhibits a clear and consistent performance improvement, demonstrating the robustness of the EPA strategy to lighting conditions. Table 8 further shows that the performance gains vary considerably across soil moisture levels: the model achieves substantial improvements when the soil is either dry or wet, whereas its performance deteriorates under moderate moisture, indicating that the environmental adaptability of the EPA training strategy still requires improvement.
Discussion
Motivated by the need for improved localization accuracy in plant growth point detection, the EPA training strategy was proposed. Experimental results demonstrate that it effectively balances multiple loss components in plant and growth point detection tasks, consistently improving growth point localization accuracy. The ExG-ExR index used in this study can be computed efficiently from standard RGB channels and serves as a guidance signal without requiring additional detectors. Furthermore, it introduces minimal additional model parameters and computational overhead, making the approach suitable for deployment in resource-constrained scenarios.
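As a concrete illustration of how the prior channel can be derived from RGB input, the sketch below computes the ExG-ExR index from normalized chromatic coordinates, using the common definitions ExG = 2g − r − b and ExR = 1.4r − g, and stacks it as a fourth channel. The exact normalization and preprocessing used in the paper may differ.

```python
import numpy as np

def exg_exr(rgb):
    """Compute the ExG-ExR index from an HxWx3 RGB image in [0, 255].

    Uses normalized chromatic coordinates r, g, b (each channel divided
    by the per-pixel channel sum), with ExG = 2g - r - b and
    ExR = 1.4r - g, so ExG - ExR = 3g - 2.4r - b.
    """
    rgb = rgb.astype(np.float64)
    total = rgb.sum(axis=2, keepdims=True)
    total[total == 0] = 1.0  # avoid division by zero on pure-black pixels
    r, g, b = np.moveaxis(rgb / total, 2, 0)
    return (2 * g - r - b) - (1.4 * r - g)

def add_prior_channel(rgb):
    """Stack the ExG-ExR map onto scaled RGB as a fourth channel (HxWx4)."""
    prior = exg_exr(rgb)
    return np.dstack([rgb.astype(np.float64) / 255.0, prior])
```

Vegetation pixels yield positive ExG-ExR values while soil pixels tend toward negative values, which is what makes the index usable as a coarse plant/background guidance signal.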
The proposed loss function annealing mechanism mitigates over-reliance on prior information, enabling a smooth transition from early-stage guided learning to late-stage autonomous learning. This training strategy rests on two key insights: (1) the strong discriminative capability of the ExG-ExR index in distinguishing plant pixels from soil background; and (2) the physiological characteristic that growth points are typically located at apical meristems or stem-node junctions. The fourth input channel combined with the prior annealing mechanism not only provides interpretability but also maintains a model-agnostic design, allowing the strategy to be transferred easily to other keypoint detection models. This property was further validated through cross-model experiments on three YOLO variants capable of human pose estimation, providing strong evidence for the portability of the proposed EPA strategy.
The parameter sensitivity experiments indicate that, although the proposed EPA training strategy effectively raises the upper bound of model performance, different parameter combinations produce substantial fluctuations in accuracy, reflecting the strategy's high sensitivity to its training parameters. This sensitivity arises because adjustments of \(\alpha\), \(\beta\), and \(T\) alter the overall gradient flow, leading to nonlinear changes in the training trajectory and potentially affecting convergence. For instance, excessively strong prior guidance may bias growth point predictions toward pixels with high ExG-ExR values that do not correspond to the actual growth point locations, whereas excessively weak guidance fails to sufficiently constrain the learning region, allowing growth points to be erroneously predicted on soil background pixels. Similar effects are observed when adjusting the guidance duration. Furthermore, the losses involved in training, such as \(loss_{gp}\) and \(loss_{box}\), interact dynamically; over- or under-weighting one component can disrupt the balance among the losses and hinder overall training. These observations suggest a trade-off between early guidance and late-stage model autonomy. Although parameter tuning improves performance, the EPA training strategy remains sensitive to its hyperparameters, so the optimal settings identified in this study may not transfer directly to other tasks, and task-specific parameter searches may be required.
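The trade-off between early guidance and late-stage autonomy can be sketched as a schedule factor that decays the weight of the prior-guidance loss over the first \(T\) epochs. The formulation below is a hypothetical illustration, not the paper's exact schedule: it assumes a cosine decay over a guidance duration `T`, with `beta` acting as the gain factor on the prior term, which vanishes once guidance ends.

```python
import math

def schedule_factor(epoch, T):
    """Cosine-annealed schedule factor: 1 at epoch 0, 0 from epoch T onward.

    Assumed form for illustration; the paper's actual schedule may differ.
    """
    if epoch >= T:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * epoch / T))

def total_loss(loss_box, loss_gp, loss_prior, epoch, T, beta):
    """Combine box, growth-point, and (annealed) prior-guidance losses.

    beta scales the prior guidance (the gain factor); the schedule factor
    anneals it to zero so late training is driven only by the task losses.
    """
    return loss_box + loss_gp + beta * schedule_factor(epoch, T) * loss_prior
```

Under this sketch, an overly large `beta` or `T` keeps pulling predictions toward high-ExG-ExR pixels late into training, while values that are too small leave the early learning region unconstrained, matching the sensitivity behavior observed above.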
Future work
Although the ExG-ExR index has demonstrated considerable value in vegetation analysis, it is susceptible to environmental factors such as illumination34. This limitation motivates the development of more robust color indices or the incorporation of illumination-invariant representations. For example, image phase congruency, a dimensionless measure invariant to changes in image brightness or contrast, offers a promising direction for mitigating illumination effects in plant feature extraction35. Because the EPA strategy does not depend on any specific form of prior knowledge, it can incorporate diverse types of priors that describe local characteristics of plant growth points, such as spectral, physiological, structural, or geometric cues. This flexibility grants the EPA framework strong prior substitutability and provides methodological extensibility for future multimodal plant growth point localization systems. The effectiveness of the EPA training strategy has been demonstrated within the YOLO family of models; future work will extend its validation to a broader range of keypoint localization architectures.
The EPA strategy's sensitivity to its hyperparameters limits its practical application. Further research is therefore required to explore adaptive scheduling mechanisms that adjust systematically to different training tasks. In addition, other forms of prior knowledge, such as ExG36,37 and near-infrared (NIR)38 information, remain to be explored in the context of plant and growth point detection and warrant further investigation.
Although the EPA training strategy consistently increases the upper-bound accuracy of plant growth point localization models, certain challenging scenarios—such as ambiguous cases, occlusions, overlapping leaves, and tiny weed seedlings—still pose difficulties for correct prediction. Improving accuracy is an ongoing pursuit, and future work will continue to refine the model to further enhance prediction performance and better support emerging weed-control technologies such as laser weeding.
Conclusion
In this study, we proposed the EPA training strategy for precise localization of plant growth points in field environments. The approach incorporates the ExG-ExR vegetation index as a fourth input channel and employs a dynamic epoch-based annealing mechanism to guide keypoint regression. Ablation studies demonstrated the contributions of the fourth-channel input, the prior guidance, and the annealing mechanism. Parameter sensitivity analyses identified the optimal training configuration, yielding improvements of 0.024 in mAP50 and 0.011 in mAP50:95 for growth point detection without compromising bounding box detection performance. Cross-model evaluations further validated the generalizability and robustness of the proposed strategy across multiple YOLO-Pose variants.
Inspired by the curriculum learning principle of “easy-to-hard,” the EPA strategy leverages the weak correlation between ExG-ExR values and growth point locations to guide the model during early training, while gradually phasing out prior influence to allow fine-grained keypoint refinement. Overall, the proposed approach provides a rational and effective learning process, achieving enhanced localization performance.
Beyond the current application, this strategy holds strong potential for a wide range of precision agriculture tasks, including growth stage monitoring, autonomous field operations, and large-scale phenotyping. Its lightweight design, adaptability to different crop types, and broad model generalizability suggest considerable extensibility for future research in intelligent farming systems and data-driven plant management.
Data availability
The dataset used in this study is available at the following link: https://github.com/cropandweed/cropandweed-dataset.
References
Wiles, L. J. Beyond patch spraying: site-specific weed management with several herbicides. Precis Agric. 10, 277–290 (2009).
Machleb, J., Peteinatos, G. G., Kollenda, B. L., Andújar, D. & Gerhards, R. Sensor-based mechanical weed control: present state and prospects. Comput. Electron. Agric. 176, 105638 (2020).
Andreasen, C., Scholle, K. & Saberi, M. Laser weeding with small autonomous vehicles: friends or foes? Front. Agron. 4, 841086 (2022).
Slaven, M. J., Koch, M. & Borger, C. P. D. Exploring the potential of electric weed control: a review. Weed Sci. 71, 403–421 (2023).
Fatima, H. S. et al. Formation of a lightweight, deep learning-based weed detection system for a commercial autonomous laser weeding robot. Appl. Sci. 13, 3997 (2023).
Ma, C., Chi, G., Ju, X., Zhang, J. & Yan, C. YOLO-CWD: A novel model for crop and weed detection based on improved YOLOv8. Crop Prot. 192, 107169 (2025).
Wang, Q. et al. A deep learning approach incorporating YOLO v5 and attention mechanisms for field real-time detection of the invasive weed solanum rostratum Dunal seedlings. Comput. Electron. Agric. 199, 107194 (2022).
Zhou, Y. A YOLO-NL object detector for real-time detection. Expert Syst. Appl. 238, 122256 (2024).
Ascard, J. Effects of flame weeding on weed species at different developmental stages. Weed Res. 35, 397–411 (1995).
Norouzi, R. et al. A dual-feature two-sided attention network for anticancer natural products detection. Comput. Biol. Med. 194, 110442 (2025).
Abbasi, K. et al. Computational drug design in the artificial intelligence era: A systematic review of molecular representations, generative architectures, and performance assessment. Pharmacol. Rev. 78, 100095 (2026).
Xue, C. et al. Similarity-guided layer-adaptive vision transformer for UAV tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 6730–6740 (2025).
Lv, X. et al. Dynamic whole-life cycle measurement of individual plant height in oilseed rape through the fusion of point cloud and crop root zone localization. Comput. Electron. Agric. 236, 110505 (2025).
Li, Q., Du, Q., Tian, L., Shao, Y. & Lu, G. Enhancing point cloud feature representation via historical node state increments in graph neural networks. Pattern Recognit. 172, 112603 (2026).
Li, P., Wen, M., Zeng, Z. & Tian, Y. Cherry tomato bunch and picking point detection for robotic harvesting using an RGB-D sensor and a StarBL-YOLO network. Horticulturae 11, 949 (2025).
Shuai, L. et al. An improved YOLOv5-based method for multi-species tea shoot detection and picking point location in complex backgrounds. Biosyst Eng. 231, 117–132 (2023).
Wu, X., Tian, Y. & Zeng, Z. LEFF-YOLO: A lightweight cherry tomato detection YOLOv8 network with enhanced feature fusion. In Advanced Intelligent Computing Technology and Applications 474–488 (2025).
Cai, Y. et al. Cherry tomato detection for harvesting using multimodal perception and an improved YOLOv7-Tiny neural network. Agronomy 14, 2320 (2024).
Güldenring, R., Andersen, R. E. & Nalpantidis, L. Zoom in on the plant: fine-grained analysis of leaf, stem, and vein instances. IEEE Robot Autom. Lett. 9, 1588–1595 (2024).
Lac, L. et al. Crop stem detection and tracking for precision hoeing using deep learning. Comput. Electron. Agric. 192, 106606 (2022).
Xiang, W., Wu, D. & Wang, J. Enhancing stem localization in precision agriculture: A Two-Stage approach combining YOLOv5 with effistemnet. Comput. Electron. Agric. 231, 109914 (2025).
Li, J., Güldenring, R. & Nalpantidis, L. Real-time joint-stem prediction for agricultural robots in grasslands using multi-task learning. Agronomy 13, 2365 (2023).
Xue, Y. et al. FMTrack: Frequency-aware interaction and Multi-Expert fusion for RGB-T tracking. IEEE Trans. Circuits Syst. Video Technol. 1–1 (2025).
Chai, S., Wen, M., Li, P., Zeng, Z. & Tian, Y. DCFA-YOLO: A dual-channel cross-feature-fusion attention YOLO network for Cherry tomato bunch detection. Agriculture 15, 271 (2025).
Chen, X., Lin, F., Ma, F. & Du, C. Effects of long-term input of controlled-release Urea on maize growth monitored by UAV-RGB imaging. Agronomy 15, 716 (2025).
Le, V. N. T., Ahderom, S. & Alameh, K. Performances of the LBP based algorithm over CNN models for detecting crops and weeds with similar morphologies. Sensors 20, 2193 (2020).
Meyer, G. E. & Neto, J. C. Verification of color vegetation indices for automated crop imaging applications. Comput. Electron. Agric. 63, 282–293 (2008).
Zou, K., Chen, X., Zhang, F., Zhou, H. & Zhang, C. A field weed density evaluation method based on UAV imaging and modified U-Net. Remote Sens. 13, 310 (2021).
Bengio, Y., Louradour, J., Collobert, R. & Weston, J. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning 41–48 (2009).
Loshchilov, I. & Hutter, F. SGDR: stochastic gradient descent with warm restarts. https://arxiv.org/abs/1608.03983 (2016).
Steininger, D., Trondl, A., Croonen, G., Simon, J. & Widhalm, V. The CropAndWeed dataset: a multi-modal learning approach for efficient crop and weed manipulation. In 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 3718–3727 (2023).
Liu, D. et al. Towards efficient and intelligent laser weeding: method and dataset for weed stem detection. Proc. AAAI Conf. Artif. Intell. 39, 28204–28212 (2025).
Zheng, T., Jiang, M., Li, Y. & Feng, M. Research on tomato detection in natural environment based on RC-YOLOv4. Comput. Electron. Agric. 198, 107029 (2022).
Wang, Y., Yang, Z., Kootstra, G. & Khan, H. A. The impact of variable illumination on vegetation indices and evaluation of illumination correction methods on chlorophyll content Estimation using UAV imagery. Plant. Methods. 19, 51 (2023).
Tian, Y., Wen, M., Lu, D., Zhong, X. & Wu, Z. Biological basis and computer vision applications of image phase congruency: a comprehensive survey. Biomimetics 9, 422 (2024).
Han, X. et al. A rapid segmentation method for weed based on CDM and ExG index. Crop Prot. 172, 106321 (2023).
Yang, B., Zhu, Y. & Zhou, S. Accurate wheat lodging extraction from multi-channel UAV images using a lightweight network model. Sensors 21, 6826 (2021).
Lottes, P., Behley, J., Chebrolu, N., Milioto, A. & Stachniss, C. Joint stem detection and crop-weed classification for plant-specific treatment in precision farming. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 8233–8238 (2018).
Funding
This research was funded by the National Natural Science Foundation of China (NSFC), grant number 62505316.
Author information
Authors and Affiliations
Contributions
C.M. writing—original draft preparation. Z.Z. and F.T. writing—review and editing. Y.H. funding acquisition. C.Y. supervision. All authors reviewed and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ma, C., Zhang, Z., Tian, F. et al. Plant growth point localization via epoch-based prior annealing. Sci Rep 16, 4994 (2026). https://doi.org/10.1038/s41598-026-35009-3
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41598-026-35009-3