Introduction

With improvements in socioeconomic levels, edible fungi have become widely recognized by consumers in the Chinese market because of their unique nutritional value and taste1. In recent years, Oudemansiella raphanipes has become an important pillar industry for increasing the income of farmers in many regions. As consumer demand for Oudemansiella raphanipes continues to increase, factory-based cultivation has gradually become mainstream. The growth cycle of Oudemansiella raphanipes is relatively short (three to four months)2, necessitating prompt harvesting upon reaching maturity. However, traditional methods for identifying the growth stage of Oudemansiella raphanipes rely mainly on manual monitoring, and this approach is inadequate in terms of accuracy and efficiency, thereby failing to meet the needs of modern large-scale agricultural production. Moreover, deep learning techniques have not yet been applied to identify the growth stages of Oudemansiella raphanipes. Therefore, the goal of this study was to develop an efficient model for identifying the growth stage of Oudemansiella raphanipes.

The identification of the growth stage of Oudemansiella raphanipes can be considered an object detection task. With the advancement of deep learning, anchor-based detection algorithms have been extensively applied to a variety of crops, including grains, fruits, and vegetables3,4,5,6. Zhang et al. (2022) utilized an improved Faster region-based convolutional neural network (R-CNN) to detect the developmental stages of rice spikes3. Yadav et al. (2022) deployed the YOLOv5 algorithm to detect volunteer cotton plants within cornfields across critical growth phases4. Zha et al. implemented a RetinaNet-based adaptive training sample selection (ATSS) model to recognize different growth stages of grass mushrooms5. Almalky and Ahmed (2023) employed deep learning architectures, such as RetinaNet with ResNet-101-FPN, to detect and classify the growth stages of Consolida regalis weed6. However, anchor-based algorithms such as Faster R-CNN, YOLOv5 and RetinaNet have notable limitations. These methods depend on predefined anchor boxes, which complicate the model and accommodate diverse object sizes and shapes poorly. They also produce a disparity between positive and negative samples, with most anchors covering only the background, which skews the training process7. Additionally, manually adjusting anchor parameters requires substantial expertise, prolonging the tuning process8.

In recent years, anchor-free object detection methods have been widely applied in the field of object detection. These methods, which do not require predefined anchor boxes, offer reduced model complexity and enhanced detection stability8. For example, Liu et al. (2022) developed TomatoDet for the robust detection of tomatoes at different growth stages in greenhouses by integrating an attention mechanism and a novel circle representation into the CenterNet backbone9. Xie et al. (2023) developed FCOS-FL, a model for detecting various categories of litchi leaf diseases and pests. This model is particularly effective on hard-to-detect targets such as Mayetiola sp. and litchi algal spot10. Among the anchor-free object detection methods, the fully convolutional one-stage object detector (FCOS) is an innovative algorithm that performs object classification and location regression directly on feature maps11. This approach enhances both efficiency and accuracy. In this model, ResNet-50 is deployed as the backbone, and three parallel subnetworks are incorporated as heads for classification, centerness prediction, and bounding box regression, streamlining the structure. In addition, a multi-level FPN structure is implemented in FCOS. The FPN fuses high-level semantic information with low-level spatial detail, and its multi-level assignment of targets helps resolve the ambiguity that arises when a location falls within overlapping ground truth boxes11. The introduction of a centerness prediction subnetwork improves localization accuracy by focusing on the central areas of targets11. The overall loss combines three terms: a classification loss that uses the focal loss to handle class imbalance, a centerness loss that uses binary cross-entropy to improve the accuracy of the predicted centerness, and a regression loss that uses the intersection over union (IoU) to increase the precision of the predicted bounding boxes11. This comprehensive approach optimizes performance and precision in predicting object locations. FCOS stands out among anchor-free models for its simplified design, reduced number of hyperparameters, and ease of tuning. Moreover, FCOS demonstrates superior flexibility and generalizability and effectively handles targets of various sizes and shapes, with notable performance in detecting small objects. Overall, FCOS shows outstanding performance and efficiency in object detection owing to its unique design and optimization strategies.
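For reference, the centerness target defined in the original FCOS work11 for a location with regression targets \(l^*\), \(t^*\), \(r^*\) and \(b^*\) (its distances to the left, top, right and bottom sides of the ground truth box) is

$$\begin{aligned} \text {centerness}^* = \sqrt{\frac{\min (l^*, r^*)}{\max (l^*, r^*)} \times \frac{\min (t^*, b^*)}{\max (t^*, b^*)}} \end{aligned}$$

and at inference the predicted centerness is multiplied with the classification score, down-weighting low-quality boxes produced by locations far from object centres before non-maximum suppression11.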

However, despite the effectiveness of FCOS and other deep learning-based object detection methods, these models often require significant computational resources, making them challenging to deploy on devices with limited processing power. This limitation is particularly relevant in agricultural applications, as onsite real-time analysis is crucial. To address this issue, model compression techniques such as pruning have become indispensable tools in optimizing deep learning models for practical deployment12,13. For example, Liu et al. (2022) explored discrimination-aware network pruning, focusing on retaining discriminative features during compression to ensure minimal performance loss with deep models14. Furthermore, Guo et al. (2021) developed a progressive channel pruning approach, enabling gradual model compression to achieve the desired compression ratios with minimal impact on model performance15.

To address these challenges, it is necessary to accurately identify the growth stages of Oudemansiella raphanipes under resource-constrained conditions. In this study, we developed the ORaph8K image dataset with comprehensive annotations and proposed the OR-FCOS model, a lightweight framework for precise growth stage identification that is optimized for deployment on edge devices such as the Jetson Nano. The main contributions of this article are as follows. Firstly, the OR-FCOS model incorporates the MobileNetV3-Large backbone with the EMA module, effectively replacing the more computationally demanding ResNet-50 in the FCOS framework. Secondly, the model replaces the conventional FPN with NAS-FPN and introduces NAS-FCOS-Head, refining the processes of feature fusion and prediction. Thirdly, by integrating the CIoU loss function, the model takes into account aspect ratio consistency and the distance between bounding box centers. Fourthly, a channel pruning technique is utilized to remove redundant parameters from the NAS-FPN and NAS-FCOS-Head components, reducing model size and computational requirements without sacrificing accuracy.

Materials and methods

Data collection

The image data were sourced from Oudemansiella raphanipes grown under real environmental conditions. Smartphones with cameras (iPhone 1216, OPPO A317, Honor 9i18 and Huawei Nova 5 Pro19) were used for sampling. The sampled images of Oudemansiella raphanipes were taken from random angles (images were manually captured from various positions around the plant to ensure random shooting angles), against diverse backgrounds, and under various lighting conditions (we took photos of the plants at different times of the day to create a diverse range of illumination scenarios) at different growth stages to increase the diversity of the collected data. Every image contains multiple Oudemansiella raphanipes samples at different stages of growth. After eliminating similar images, the dataset includes a total of 8,000 images. We then used a Python program to randomly split the images into training, validation, and test sets with an 8:1:1 ratio. We present some representative images of Oudemansiella raphanipes at different growth stages within the dataset (Fig. 1). The dataset was annotated via the ISAT_with_segment_anything tool available on GitHub20, and the images were classified into four stages of growth on the basis of their developmental progression: stage 1 (seedling stage), stage 2 (growing stage), stage 3 (maturity stage), and stage 4 (flowering stage). In the agricultural production process, Oudemansiella raphanipes is typically harvested during the third stage (maturity stage). Harvesting too early (e.g., in the first and second stages) can result in lower quality and reduced yield, while harvesting too late (e.g., in the fourth stage) may lead to degradation of the product. Understanding and accurately determining these growth stages are crucial for optimizing both quality and productivity. The resolution of the images in the collected dataset varied. Table 1 presents the properties of the collected dataset.
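The 8:1:1 split can be reproduced with a short script. The following is a minimal sketch, not the authors’ actual program: it assumes the images sit in a single directory, and the directory names and random seed are illustrative.

import random
import shutil
from pathlib import Path

def split_dataset(src_dir, dst_dir, ratios=(0.8, 0.1, 0.1), seed=42):
    """Randomly copy images into train/val/test subdirectories (8:1:1 by default)."""
    images = sorted(Path(src_dir).glob("*.jpg"))
    random.seed(seed)  # fixed seed so the split is reproducible
    random.shuffle(images)
    n_train = int(len(images) * ratios[0])
    n_val = int(len(images) * ratios[1])
    subsets = {
        "train": images[:n_train],
        "val": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],
    }
    for name, files in subsets.items():
        out = Path(dst_dir) / name
        out.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy(f, out / f.name)

# e.g. split_dataset("ORaph8K/images", "ORaph8K/splits")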

Fig. 1
figure 1

Representative images depicting the different growth stages of Oudemansiella raphanipes. (a) Seedling stage, (b) growing stage, (c) maturity stage, and (d) flowering stage.

Table 1 Collected dataset properties.

In real cultivation environments, Oudemansiella raphanipes is typically planted densely, which often results in the occlusion of the mushroom caps or stipes. To address this issue, we distinguished between the caps and stipes during the dataset annotation process. This differentiation enhances the model’s performance in scenarios where occlusion occurs, allowing for more accurate monitoring and analysis of the growth stages of Oudemansiella raphanipes.

Data augmentation

In the data augmentation phase of our preprocessing pipeline, we enhanced the robustness and generalizability of our model by employing three specific techniques tailored for Oudemansiella raphanipes: HSV random augmentation, random choice resizing, and random flipping. HSV random augmentation was applied with the parameters H_delta=5, S_delta=30, and V_delta=30, introducing subtle variations in colour that mimic natural differences. Random choice resize was utilized with scales set to [(1600, 960), (1333, 640), (1333, 800), (800, 800), (640, 640)], allowing for the simulation of various distances and perspectives. Finally, random flips were implemented with a probability of 0.5 to reflect the natural randomness of orientations in the wild. These methods, selected for their effectiveness in preliminary experiments, enabled the creation of a robust and versatile dataset conducive to developing a generalized model that is less prone to overfitting.
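In MMDetection 3.x, this pipeline can be expressed roughly as follows. The transform names are the library’s standard modules (YOLOXHSVRandomAug, RandomChoiceResize, RandomFlip); the exact pipeline used in this work may differ in ordering and in additional preprocessing steps.

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    # HSV jitter: hue +/-5, saturation +/-30, value +/-30
    dict(type='YOLOXHSVRandomAug', hue_delta=5, saturation_delta=30, value_delta=30),
    # randomly pick one of several target scales to mimic distance/perspective changes
    dict(type='RandomChoiceResize',
         scales=[(1600, 960), (1333, 640), (1333, 800), (800, 800), (640, 640)],
         keep_ratio=True),
    # horizontal flip with probability 0.5
    dict(type='RandomFlip', prob=0.5),
    dict(type='PackDetInputs'),
]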

OR-FCOS model

The network architecture of the OR-FCOS model is illustrated (Fig. 2).

Fig. 2
figure 2

Network architecture of OR-FCOS model.

In complex agricultural environments, traditional object detection algorithms such as FCOS often encounter challenges including high computational demands, limited adaptability to varying object scales, and reduced detection accuracy. These limitations hinder the efficient detection of the growth stage of Oudemansiella raphanipes. To address these issues, we propose the OR-FCOS algorithm, tailored for accurately identifying the growth stage of Oudemansiella raphanipes. The OR-FCOS model introduces several key innovations: (1) It integrates the MobileNetV3-Large backbone with the EMA module, replacing the computationally intensive ResNet-50 within the FCOS framework, thereby enhancing feature extraction efficiency while maintaining high accuracy. (2) The traditional FPN is supplanted by NAS-FPN, and NAS-FCOS-Head is introduced, optimizing the feature fusion process and prediction heads. (3) The CIoU loss function is incorporated within the NAS-FCOS-Head structure, addressing the shortcomings of the standard IoU loss by considering aspect ratio consistency and the distance between bounding box centers. (4) A channel pruning strategy is employed to eliminate redundant parameters in the NAS-FPN and NAS-FCOS-Head components, reducing the model size and computational demands without compromising detection accuracy. These modifications enable the OR-FCOS model to achieve a balanced performance, ensuring efficient and accurate detection of the growth stage of Oudemansiella raphanipes in complex environments.

MobileNetV3 backbone with EMA module

In the practical production process for the efficient detection of the growth stage of Oudemansiella raphanipes, computational resources are often limited. Therefore, algorithms with fewer parameters and lower computational overhead are more suitable for such applications. To address the challenges associated with the high computational demand of the FCOS algorithm and its limitations in real-time execution on edge computing devices, we selected MobileNetV3-Large to replace ResNet-5022 within the FCOS framework. As a lightweight convolutional neural network architecture, MobileNetV3 is designed to achieve faster and more efficient feature extraction on mobile devices23. It offers a good trade-off between computational efficiency and accuracy, enhancing the performance of its predecessors, MobileNet24 and MobileNetV225. The adoption of MobileNetV3-Large in the FCOS framework improves the efficiency of the model, reducing the need for high computational resources.

However, simply replacing ResNet-50 with MobileNetV3 may lead to a decrease in accuracy. To maintain detection performance, we introduce the EMA module26 into the original MobileNetV3’s inverted residual block, replacing the existing squeeze-and-excitation (SE) attention module. The selection of the EMA module over the SE module is underpinned by several theoretical considerations. Firstly, the EMA module is specifically designed to capture both local and global feature interactions without dimensionality reduction. This enhances the model’s ability to extract comprehensive features from the data26. In contrast, the SE module primarily focuses on channel-wise attention, which may limit its capacity to model the complex spatial dependencies essential for accurately identifying the diverse growth stages of Oudemansiella raphanipes. Secondly, the multi-scale architecture of the EMA module allows for efficient processing of features at various spatial resolutions26, which is crucial for recognizing subtle morphological variations across different growth stages. This multi-scale processing capability ensures that the model remains robust and sensitive to nuanced patterns that may be overlooked by single-scale attention mechanisms such as SE. Additionally, the EMA module achieves this enhanced feature extraction without a substantial increase in parameter count or computational overhead. This modification aligns with our objective of constructing a lightweight, efficient, and accurate model for identifying the growth stage of Oudemansiella raphanipes.
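To make the placement of the attention module concrete, the following is a minimal PyTorch sketch of a MobileNetV3-style inverted residual block with a pluggable attention stage. It is a structural illustration only: activation choices and channel numbers are simplified, the attention argument stands in for the EMA module in OR-FCOS (or the original SE block in MobileNetV3), and the EMA internals are omitted.

import torch
import torch.nn as nn

class InvertedResidualWithAttention(nn.Module):
    """Simplified inverted residual block in which the attention module is pluggable."""
    def __init__(self, in_ch, exp_ch, out_ch, stride, attention: nn.Module):
        super().__init__()
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 pointwise expansion
            nn.Conv2d(in_ch, exp_ch, 1, bias=False),
            nn.BatchNorm2d(exp_ch),
            nn.Hardswish(),
            # 3x3 depthwise convolution
            nn.Conv2d(exp_ch, exp_ch, 3, stride, 1, groups=exp_ch, bias=False),
            nn.BatchNorm2d(exp_ch),
            nn.Hardswish(),
            # attention on the expanded features (SE in MobileNetV3, EMA in OR-FCOS)
            attention,
            # 1x1 pointwise projection (linear bottleneck)
            nn.Conv2d(exp_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out

# usage: nn.Identity() keeps the plain structure; an EMA module would be passed instead
block = InvertedResidualWithAttention(40, 120, 40, stride=1, attention=nn.Identity())
y = block(torch.randn(1, 40, 56, 56))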

NAS-FPN and NAS-FCOS-Head

To accurately identify the growth stages of Oudemansiella raphanipes, it is crucial to precisely predict mushroom instances across different scales. We utilize the architecture of NAS-FPN and NAS-FCOS-Head to increase the detection accuracy and robustness of our model across different environmental conditions and mushroom sizes.

Traditional FPNs enhance detection performance by merging high-level, semantically rich features with low-level, detailed features. NAS-FPN improves object detection by optimizing the integration of multi-scale features27. It uses deformable convnets v2 (DCNv2)28, which adjusts convolutional kernels to better fit the geometric variations of objects, enhancing the network’s ability to capture complex shapes and sizes. Additionally, NAS-FPN employs skip connections to preserve information across feature levels, which is crucial for maintaining details. NAS-FPN is thus particularly effective in capturing subtle changes in mushroom growth.

In FCOS, the prediction head maps each feature in the pyramid to the output using four 3\(\times\)3 convolutions. The NAS-FCOS-Head offers structural improvements over the traditional FCOS-Head27. The inclusion of DCNv2 allows the NAS-FCOS-Head to dynamically adapt its convolutional filters to the shape and orientation of target objects. Additionally, the integration of 1\(\times\)1 convolution adjusts channel capacities and adds non-linearity, enriching feature representation without substantial computational overhead. This architecture can provide more accurate and robust performance.
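A simplified sketch of such a prediction head is shown below, using a single shared tower for brevity. The real FCOS head uses separate classification and regression towers plus per-level scale factors on the regression outputs, and the NAS-FCOS-Head additionally replaces some of these layers with deformable and 1×1 convolutions found by architecture search; those details are omitted here.

import torch
import torch.nn as nn

class SimpleFCOSHead(nn.Module):
    """Simplified FCOS-style head: four 3x3 convolutions feeding classification,
    centerness, and bounding-box regression outputs."""
    def __init__(self, in_channels=256, num_classes=8):
        super().__init__()
        tower = []
        for _ in range(4):  # four 3x3 convolutions, as in the FCOS head
            tower += [nn.Conv2d(in_channels, in_channels, 3, padding=1),
                      nn.GroupNorm(32, in_channels),
                      nn.ReLU(inplace=True)]
        self.tower = nn.Sequential(*tower)
        self.cls_out = nn.Conv2d(in_channels, num_classes, 3, padding=1)
        self.ctr_out = nn.Conv2d(in_channels, 1, 3, padding=1)
        self.reg_out = nn.Conv2d(in_channels, 4, 3, padding=1)  # l, t, r, b distances

    def forward(self, feat):
        x = self.tower(feat)
        return self.cls_out(x), self.ctr_out(x), self.reg_out(x)

# one head is shared across all pyramid levels; num_classes=8 reflects the four
# growth stages, each annotated as cap and stipe
head = SimpleFCOSHead(in_channels=256, num_classes=8)
cls_score, centerness, bbox_reg = head(torch.randn(1, 256, 100, 100))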

Improved loss function

For the precise detection of the growth stage of Oudemansiella raphanipes, accurate prediction of the bounding box is essential. In FCOS, the IoU loss function is adopted in the head structure for bounding box regression. The IoU refers to the ratio of the intersection to the union between the predicted bounding box and the ground truth bounding box. The IoU value ranges from 0 to 1, where 1 indicates perfect overlap and 0 indicates no overlap. The IoU loss is 1 minus the IoU value, that is:

$$\begin{aligned} IoU \, Loss = 1 - IoU \end{aligned}$$
(1)

However, the IoU loss has several limitations. For example, when there is no overlap between the predicted and ground truth boxes, the IoU value is 0, leading to a vanishing gradient, which prevents the model from learning from such incorrect predictions. Additionally, the IoU loss is not sufficiently sensitive to changes in bounding box size and does not account for the distance between the centres of the bounding boxes, which is crucial for localization accuracy.

To address these issues, the CIoU loss was introduced in the NAS-FCOS-Head structure. In the CIoU loss function, two terms are added to the IoU loss function, the aspect ratio consistency and the distance between centre points, with the formula29:

$$\begin{aligned} CIoU \, Loss = 1 - IoU + \frac{\rho ^2(b, b^{gt})}{c^2} + \alpha v \end{aligned}$$
(2)

where \(\rho (b, b^{gt})\) represents the Euclidean distance between the centroids of the predicted bounding box \(b\) and the ground truth bounding box \(b^{gt}\), \(c\) denotes the diagonal length of the smallest box that encloses both bounding boxes, \(v\) represents the consistency of the aspect ratios between the predicted and actual bounding boxes, and \(\alpha\) serves as a weighting factor to equilibrate the consistency of the aspect ratios with the central point distance. Moreover, v and \(\alpha\) are calculated as follows:

$$\begin{aligned} v= & \frac{4}{\pi ^2}\left( \arctan \frac{w^{gt}}{h^{gt}}-\arctan \frac{w}{h}\right) ^2 \end{aligned}$$
(3)
$$\begin{aligned} \alpha= & \frac{v}{(1-IoU)+v} \end{aligned}$$
(4)

where w and h represent the width and height of the predicted bounding box, respectively, whereas \(w^{gt}\) and \(h^{gt}\) represent those of the ground truth bounding box. The CIoU loss function is more comprehensive in its design than the IoU loss function, focusing not only on the degree of overlap between the bounding boxes but also on ensuring that the centre of the predicted box is as close as possible to the centre of the ground truth box and that its aspect ratio remains similar to that of the ground truth box. In this way, the CIoU loss can still provide effective gradients when there is no overlap, helping the model correct the position and shape of the predicted bounding box and enabling more precise bounding box regression even in complex agricultural environments.
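Eqs. (2)–(4) translate directly into code. The following PyTorch sketch operates on boxes in (x1, y1, x2, y2) format; the function name and epsilon handling are illustrative rather than the exact implementation used in OR-FCOS.

import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """CIoU loss for boxes in (x1, y1, x2, y2) format, following Eqs. (2)-(4)."""
    # intersection and union for the IoU term
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # squared distance between box centres (rho^2 in Eq. (2))
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    rho2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2

    # squared diagonal of the smallest enclosing box (c^2 in Eq. (2))
    ex1 = torch.min(pred[:, 0], target[:, 0])
    ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2])
    ey2 = torch.max(pred[:, 3], target[:, 3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps

    # aspect ratio consistency v (Eq. (3)) and its weight alpha (Eq. (4))
    w_p, h_p = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w_t, h_t = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(w_t / (h_t + eps)) - torch.atan(w_p / (h_p + eps))) ** 2
    with torch.no_grad():  # alpha is treated as a constant during backpropagation
        alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v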

Channel pruning strategy

In object detection, optimizing the balance among model size, computational efficiency, and accuracy is crucial, especially for deployment on resource-constrained devices. In traditional model designs, either efficiency or accuracy is often compromised. MobileNet24 introduced a width multiplier parameter that uniformly scales the network’s width (i.e., feature channels) to obtain a trade-off between accuracy and computational cost. By carefully adjusting the width multiplier, MobileNet achieves efficient inference with markedly fewer parameters and lower computational requirements compared to standard architectures.

Inspired by the design of MobileNet, we propose a targeted compression methodology that compresses the model by specifically reducing the feature channel count in NAS-FPN and NAS-FCOS-Head. By decreasing the number of feature channels in these components, which are responsible for a large portion of the model’s parameters and computational complexity, we can effectively reduce the overall resource demands. This approach ensures that the model maintains its operational performance with minimal degradation in accuracy while achieving greater efficiency suitable for deployment on resource-constrained devices.
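In an MMDetection-style configuration, this pruning amounts to lowering the channel width of the neck and head. The excerpt below is illustrative only: the module type names follow MMDetection’s public NAS-FCOS implementation as stand-ins for the components described above, and all channel numbers except the 256-to-192 decoder width are placeholder assumptions.

# illustrative excerpt: shrinking the decoder (neck + head) from 256 to 192 channels
model = dict(
    neck=dict(
        type='NASFCOS_FPN',           # NAS-searched FPN (stand-in for the NAS-FPN used here)
        in_channels=[40, 112, 960],   # backbone stage widths -- placeholder values
        out_channels=192,             # pruned from the default 256
        num_outs=5),
    bbox_head=dict(
        type='NASFCOSHead',
        num_classes=8,                # four growth stages x {cap, stipe}
        in_channels=192,              # must match the neck output width
        feat_channels=192))           # head width pruned together with the neck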

Evaluation metrics

To accurately evaluate the performance of the model, this study adopts commonly used evaluation metrics in object detection algorithms: \(\text {mAP}_{50}\) and \(\text {mAP}_{50:95}\). Their definitions are as follows:

$$\begin{aligned} \text {mAP}_{50}= & \frac{1}{C} \sum _{c=1}^{C} \text {AP}_{50}^{(c)} \end{aligned}$$
(5)
$$\begin{aligned} \text {mAP}_{50:95}= & \frac{1}{10} \sum _{i=0}^{9} \left( \frac{1}{C} \sum _{c=1}^{C} \text {AP}_{50 + 5i}^{(c)} \right) \end{aligned}$$
(6)

where C is the number of categories, \(\text {AP}_{50}^{(c)}\) represents the average precision for category c at an IoU threshold of 0.50, and \(\text {mAP}_{50:95}\) is the mean average precision averaged over IoU thresholds from 0.50 to 0.95 in increments of 0.05. A higher mAP value indicates better model performance, as it reflects the model’s ability to accurately detect and classify objects in various conditions.

Additionally, the complexity of the model is evaluated using the following metrics:

$$\begin{aligned} \text {FLOPs}= & 2 \sum _{l=1}^{L} \left( C_{\text {in}, l} \times C_{\text {out}, l} \times K_l^2 \times H_l \times W_l \right) \end{aligned}$$
(7)
$$\begin{aligned} \text {Number of Parameters}= & \sum _{l=1}^{L} \left( C_{\text {in}, l} \cdot C_{\text {out}, l} \cdot K_l^2 + C_{\text {out}, l} \right) \end{aligned}$$
(8)

where L is the number of network layers, \(K_l\) is the size of the convolution kernel, \(C_{\text {in}, l}\) and \(C_{\text {out}, l}\) represent the number of input and output channels of the l-th layer, respectively, and \(H_l\), \(W_l\) are the height and width of the l-th layer’s output feature map, respectively. The term \(C_{\text {out}, l}\) accounts for bias parameters. In terms of model complexity, fewer FLOPs and a smaller number of parameters generally indicate a more efficient model, which can lead to faster inference times and lower computational costs.
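As a concrete illustration, both complexity metrics can be computed per convolutional layer as follows; the 256-channel, 100×100 feature map in the example is arbitrary.

def conv_flops(c_in, c_out, k, h_out, w_out):
    """FLOPs of one convolutional layer, per Eq. (7) (a multiply-accumulate counts as 2 ops)."""
    return 2 * c_in * c_out * k * k * h_out * w_out

def conv_params(c_in, c_out, k):
    """Parameter count of one convolutional layer with bias, per Eq. (8)."""
    return c_in * c_out * k * k + c_out

# a single 3x3 convolution in a 256-channel decoder on a 100x100 feature map:
print(conv_flops(256, 256, 3, 100, 100))  # 11,796,480,000 FLOPs (~11.8 GFLOPs)
print(conv_params(256, 256, 3))           # 590,080 parameters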

Furthermore, the model’s deployment efficiency is assessed by measuring the size of the trained model weights and the GPU memory consumed during the inference phase to ensure compatibility with hardware constraints. Detection speed is evaluated in frames per second (FPS). A higher FPS means faster processing, which is beneficial for real-time applications.

Training details

In our experiment, we employed the AdamW30 optimizer with a learning rate of 0.0001 and a weight decay of 0.05, incorporating gradient clipping31 with a maximum norm of 0.01 for stability. Compared to the traditional Adam32 optimizer, AdamW more effectively handles weight decay, reduces overfitting, and enhances the model’s generalization capabilities. We utilized a cosine annealing strategy33 to schedule both the learning rate and momentum.

During the first 10 epochs, the learning rate increased from 0.0001 to 0.001, while momentum rose from 0 to a range between 0.85 and 0.95. This process enables the model to rapidly learn features during the initial training phase. Over the next 14 epochs, the learning rate was reduced to \(5 \times 10^{-7}\), and the momentum was adjusted to 1. This phase aims to fine-tune the model parameters, reduce fluctuations in the loss function and enhance the model’s stability. From epoch 24 onwards until epoch 100, we kept the learning rate constant. This decision is based on experimental observations of the model’s performance stability. It helps prevent potential training instability or overfitting.
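In MMEngine terms, this setup corresponds roughly to the configuration below. It is a sketch rather than the exact training config: the peak learning rate of 1e-3 is our reading of the warm-up described above, and the accompanying momentum schedule is omitted.

# optimizer: AdamW with weight decay 0.05 and gradient clipping at max norm 0.01
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=1e-3, weight_decay=0.05),
    clip_grad=dict(max_norm=0.01))

# learning rate: linear warm-up from 1e-4 to 1e-3 over the first 10 epochs, then cosine
# annealing down to 5e-7 by epoch 24; the rate is then left unchanged until epoch 100
param_scheduler = [
    dict(type='LinearLR', start_factor=0.1, by_epoch=True, begin=0, end=10),
    dict(type='CosineAnnealingLR', eta_min=5e-7, by_epoch=True, begin=10, end=24),
]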

The computing resources for this experiment are described in Table 2. MMDetection34 and MMPretrain35 were employed to train and test our model. The metrics for the experiments in this section are calculated on the test set.

Table 2 Computational resources.

Comparison with other methods

We selected several leading models to compare their performance with our improved model. These models have been chosen due to their proven effectiveness and widespread adoption. The selected models include Faster R-CNN7, RetinaNet36, YOLOv11n37, RetinaNet-based ATSS38, CenterNet39, FCOS-based AutoAssign40, FCOS and YOLOv8n41.

Faster R-CNN is a widely adopted two-stage detector known for its high accuracy in object detection tasks7. RetinaNet introduced focal loss to effectively address class imbalance in single-stage detectors36. YOLOv11n offers real-time detection capabilities with a lightweight architecture suitable for speed-critical applications37. ATSS builds on RetinaNet by incorporating adaptive training sample selection to enhance bounding box regression38. CenterNet utilizes keypoint estimation for precise object localization, improving detection accuracy39. AutoAssign enhances the FCOS framework with dynamic assignment mechanisms for better training adaptability40. FCOS represents anchor-free approaches, providing simplicity and computational efficiency in object detection. YOLOv8n advances the YOLO series with improved performance and flexibility for various detection scenarios41.

Our model is expected to outperform these models in the task of identifying the growth stages of Oudemansiella raphanipes for several reasons. Firstly, our model is meticulously designed, integrating NAS-FPN and NAS-FCOS-Head, and enhances detection accuracy by employing techniques such as EMA attention and CIoU Loss. Additionally, our model utilizes a more lightweight backbone and further reduces the number of parameters and overall size through channel pruning, making it more lightweight compared to other models. These advantages of our model will be validated in the subsequent Results section.

Results

Ablation test

To enhance the performance of identifying the growth stage of Oudemansiella raphanipes, three improvements were applied to the original FCOS algorithm (A: introduce MobileNetV3 with the EMA module as the backbone; B: incorporate NAS-FPN and NAS-FCOS-Head; C: replace the IoU loss function with the CIoU loss function). However, the specific contributions of these improvements to the detection performance of the OR-FCOS model remain unclear. To assess the distinct influence of each element on the model’s performance, a series of ablation tests was conducted, and the findings are detailed in Table 3. We present the models’ performance during training for the different tests (Fig. 3).

Table 3 Ablation tests.
Fig. 3
figure 3

Training dynamics for ablation studies showing the \(\hbox {mAP}_{50:95}\) and loss across epochs. (a) \(\hbox {mAP}_{50:95}\) by epoch, (b) loss by epoch.

Without MobileNetV3 and using ResNet-50 instead, the \(\hbox {mAP}_{50}\) and \(\hbox {mAP}_{50:95}\) achieve 89.4% and 78.3%, respectively, but the parameter count increases to 32.4M and the FLOPs to 94.8G. The inclusion of the lightweight EMA module slightly improves precision, as seen in the full model’s \(\hbox {mAP}_{50}\) of 89.4% and \(\hbox {mAP}_{50:95}\) of 78.3% compared to 89.2% and 78.1% when this component is removed. The removal of NAS-FPN and NAS-FCOS-Head results in a noticeable decline in performance, with \(\hbox {mAP}_{50}\) dropping to 87.3% and \(\hbox {mAP}_{50:95}\) to 74.9%. The CIoU loss function slightly boosts performance without adding extra Params and FLOPs, with the full OR-FCOS model achieving a \(\hbox {mAP}_{50}\) of 89.4% and \(\hbox {mAP}_{50:95}\) of 78.3% compared to 88.8% and 77.5% when CIoU is excluded.

Comparison of channel pruning strategies

To verify the effectiveness of the pruning strategy, comparative experiments on decoders with different channel counts were conducted, where the decoder specifically comprises the neck network and the head network. By implementing different pruning ratios in our improved FCOS network, we aimed to find an optimal pruning rate that achieves a better balance between reducing computational resource consumption and maintaining model performance. We present the comparative results of the distinct pruning strategies (Table 4). The model’s \(\hbox {mAP}_{50}\) and number of parameters for different feature channel counts are shown (Fig. 4).

Table 4 Effect of decoder channel count on model precision and efficiency.
Fig. 4
figure 4

Impact of feature channel count on model \(\hbox {mAP}_{50}\) and number of parameters.

As the decoder feature channel count decreases, the model’s mAP value slightly decreases, whereas the number of parameters, model weight size, and number of FLOPs substantially decrease. The model achieves the highest mAP value without pruning but also has the highest resource consumption.

When the feature channel count reaches 192, the model’s mAP value remains close to that of the model with a 256-channel width, but the number of parameters, model weight size, and number of FLOPs decrease, indicating that with a 192-channel width, the model achieves a better balance between performance and resource consumption. However, when the feature channel count is less than 160, the model’s performance begins to decline dramatically. In particular, when the feature channel count reaches 64, the decline in model performance becomes very apparent, and the advantage of saving resources is not sufficient to compensate for the performance loss. In summary, under the conditions of this experiment, the results suggest that setting the decoder feature channel count to 192 can enable a better trade-off between substantially reducing the model’s computational resource consumption and maintaining high detection accuracy. Through this method, we successfully achieve effective model compression while minimizing the impact on performance.

Comparison with other methods

We compared the performance of various leading models with the performance of our improved model. The results are concisely presented in Table 5. The inference speed and CUDA memory usage of each model were measured during inference on a single NVIDIA RTX A4000 GPU.

Table 5 Comparison of Oudemansiella raphanipes identification results of different models.

Our model achieved an \(\hbox {mAP}_{50}\) of 89.4% and an \(\hbox {mAP}_{50:95}\) of 78.3%, outperforming alternative models such as Faster R-CNN, RetinaNet, RetinaNet-based ATSS, CenterNet, FCOS-based AutoAssign, and FCOS in accuracy. In addition, our model has an efficient design, with only 9.9 M parameters and a size of 40.1 MB, demonstrating notable efficiency. In terms of processing speed, our model reached 24.0 FPS, with only YOLOv8n and YOLOv11n showing a higher speed. However, it is crucial to note that YOLOv8n and YOLOv11n, despite their higher speeds of 102.0 FPS and 75.8 FPS, respectively, exhibited much lower accuracy than our model. Faster R-CNN and CenterNet also demonstrated good accuracy, with \(\hbox {mAP}_{50}\) scores of 85.0% and 84.5%, respectively, and \(\hbox {mAP}_{50:95}\) scores of 72.0% and 73.7%. However, they require more computational resources than our model, as highlighted by Faster R-CNN’s 41.8 million parameters and a 167.3 MB weight file. This comparison underscores our model’s efficiency, offering a superior trade-off of high accuracy and rapid processing with substantially lower resource consumption.

We illustrate the performance of various anchor-free detection models (Fig. 5). Regarding the results obtained from the CenterNet algorithm, incorrect identifications occur at the locations marked by the blue arrow, diverging from the expected empirical outcomes (Fig. 5a). Notably, for the FCOS-based AutoAssign algorithm, a blue arrow in the top left corner indicates a false detection result (Fig. 5b). In contrast, the FCOS algorithm and our proposed algorithm successfully detect all instances of Oudemansiella raphanipes in the image without any false detections (Fig. 5c and Fig. 5d), demonstrating the efficacy of our improved detection method. Additional visual comparisons are shown in Fig. 6.

Fig. 5
figure 5

Visual comparison of the detection results of different anchor-free object detection models. (a) CenterNet, (b) FCOS-based AutoAssign, (c) FCOS, and (d) OR-FCOS (Ours).

Fig. 6
figure 6

Visual comparison of the detection results of different anchor-free object detection models. left: FCOS, right: OR-FCOS (Ours).

The OR-FCOS model exhibits lower confidence scores than the FCOS model in Fig. 5. The main reason for this difference lies in the optimization of the model architecture. Our OR-FCOS builds upon the original FCOS architecture by incorporating optimized feature extraction modules and enhanced loss functions. These improvements increase the model’s localization precision and classification accuracy while making the distribution of confidence scores more conservative to reduce the likelihood of false positives. Through this architectural optimization, OR-FCOS is able to maintain high detection accuracy while effectively controlling the false positive rate, thereby outperforming the traditional FCOS detector in overall performance.

Discussion

In this study, the dataset exhibits a certain degree of class imbalance, with some classes having fewer instances than others. Such imbalance can lead the model to be biased towards the majority classes, diminishing its ability to accurately identify minority classes. Consequently, it is essential to compare the precision metrics across the different categories.

The precision of each category at a 95% IoU threshold is shown in Table 6. The identification performance varies across different growth stages and categories. Specifically, within each growth stage, category a (caps) consistently exhibits higher precision than category b (stipes). For example, in the first growth stage, the precision for the cap (1a) reaches 0.851, whereas that for the stipe (1b) is only 0.667. This pattern remains consistent in subsequent growth stages, with the cap precision in the second stage at 0.905 and the stipe at 0.703; in the third stage, the figures are 0.884 versus 0.705; and in the fourth stage, 0.871 versus 0.674. The persistently low precision in stipe identification indicates inherent challenges in accurately identifying stipes across all growth stages.

Table 6 Model precision for different categories.

The confusion matrix for our proposed OR-FCOS is shown in Fig. 7. From the matrix, we observe that the misclassification rate is high when classifying the stipes. For example, there is a 10% misclassification rate of class 1b being identified as 2b, and a 9% rate of class 3b being misclassified as 2b. This difficulty arises from the visual similarity of stems at different growth stages, which increases the complexity of the feature extraction process.

Fig. 7
figure 7

Confusion matrix of OR-FCOS.

Additionally, the imbalance in the number of stipe samples across various growth stages in the training dataset may lead to decreased model performance in distinguishing between different growth stages.

The practical impact of these misclassifications is notable. Erroneously identifying the stipe of a particular growth stage as belonging to another stage (for example, classifying 1b as 2b, 3b, or 4b) may result in incorrect assessments of the growth process, thereby affecting critical decisions such as determining the harvest time. Our error analysis highlights that the primary causes of misclassification are the visual similarities between stems at different growth stages and the imbalance in sample numbers across growth stages in the training data.

To mitigate these challenges, we plan in future work to adopt multiple strategies, including increasing the number of stipe samples in each growth stage to address the class imbalance, employing advanced feature extraction techniques to better distinguish stipes across different growth stages, and implementing class balancing methods such as oversampling or weighted loss functions.

Conclusion

Identifying the growth stage of Oudemansiella raphanipes in complex agricultural environments is crucial for optimizing cultivation practices and enhancing yield management. In this study, we constructed an Oudemansiella raphanipes image dataset named ORaph8K with annotations for the different growth stages. We developed the OR-FCOS model to address issues such as the high computational demands and reduced accuracy faced by traditional detection algorithms. By integrating MobileNetV3 with the EMA module as the backbone, incorporating NAS-FPN and NAS-FCOS-Head, adding the CIoU loss function, and utilizing a channel pruning strategy, the OR-FCOS model substantially improved the accuracy and efficiency of the identification of Oudemansiella raphanipes growth stages. The experimental results demonstrate that the OR-FCOS model not only achieves higher accuracy, with mAP scores of 89.4% for \(\hbox {mAP}_{50}\) and 78.3% for \(\hbox {mAP}_{50:95}\), but also enhances operational efficiency. This is evidenced by a reduction in the number of model parameters to 9.9 M, model size to 40.1 MB, and CUDA memory usage to 255 MB. Moreover, the number of FLOPs was decreased to 31.2 G, and the inference speed was improved to 24.0 FPS. These enhancements make the model highly suitable for resource-constrained environments, aligning with production practices.