Abstract
The importance of aquatic plants in aquatic ecosystems is drawing growing attention, and accurate species identification is essential for advancing intelligent and precise ecological monitoring. Traditional methods fall short in large-scale, real-time monitoring, and while YOLOv8 is effective, it lacks sufficient lightweight optimization for mobile devices, limiting its practical application. Existing lightweight models also face challenges in balancing accuracy and speed in complex environments, such as dense growth, similar species, and occlusions. This paper introduces APlight-YOLOv8n, an enhanced YOLOv8n-based approach designed to address these challenges using the Faster Detect and Universal Inverted Bottleneck (UIB) modules. Evaluated on an aquatic plant dataset, APlight-YOLOv8n outperforms YOLOv8n: the mean average precision (mAP50) increased to 74.4%, a 1.9% improvement; the number of parameters (Params) was reduced to 2.74 M, a 13.3% decrease; floating-point operations (FLOPs) dropped to 5.5G, a 32.9% reduction; and the inference speed (FPS) remained stable at 32.70. This model enables fast, accurate recognition in complex environments, providing efficient support for real-time field monitoring. In conclusion, APlight-YOLOv8n demonstrates superior performance in balancing accuracy and computational efficiency for aquatic plant detection and offers new insights for mobile ecological monitoring and broader smart environmental applications.
Introduction
The increasing global emphasis on sustainable ecosystem development and ecological governance has brought renewed attention to the ecological functions of aquatic plants1,2,3. Through photosynthesis, aquatic plants release oxygen and absorb carbon dioxide and other harmful gases, improving both water and air quality. They also absorb nutrients and organic pollutants from water bodies, which enhances water quality and provides habitats and food sources for fish, aquatic insects, and other organisms, supporting biodiversity and ecological stability4,5,6. However, the rapid spread of invasive aquatic plant species is becoming a major threat to aquatic ecosystems. These non-native species displace native plants, block waterways, disturb ecological balance, and cause long-term harm to aquatic communities7,8,9. Therefore, accurate identification of aquatic plant species is crucial for ecological conservation, effective water resource management, and the sustainable development of aquatic ecosystems.
Traditional identification of aquatic plants often depends on experienced botanical taxonomists, which is time-consuming and labor-intensive10. Its low monitoring frequency and delayed data updates make it difficult to meet the real-time decision-making needs of ecological conservation and water resource management11,12,13. In addition, climate change and evolving aquatic environments are causing rapid shifts in aquatic plant species and distributions, increasing the demand for convenient and efficient identification technologies. In recent years, advances in deep learning for object detection have made automated aquatic plant identification possible. For instance, Kabir et al. and Hao et al. successfully used deep learning models combined with remote sensing images and drone data to detect invasive aquatic plants like Eichhornia crassipes automatically7,14. Garcia-Ruiz et al. and Wang et al. employed models like YOLO and Faster R-CNN for the accurate identification of various aquatic weed species15,16. Bai and Bai proposed the EFL-DenseNet model, which uses web crawler technology to identify 51 species of aquatic plants17. D. Wang et al. developed the APNet-YOLOv8s model, achieving real-time classification and detection of 12 aquatic plant species in complex environments18. These studies show the strong application potential of deep learning in aquatic plant monitoring. However, most existing work focuses on improving model accuracy, while giving little attention to the deployment needs of lightweight models on mobile devices. Compared with fixed installations, deploying lightweight object detection models on mobile devices enables wider use and practical application. Therefore, developing lightweight aquatic plant recognition models for mobile platforms has important practical value.
Currently, the development of lightweight recognition models is still in an exploratory stage. MobileNet is one of the most widely used lightweight models and uses depthwise separable convolutions to simplify the network and reduce computation19,20,21,22. This lowers hardware requirements, making it suitable for deployment on mobile devices. The YOLO series has also introduced several lightweight detection models, including YOLOv3-Tiny23, YOLOv4-Tiny24, YOLOv5n25, and YOLOv8n26. However, lightweight models often face a trade-off between accuracy and efficiency. For example, on the COCO dataset, YOLOv8 achieves a mAP of 62.0%, while the lightest model in the series, YOLOv8n, reaches only 37.3%27,28. To overcome this limitation, many researchers have explored ways to balance accuracy and computational cost in lightweight models. For example, Zhang et al. proposed the RTSD-Net model based on YOLOv4-tiny, which enables real-time strawberry detection in field environments by progressively reducing network complexity and the number of FLOPs29. Lan et al. proposed the RICE-YOLO model, which significantly reduces Params by introducing a new network module, enabling real-time detection of rice ears from an unmanned aerial vehicle’s perspective30. Wan et al. proposed the DSC-YOLOv8n model, based on YOLOv8n, which reduces FLOPs by progressively replacing traditional convolution layers, facilitating real-time detection of flooded vehicles in urban environments31. Although previous research has improved the performance of lightweight models, most methods still improve one metric at the cost of another. For example, some models reduce FLOPs but increase the number of Params, while others compress Params but raise FLOPs. This makes it difficult to achieve a balance between computational efficiency and model size. Previous studies suggest that true lightweight optimization should reduce both FLOPs and the number of Params simultaneously.
Therefore, this paper introduces a new optimization approach that maintains detection accuracy while reducing both computational complexity and model size, improving the real-time performance of aquatic plant detection on mobile devices. This approach avoids the trade-offs caused by optimizing a single metric and provides a practical basis for applying lightweight models in complex ecological monitoring.
To address the aforementioned challenges, this study improves the YOLOv8n network structure to better meet the specific requirements of aquatic plant detection, using the dataset provided by Wang et al.18. The original YOLOv8n model has limitations in feature fusion and real-time detection, making it difficult to meet the dual requirements of high accuracy and high speed in aquatic plant recognition. This study improves YOLOv8n and develops the APlight-YOLOv8n model. By reducing both FLOPs and Params, the model improves inference efficiency while maintaining detection accuracy. The optimized model is suitable for lightweight deployment on mobile devices and meets the requirements for real-time and accurate detection in ecological monitoring and water management. In practical applications, higher detection accuracy improves the reliability of monitoring results, while faster detection speed determines whether the model can respond promptly in dynamic aquatic environments. Both factors are essential for intelligent water ecosystem management. By jointly optimizing accuracy and efficiency, this study provides an effective solution for real-time aquatic plant detection. It also promotes the application of deep learning in water ecosystem conservation and supports the sustainable management of aquatic environments. In summary, the key contributions of this paper include:
(1) An enhanced lightweight aquatic plant detection model based on YOLOv8n was developed, achieving reductions in FLOPs and Params while improving detection accuracy.
(2) APlight-YOLOv8n was deployed on a smartphone, demonstrating its effectiveness and robustness in real-time aquatic plant detection.
(3) APlight-YOLOv8n outperformed YOLOv8n and other reference models in comparisons, validating its practical advantages in real-world applications.
The remainder of the paper is organized as follows: "Proposed approach" section provides a detailed description of the proposed method, including the flowchart and network architecture. "Experimental setup" section outlines the experimental setup, including the collated dataset, detection scenario, performance metrics, and implementation details. "Results" section presents the experimental results. "Discussions" section provides the discussion. Finally, "Conclusions" section offers the conclusions.
Proposed approach
The workflow of this research: Step 1, dataset construction; Step 2, model improvement and training; Step 3, performance comparison; Step 4, practical applications.
The workflow of this study, shown in Fig. 1, comprises four main steps. In Step 1, an aquatic plant dataset was compiled, containing images of various aquatic plant species in their natural environments. In Step 2, several model variations were developed using this dataset, including YOLOv8n, the C2f-UIB module, the Faster Detect module, and reference models. The Faster Detect and C2f-UIB modules replaced the standard modules in YOLOv8n, and the dataset was used to train the optimized models. The optimized models were trained and evaluated to identify the best-performing model, APlight-YOLOv8n. Simultaneously, YOLOv8n and the reference models were trained on the same dataset for comparison with APlight-YOLOv8n. In Step 3, a systematic comparison of APlight-YOLOv8n, YOLOv8n, and the reference models was conducted by analyzing training details, evaluation metrics, and model visualization results. Finally, in Step 4, APlight-YOLOv8n was deployed in real-world detection scenarios to assess its effectiveness in practical environments. The following section provides a detailed introduction to YOLOv8n, the Faster Detect module, the C2f-UIB module, and APlight-YOLOv8n.
Overview of YOLOv8n
YOLO models achieve an exceptional balance between speed and accuracy, making them a preferred choice among various target detection algorithms. This family of models can identify targets accurately and quickly while maintaining high efficiency. YOLOv8 is one of the most widely used and high-performing target detection models available today, with its network structure shown in Fig. 2.
Detailed structure of the YOLOv8 network.
The YOLOv8 network architecture consists of three main components: the backbone, the neck, and the head. The backbone is responsible for extracting features from the input image using a bottom-up approach. The neck serves as a bridge between the backbone and the head, focusing on feature fusion and processing. The head network is responsible for predicting target semantic features, ultimately providing the final target details and category confidences.
YOLOv8 is available in five variants, differing in depth and width. YOLOv8n, a lightweight variant, is optimized for mobile device deployment, with 3.2 M Params, a size of 6.1 MB, and a computational complexity of 8.2 GFLOPs. Detailed specifications and the source code for YOLOv8 can be found in Glenn, J26.
Faster detect model
To enhance the accuracy of aquatic plant detection, this paper begins by analyzing the original structure of YOLOv8n. It was observed that the Detect layer in the head network is designed to detect general objects, balancing detection across large, medium, and small targets. However, aquatic plants are primarily small to medium-sized, which makes them prone to occlusion by other objects32,33. Therefore, this paper analyzes the characteristics of aquatic plants and optimizes the Detect structure in the head network to better meet the specific requirements of aquatic plant detection.
As shown in Fig. 3, this paper presents an enhanced Faster Detect structure. Faster Detect utilizes a dual-branch structure to achieve feature reduction and enhancement, with feature map sizes of 512 × 40 × 40, 1024 × 20 × 20, and 1024 × 20 × 20, respectively. These sizes are selected to match the characteristics of aquatic plants, which are typically small to medium in size. For example, a 40 × 40 feature map is more effective for capturing the details of small and medium-sized targets. Experimental results confirm that these sizes achieve an optimal balance between data flow and computational efficiency, making them particularly effective for detecting small- to medium-sized aquatic plant targets. Specifically, the Faster Detect structure processes the feature map by first adjusting the feature channels through two parallel branches. Each branch uses 1 × 1 convolutions for dimensionality reduction, minimizing computational overhead. The feature map then passes through several 3 × 3 convolutional layers for further spatial feature extraction and semantic enhancement, ultimately outputting the target’s bounding box and labeled parameters.
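The dual-branch design described above can be illustrated with a short PyTorch sketch of a single-scale head: each branch first reduces channels with a 1 × 1 convolution, refines them with 3 × 3 convolutions, and then emits either box-regression or class channels. The class and channel names here (`FasterDetectSketch`, `mid_ch`, the 64 regression channels) are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    """One branch of the dual-branch head: 1x1 reduction, 3x3 refinement, 1x1 output."""
    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1),              # 1x1 conv: channel reduction
            nn.SiLU(),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1),  # 3x3 convs: spatial feature
            nn.SiLU(),                                # extraction and semantic enhancement
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(mid_ch, out_ch, 1),             # per-location predictions
        )

    def forward(self, x):
        return self.layers(x)

class FasterDetectSketch(nn.Module):
    """Two parallel branches (box regression and class scores), concatenated."""
    def __init__(self, in_ch, num_classes, mid_ch=128, reg_ch=64):
        super().__init__()
        self.box = Branch(in_ch, mid_ch, reg_ch)       # bounding-box parameters
        self.cls = Branch(in_ch, mid_ch, num_classes)  # one score per species
    def forward(self, x):
        return torch.cat([self.box(x), self.cls(x)], dim=1)

# One 512 x 40 x 40 feature map from the neck; 12 aquatic plant species.
head = FasterDetectSketch(in_ch=512, num_classes=12)
out = head(torch.zeros(1, 512, 40, 40))
```

The 1 × 1 reductions keep the parameter count of the two branches low, while the stacked 3 × 3 layers preserve the 40 × 40 spatial resolution that favors small and medium targets.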
Detailed structure of the Faster Detect.
C2f-UIB module
To further improve the accuracy and speed of aquatic plant detection, it was noted that the C2f module in the original model’s neck network introduces additional convolutional operations and computational overhead during feature fusion, increasing network redundancy34,35. Furthermore, the original C2f module does not effectively handle the complex backgrounds and species similarities in the aquatic plant detection environment. To address these limitations, this paper introduces a lightweight C2f-UIB module. This module improves detection accuracy and reduces model complexity through network structure optimization and weight migration. Additionally, it enhances the model’s ability to handle complex backgrounds and species similarities by incorporating depthwise separable convolution and an inverted bottleneck structure, enabling efficient feature fusion. The network structure is depicted in Fig. 4 (ref. 21).
The C2f-UIB module significantly improves the model’s detection performance by enabling high-quality feature extraction and cross-layer feature fusion. The module consists of two main components: the UIB for feature extraction and the C2f for feature fusion. The UIB module extracts spatial features using 3 × 3 or 5 × 5 depthwise separable convolutions, placed before the expansion layer and between the expansion and projection layers. This design expands the network’s receptive field by incorporating a spatial information mixing mechanism, enhancing its ability to capture global contextual information21. Moreover, the extra depthwise convolutions and inverted bottleneck structures within the UIB module enhance the coherence of contextual information capture, improve computational efficiency, and generate multi-scale, high-quality feature maps that provide rich data for subsequent feature fusion21. The C2f module is designed for cross-layer feature fusion. It integrates multi-layer feature information by concatenating high-quality features from the UIB with detailed features from lower layers. To control computational complexity, the C2f module uses 1 × 1 convolutions for dimensionality reduction, followed by 3 × 3 convolutions to reconstruct features, thus enhancing the integration of spatial information and feature representation. This fusion strategy seamlessly integrates low-level detailed features with high-level semantic features, optimizing the flow of multi-scale information. The C2f-UIB module enhances the model’s ability to handle occluded targets and complex backgrounds while operating within limited computational resources, improving detection accuracy and the network’s generalization capability.
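The parameter savings behind the depthwise separable convolutions used in the UIB can be seen with a quick back-of-the-envelope calculation (plain Python; the layer sizes are arbitrary examples, not taken from the model):

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def dw_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a pointwise 1 x 1 convolution."""
    return k * k * c_in + c_in * c_out

standard = conv_params(256, 256, 3)           # 589,824 weights
separable = dw_separable_params(256, 256, 3)  # 2,304 + 65,536 = 67,840 weights
savings = 1 - separable / standard            # roughly 88.5% fewer weights
```

This order-of-magnitude reduction per layer is what lets the C2f-UIB modules cut FLOPs and Params while keeping the receptive-field benefits of the 3 × 3 and 5 × 5 kernels.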
Detailed structure of the C2f-UIB.
APlight-YOLOv8n
This paper integrates the C2f-UIB and Faster Detect modules into YOLOv8n to enhance detection accuracy, improve computational speed, and reduce redundancy in the model’s network structure. Specifically, the neck network of YOLOv8n contains four C2f modules, which are gradually replaced by C2f-UIB modules during optimization. Additionally, the Faster Detect module is incorporated into the head network, replacing the original Detect module. After testing various replacement schemes, the configuration that best balances detection accuracy and computational efficiency was chosen, referred to as APlight-YOLOv8n.
Figure 5 shows the overall structure of APlight-YOLOv8n. In this structure, the first three conventional C2f modules in the YOLOv8n neck network are replaced with C2f-UIB modules, while the Detect layer in the head network is substituted with the Faster Detect module. The source code and pre-trained weights for APlight-YOLOv8n are publicly available and can be accessed at https://github.com/MAOSAN199511/APlight-YOLOv8n.
Detailed structure of the APlight-YOLOv8n.
Experimental setup
Dataset and dataset quality
In this study, all models were trained on the aquatic plant dataset provided by Wang et al.18. The dataset was collected using a Canon 90D digital camera with a 24–70 mm standard zoom lens, and contains 2153 images and 11,180 annotated instances. The images have a resolution of 3840 × 2160 and were captured between May 2022 and May 2023. The dataset includes a range of complex conditions, such as visually similar aquatic plants, cluttered natural backgrounds, and partial occlusion by surrounding objects. The images were collected with the assistance of researchers from the Institute of Natural Ecology and the Institute of Environmental Planning and Design, Nanjing University. The collection environment includes natural water bodies, such as rivers, wetlands, and fish farms. The labeled aquatic plant examples are categorized into four life forms and 12 species, including common invasive species such as Eichhornia crassipes (for details, see Table 1 in Wang et al.18).
Examples of four different life forms of aquatic plants.
To ensure the accuracy of the dataset and the reliability of the assessment results, all bounding boxes and aquatic plant instances were validated and cross-checked by three experienced plant experts. Each expert independently reviewed the annotations based on the grading criteria in Table 1 of D. Wang et al.18, assessing annotation completeness, species correctness, and bounding box accuracy. The three experts then cross-checked their evaluations and resolved any discrepancies through discussion, resulting in a validated and corrected dataset. After validation, the dataset was randomly split into a training set comprising 1,637 images and a validation set containing 460 images. Figure 6 illustrates examples of challenging scenarios from the dataset.
To better understand the size distribution of aquatic plants in detection tasks, this study conducted a statistical analysis of the annotated data based on normalized area. Medium-sized targets were the most prevalent, totaling 8,446 (79.1%), followed by small targets, with 1,802 instances (16.8%), while large targets were the least common, with only 427 instances (4.1%). The number of aquatic plants across four distinct life forms was also analyzed in relation to target size, as shown in Table 1. Floating plants were the most abundant, totaling 3,875 (36.3%), followed by floating-leaved plants (3,234; 30.3%) and emergent plants (2,017; 18.9%). Submerged plants were relatively rare, with only 1,549 instances (14.5%). This distribution reflects the varying frequency and detection difficulty of different aquatic plant types in the image dataset, providing a foundation for subsequent model optimization.
Implementation details
The input image size for this study was set to 640 × 640 pixels, with a batch size of 16. To optimize the network, a learning rate of 0.001 was employed, with a momentum (α) of 0.9 for stochastic gradient descent. The network was trained for 100 epochs, with the first 90 epochs employing the self-contained mosaic data augmentation technique. All other parameters were set to their default values. To ensure comparability and fairness, the same pre-trained weights were used to train all models in both the ablation and comparison experiments. Furthermore, each improvement in the ablation experiments was retrained independently. All experiments were performed on a workstation running Windows 11, equipped with an Intel Core i7-13700K CPU (16 cores, 24 threads, up to 5.4 GHz turbo frequency), 64 GB DDR5 RAM, and a single NVIDIA GeForce RTX 3090 Ti GPU with 24 GB GDDR6X VRAM. The deep learning environment was constructed using Anaconda 3 and Python 3.8.7. The primary deep learning framework was PyTorch 2.1.0, with the Ultralytics YOLOv8 package (version 8.3.18) serving as the core implementation. GPU acceleration was enabled via CUDA 11.8 and cuDNN 8.8.0.
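The training settings above can be summarized as a configuration dictionary. The keys follow the argument names of the Ultralytics `train()` API; the dictionary is an illustrative summary of the reported settings, not the authors' actual launch script.

```python
# Reported training settings, keyed by Ultralytics YOLOv8 train() argument names
# (illustrative summary only, not the authors' launch script).
train_cfg = {
    "imgsz": 640,        # 640 x 640 input images
    "batch": 16,
    "lr0": 0.001,        # initial learning rate
    "momentum": 0.9,     # SGD momentum
    "epochs": 100,
    "close_mosaic": 10,  # disable mosaic augmentation for the final 10 epochs
}

# Mosaic augmentation is therefore active for the first 90 of 100 epochs,
# matching the schedule described above.
mosaic_epochs = train_cfg["epochs"] - train_cfg["close_mosaic"]
```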
Evaluation metrics
To comprehensively evaluate the performance of the APlight-YOLOv8n model in aquatic plant detection tasks, a series of widely adopted evaluation metrics from the target detection field are employed. These metrics include Precision (P), Recall (R), mAP50 and mAP50:95, Number of Params, FLOPs, and FPS.
Precision measures the proportion of correctly detected targets among all detected targets, and is calculated as:
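In standard notation:

```latex
P = \frac{TP}{TP + FP}
```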
where TP denotes the number of aquatic plant samples correctly detected, and FP denotes the number of samples incorrectly identified as aquatic plants.
Recall measures the proportion of actual aquatic plant targets successfully detected, and is calculated as:
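Correspondingly:

```latex
R = \frac{TP}{TP + FN}
```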
where FN denotes the number of missed aquatic plant samples.
The mAP is the mean of the average precision (AP) across all categories, used to evaluate the overall performance of the model, and is calculated as:
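In the standard formulation, the AP of category i is the area under its precision–recall curve, and mAP averages over the N categories:

```latex
AP_i = \int_0^1 P_i(R)\,\mathrm{d}R, \qquad mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i
```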
Building on P and R, the mean Average Precision (mAP) represents the average precision of the model across different recall rates, providing insight into its ability to detect multiple target categories. mAP50 refers to the value of mAP when the Intersection over Union (IoU) threshold is set to 0.5. This metric primarily evaluates detection performance at lower IoU thresholds, offering a measure of effectiveness under more relaxed conditions. mAP50:95, on the other hand, calculates the average mAP across IoU thresholds from 0.5 to 0.95 in steps of 0.05, allowing a more comprehensive evaluation of the model’s performance across various thresholds.
FPS measures the number of image frames processed by the model per second. A higher FPS value indicates better processing efficiency, enabling the model to complete more detection tasks in less time. The FPS is calculated using the following formula:
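With the per-frame inference time t expressed in seconds:

```latex
\mathrm{FPS} = \frac{1}{t}
```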
where the inference time per frame refers to the time taken by the model to process a single frame.
The number of Params reflects the complexity of the model, representing the total count of trainable parameters in the network. A smaller number of parameters typically indicates a more lightweight model, making it better suited for deployment on resource-constrained devices.
FLOPs are a key measure of a model’s computational effort, indicating the number of floating-point operations required for a single forward pass. A lower FLOPs value indicates reduced computational resource demands, making the model more suitable for environments with limited computational capacity. In conclusion, these metrics offer an objective and comprehensive assessment of the model’s performance during training.
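For a single convolutional layer, both quantities can be counted directly. The helper below follows the common convention of counting one multiply-add as two floating-point operations; profilers that report MACs instead will give half these numbers.

```python
def conv2d_params(c_in, c_out, k):
    """Trainable parameters of a k x k convolution with bias terms."""
    return (k * k * c_in + 1) * c_out

def conv2d_flops(c_in, c_out, k, h_out, w_out):
    """FLOPs for one forward pass producing an h_out x w_out output map
    (one multiply-add counted as two floating-point operations)."""
    return 2 * k * k * c_in * c_out * h_out * w_out

# Example: a stem-like 3x3 conv from 3 to 16 channels on a 640 x 640 map.
p = conv2d_params(3, 16, 3)           # 448 parameters
f = conv2d_flops(3, 16, 3, 640, 640)  # roughly 0.354 GFLOPs
```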
Results
This section replaces the conventional modules in YOLOv8n with the C2f-UIB and Faster Detect modules, followed by experimental analysis to identify the optimal model, APlight-YOLOv8n. It also provides a detailed evaluation of the training process for APlight-YOLOv8n and compares its performance with the baseline YOLOv8n and other reference models. To further validate the advantages of APlight-YOLOv8n in aquatic plant detection, visualizations of the model’s prediction process are presented, comparing it with the baseline YOLOv8n and demonstrating the detection results on a smartphone. These experiments collectively demonstrate the robustness and effectiveness of APlight-YOLOv8n for detecting aquatic plants.
Ablation study
This section optimizes model performance through a series of YOLOv8n-based ablation experiments designed to identify the most effective improvements. First, the conventional Detect module in the head network was replaced with the Faster Detect module. The traditional Detect module (15/18/21) processes targets at various scales by effectively fusing low- and high-level features. However, further analysis revealed that the 16/19 configuration could also effectively fuse low- and high-level features while retaining more valuable information, making it particularly suitable for detecting targets with intricate details and complex backgrounds. Since aquatic plants are typically small- and medium-sized targets, we adjusted the output layer to enhance detection performance for these scales.
The experimental results in Table 2 show that as the detection scales are adjusted across different combinations (small-medium-large, small-medium, and medium-large), the model’s performance metrics exhibit different fluctuation patterns. Specifically, the values of Params, MB, and FLOPs initially decrease and then increase, while P and R first increase and then decrease. mAP50 reaches its peak after fluctuating increases and then stabilizes in an oscillatory pattern.
At the small-medium-large scales, the model’s Params and MB remain nearly constant, with mean P and R values of 99.1% and 91.0%, respectively, FLOPs averaging 7.2G, and a 0.6% increase in mAP50. At small and medium scales, Params and MB gradually decrease, with mean values of 2.94 M and 5.87 MB, respectively. P and R have mean values of 98.8% and 90.0%, respectively, FLOPs are 7.1G, and mAP50 increases by 0.8%. At the medium-large scale, while the mean FLOPs decrease to 6.8G, Params and MB increase significantly to 3.67 M and 7.36 MB, respectively. This results in a notable decrease in P and R to 96.5% and 89.2%, respectively, alongside a 0.8% increase in mAP50. Notably, despite the relatively low FLOPs at medium and large scales, this change is mainly attributed to the increase in Params. Further comparison of detection scale combinations revealed that the optimal configuration for small-medium-large scales was 16/18/21, with model Params and MB of 3.01 M and 5.79 MB, respectively. This configuration yielded 99.8% and 92.0% for P and R, respectively, 6.7G for FLOPs, and a mAP50 of 72.9%. The optimal configuration for small and medium scales is 16/19/19, with 2.87 M and 5.78 MB for Params and MB, 99.5% and 90.0% for P and R, 6.1G for FLOPs, and 73.9% for mAP50. The optimal configuration for medium and large scales was 19/19/19, with 3.50 M and 7.01 MB for Params and MB, respectively. This configuration resulted in 97.8% and 89.0% for P and R, 6.2G for FLOPs, and 73.6% for mAP50. In a combined comparison, the 16/19/19 configuration outperformed all other configurations. Compared to the original model, Params and MB were reduced by 9.2% and 5.1%, respectively, while P and R remained almost unchanged. FLOPs were reduced by 25.6%, and mAP50 improved by 1.4% with this configuration. 
These results show that the output layer configuration of 16/19/19 significantly improves the overall performance of the model in aquatic plant detection, as it balances the detection needs of small and medium-sized targets with computational efficiency.
Secondly, based on the optimal performance achieved at the detection scale, the C2f modules in the neck network are replaced sequentially with C2f-UIB modules. Table 3 provides a performance comparison of all replacement options. The numbers (1–4) in the table correspond to the four C2f modules in the YOLOv8n neck network, with 1 representing the first C2f module and 4 the last. The exact positions of these modules are shown in Fig. 5. A checkmark in the column corresponding to a number indicates that the C2f module at that position was replaced with a C2f-UIB module; an empty cell indicates that the original C2f module was retained.
As shown in Table 3, Params and MB fluctuated but generally showed a decreasing trend, while P and R exhibited no clear pattern. FLOPs consistently decreased, but mAP50 exhibited no clear trend. With a single C2f module replaced, the best mAP50 of 73.2% was obtained when the 3rd position was replaced. With two modules replaced, the best mAP50 of 74.2% was obtained when the 3rd and 4th positions were replaced. With three modules replaced, mAP50 improved further to 74.4%, achieved when the 1st, 2nd, and 3rd positions were replaced. When all four C2f modules were replaced, mAP50 fell back to 73.7%. Consequently, replacing only two or three C2f modules led to the required improvement.
The optimal configuration for replacing two C2f modules is to replace positions 3 and 4. In this case, Params and MB are 2.58 M and 5.24 MB, representing reductions of 10.1% and 9.3%, respectively. P and R remain almost unchanged, FLOPs decrease to 5.7G (a 6.6% reduction), and mAP50 increases by 0.4%, reaching 74.2%. The optimal configuration for replacing three C2f modules is to replace positions 1, 2, and 3. In this case, Params and MB are 2.74 M and 5.54 MB, representing decreases of 4.5% and 4.2%, respectively. P and R remain largely unchanged, FLOPs decrease to 5.5G (a 9.8% reduction), and mAP50 increases by 0.7%, reaching 74.4%. In summary, replacing the C2f module in positions 1, 2, and 3 proves to be the optimal solution, as it significantly reduces computational load while improving detection accuracy. This is achieved by optimizing the efficiency of feature fusion and minimizing redundant computations, making it especially effective for complex aquatic plant detection scenarios.
By combining the optimal detection scale with the most effective C2f module replacement strategy, the improved APlight-YOLOv8n model achieves optimal performance. Experimental results show that the proposed lightweight improvements significantly enhance YOLOv8n’s performance in the automatic detection of aquatic plant species, validating the approach’s feasibility and advantages.
Baseline model comparison
To evaluate the performance of APlight-YOLOv8n, a comparative analysis was performed against the baseline YOLOv8n model. The baseline model was trained using the same aquatic plant dataset and under identical experimental conditions.
As shown in Table 4, APlight-YOLOv8n significantly outperforms the baseline YOLOv8n model in terms of overall performance. Specifically, compared to the baseline model, APlight-YOLOv8n achieved reductions of 13.3% in Params, 9.0% in MB, and 32.9% in FLOPs, while mAP50 improved by 1.9%.
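The reported reductions can be reproduced from the raw counts. Note that the 3.16 M baseline below is an assumption: it is the unrounded Params count implied by the 13.3% figure (the Overview section rounds it to 3.2 M).

```python
def pct_reduction(baseline, improved):
    """Relative reduction as a percentage of the baseline value."""
    return round((baseline - improved) / baseline * 100, 1)

params_drop = pct_reduction(3.16, 2.74)  # Params, in millions (baseline assumed)
flops_drop = pct_reduction(8.2, 5.5)     # FLOPs, in GFLOPs
```

The mAP50 gain of 1.9%, by contrast, is an absolute difference in percentage points (74.4% versus 72.5%), not a relative change.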
As shown in Table 5, APlight-YOLOv8n achieves improved accuracy in most life-form categories. Specifically, it achieves improvements of 5.0% and 3.7% in the emergent plants and floating plants categories, respectively, indicating its stronger expressive capability in categories with less distinct features or complex morphologies. In contrast, the detection accuracy of the improved model slightly decreased in the submerged plants category, which may be attributed to factors such as blurred features, slender morphology, and severe occlusion in this category.
Comparison of metrics changes between YOLOv8n and APlight-YOLOv8n during training. (A) The precision changes, (B) The recall changes, (C) The mAP50 changes, (D) The mAP50:95 changes.
Figure 7 presents a detailed comparison of APlight-YOLOv8n and the baseline YOLOv8n during training across four key metrics: Precision, Recall, mAP50, and mAP50:95. Both models eventually converge, but their trajectories differ notably: YOLOv8n achieves higher values in the early training phases, while APlight-YOLOv8n surpasses it in the later stages. As shown in Fig. 7A, the Precision curves differ markedly. During the first 40 epochs, YOLOv8n clearly outperforms APlight-YOLOv8n; thereafter the curves begin to alternate, with APlight-YOLOv8n catching up and ultimately surpassing YOLOv8n in accuracy. As illustrated in Fig. 7B, the Recall curves of the two models largely overlap, indicating similar behavior on this metric throughout training. In Fig. 7C and D, the mAP50 and mAP50:95 curves follow similar patterns: YOLOv8n leads consistently during the first 50–60 epochs, after which APlight-YOLOv8n grows substantially and surpasses the baseline on both metrics. These results suggest that APlight-YOLOv8n exhibits higher optimization efficiency and greater performance potential in the later stages of training. Although it lags initially owing to its lightweight design, it learns and optimizes faster as training progresses, ultimately achieving higher accuracy.
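Since mAP50 scores a prediction as correct only when its intersection over union (IoU) with a ground-truth box is at least 0.5, a minimal IoU helper illustrates the criterion behind these curves (a generic sketch, not code from the model):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A prediction overlapping a ground-truth box well enough to count at mAP50:
score = iou((0, 0, 10, 10), (1, 1, 11, 11))  # intersection 81, union 119
```

Averaging precision over recall levels at this threshold, across all classes, yields mAP50; mAP50:95 repeats the procedure at IoU thresholds from 0.5 to 0.95 in steps of 0.05 and averages the results.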
In conclusion, the evaluation of both the baseline YOLOv8n model and the APlight-YOLOv8n model using these metrics highlights the superior performance of the improved model, thereby validating the success of the proposed improvements.
Analysis of the prediction process
This section offers a visual analysis of the prediction processes of APlight-YOLOv8n and the baseline YOLOv8n model using Gradient-weighted Class Activation Mapping (Grad-CAM). Grad-CAM visualizes, as a heatmap, the image regions a model attends to when making a prediction. Both models were trained on the same aquatic plant dataset and evaluated on the same test images, and Grad-CAM visualizations were generated from the network layers of both models during prediction.
Detailed comparison of Grad-CAM. (A) Emergent plants, (B) floating plants, (C) floating-leaved plants, (D) submerged plants, (E) emergent plants, (F) floating plants, (G) floating-leaved plants, (H) submerged plants.
Figure 8 illustrates the Grad-CAM visualizations of the prediction processes for both the baseline YOLOv8n model and APlight-YOLOv8n, applied to emergent plants, floating plants, floating-leaved plants, and submerged plants. As shown in Fig. 8A–D, the baseline YOLOv8n model fails to comprehensively capture the relevant features of emergent, floating, and floating-leaved plants, with particularly poor detection of submerged plants. In contrast, Fig. 8E–H show that APlight-YOLOv8n significantly improves feature focus, clearly highlighting the key characteristics of each plant type, with especially strong performance on submerged plants. These visualizations demonstrate that APlight-YOLOv8n attends to aquatic plant features more accurately than the baseline model, offering insight into the prediction process and further validating the superior performance of APlight-YOLOv8n in detecting aquatic plants.
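The computation behind these heatmaps is compact: Grad-CAM global-average-pools the gradients of the class score with respect to each activation map to obtain per-channel weights, then takes a ReLU of the weighted sum of the maps. A minimal pure-Python sketch of that core step (toy data, independent of any detection framework):

```python
# Minimal Grad-CAM core: given per-channel activation maps A[k][i][j] and the
# gradients of the class score w.r.t. those activations, produce the
# class-discriminative heatmap.

def grad_cam(activations, gradients):
    """activations, gradients: lists of K HxW maps (nested lists)."""
    K = len(activations)
    H, W = len(activations[0]), len(activations[0][0])
    # Channel weights: global-average-pool the gradients.
    weights = [sum(sum(row) for row in gradients[k]) / (H * W) for k in range(K)]
    # Weighted combination of activation maps, followed by ReLU.
    return [[max(0.0, sum(weights[k] * activations[k][i][j] for k in range(K)))
             for j in range(W)] for i in range(H)]

# Toy example: two 2x2 channels; the second channel receives negative gradient,
# so only regions supported by the first channel survive the ReLU.
A = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
G = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(A, G)  # [[1.0, 0.0], [0.0, 1.0]]
```

In practice the maps come from a chosen convolutional layer of the trained network and the heatmap is upsampled and overlaid on the input image, as in Fig. 8.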
Comparison experiments with different object detection models
This section compares the APlight-YOLOv8n model with traditional YOLO series target detection models to further validate the performance advantages of the proposed method for aquatic plant detection.
As shown in Table 6, APlight-YOLOv8n significantly outperforms the standard models YOLOv5s and YOLOv8s in terms of Params, MB, and FLOPs, while also showing a notable improvement in the key metric, mAP50. Compared to classic lightweight models such as YOLOv3-Tiny and YOLOv5n, APlight-YOLOv8n delivers a substantial boost in detection accuracy. When compared with newer-generation lightweight models (e.g., YOLOv7-Tiny and YOLOv10n), APlight-YOLOv8n requires less computation while maintaining higher detection accuracy, demonstrating its ability to balance efficiency and performance and making it well suited to scenarios with lightweight requirements. Furthermore, APlight-YOLOv8n shows a clear parameter advantage over the APNet-YOLOv8s model proposed by D. Wang et al. (2024): it achieves similar detection accuracy with only 25.0% of the parameters, 25.0% of the model size, and 23.5% of the FLOPs of APNet-YOLOv8s. These results indicate that APlight-YOLOv8n strikes an effective balance between accuracy and efficiency, significantly enhancing the detection of aquatic plants.
Practical case applications
This section validates the performance and versatility of the APlight-YOLOv8n and YOLOv8n models in real-world applications. Both models were deployed on an Android mobile phone (model: OnePlus NE2210, running Android 14, with a Snapdragon 8 Gen1 octa-core processor and 12 GB of RAM). Model performance was evaluated using two methods. The first method involved real-time detection of field samples, with aquatic plant scenes captured in Suzhou City, Jiangsu Province, and Taizhou City, Zhejiang Province. The results are shown in Fig. 9. The second method involved real-time detection of web images, where aquatic plant images from a nature lovers’ community website were selected as validation samples. The results are shown in Fig. 10.
Representative real-time detection results of aquatic plants from Suzhou, Jiangsu Province, and Taizhou, Zhejiang Province, China. (A) Emergent plants, (B) floating plants, (C) floating-leaved plants, (D) submerged plants, (E) emergent plants, (F) floating plants, (G) floating-leaved plants, (H) submerged plants.
As shown in Fig. 9, APlight-YOLOv8n successfully detects various types of aquatic plants and demonstrates exceptional performance, maintaining a stable FPS of approximately 32.70. This underscores its superior real-time processing capabilities and detection accuracy. In contrast, YOLOv8n exhibits varying levels of omission and misdetection when handling aquatic plants in complex scenes, with an average FPS of around 29.28, indicating lower processing efficiency and stability.
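FPS figures such as 32.70 are typically obtained by timing a batch of inference calls and dividing the frame count by the elapsed wall-clock time. A minimal sketch of that measurement loop; `run_inference` here is a hypothetical stand-in for the real on-device model call and merely simulates a fixed per-frame cost:

```python
import time

def measure_fps(infer, n_frames=50):
    """Average frames per second over n_frames calls to `infer`."""
    start = time.perf_counter()
    for _ in range(n_frames):
        infer()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed

def run_inference():      # placeholder for the actual model call on the phone
    time.sleep(0.002)     # simulate ~2 ms of work per frame

fps = measure_fps(run_inference)
```

On a phone, a warm-up pass is usually run first so that one-time initialization costs do not depress the measured rate.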
Representative detection results of network image aquatic plants. (A) Emergent plants, (B) floating plants, (C) floating-leaved plants, (D) submerged plants, (E) emergent plants, (F) floating plants, (G) floating-leaved plants, (H) submerged plants.
Furthermore, as shown in Fig. 10, with increasing complexity in detection scenarios, APlight-YOLOv8n consistently maintains high accuracy, particularly in Figs. 10E and G. Figure 10E depicts a scene with a complex, cluttered background, where APlight-YOLOv8n successfully detects aquatic plants despite the background noise. Similarly, Fig. 10G shows APlight-YOLOv8n accurately identifying mixed aquatic plants, demonstrating remarkable robustness in a typical rainy environment with water droplets and background clutter, which complicate detection. In contrast, YOLOv8n struggled under these challenging conditions. As seen in Figs. 10E and G, YOLOv8n exhibited omissions and misdetections, even misidentifying raindrops as aquatic plants, highlighting its limited adaptability to complex backgrounds and adverse weather conditions.
In conclusion, these detection results validate the versatility and effectiveness of APlight-YOLOv8n in addressing diverse and complex aquatic plant scenarios. The model consistently achieves high accuracy and efficiency, even under challenging conditions.
Discussion
This study highlights the significant differences between the lightweight APlight-YOLOv8n model and the original YOLOv8n by comparing and analyzing their performance across various stages, including training, validation, and inference. APlight-YOLOv8n outperforms YOLOv8n across all metrics during both the training and validation phases. It also shows superior performance during inference on smartphones, maintaining high robustness and accuracy even in challenging conditions, such as complex environments, cloudy and rainy weather, and object occlusion. In contrast, YOLOv8n, with its more complex network structure, larger number of parameters, and increased FLOPs, underperformed at all stages. It especially struggled with the complex task of detecting aquatic plants.
YOLOv8n’s uneven performance in aquatic plant detection can be attributed to the design of the deep learning model and the dataset used for training. Since the introduction of AlexNet, most deep learning object detection models, including YOLO, have been trained and optimized using the COCO dataset36,37. The COCO dataset includes a wide variety of common objects but lacks detailed classifications for certain object types38. For example, the vehicle category does not differentiate between brands. In contrast, the diverse shapes of aquatic plants and interference from complex backgrounds present a greater challenge for general models. Aquatic plants vary in form, with subtle differences in leaf shapes, sizes, and textures. Furthermore, factors such as water currents, lighting variations, and obstructions in natural water environments further complicate detection. Therefore, while YOLOv8n excels in general object detection, its design based on the COCO dataset, along with its relatively redundant network structure, makes it prone to false positives, false negatives, and misdetections, particularly in tasks involving small to medium-sized objects and complex scenes.
To address these challenges, current research explores enhancements through architectural innovations and attention mechanisms aimed at improving detection performance in complex ecological scenarios. For example, Durgut and Ünsalan proposed the Swin Transformer Detect module, which improves mAP by 1.5% by fine-tuning spatial location information to enhance object localization and classification39. Su et al. introduced the MODL-Head module, which mitigates channel loss during feature dimensionality reduction, preserves more feature layer information before prediction, and improves mAP by 2.9%40. Solimani et al. proposed the Squeeze-and-Excitation attention module, which enhances the model’s recognition ability by focusing more on the target class, thereby improving overall detection performance41. However, unlike these methods, this paper achieves a similar performance improvement by adjusting the YOLOv8n network structure without introducing a new module. Additionally, this study leverages the advantages of UIB modules, enabling low-cost spatial mixing with larger convolutional kernels21. This approach increases network depth and receptive fields while reducing computational resource consumption. For example, Zhang and Chen combined all C2f modules in the YOLOv8n backbone network with UIB to reduce the computational burden, improving mAP by 1.2%35. Hu et al. similarly combined all C2f modules in the YOLOv8n neck network with UIB to address redundant feature extraction, resulting in a 0.5% improvement in mAP34. This paper combines the first three C2f modules in the YOLOv8n neck network with UIB, leaving the C2f modules in the backbone network unchanged. With these adjustments, APlight-YOLOv8n achieves outstanding performance in aquatic plant detection.
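The computational advantage of UIB-style large-kernel depthwise convolutions can be illustrated with a simple parameter count. This is an illustrative simplification (a real UIB block also contains expansion and projection layers, and the counts below ignore biases and normalization), but it shows why a 5×5 depthwise-separable layer is far cheaper than even a 3×3 standard convolution:

```python
# Parameter cost of spatial mixing: a depthwise k x k convolution costs
# k*k per channel, while a standard convolution costs k*k*C_in per output
# channel. Bias and normalization parameters are omitted.

def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    return k * k * c_in * c_out

def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    # k x k depthwise on each input channel, then 1x1 pointwise projection.
    return k * k * c_in + c_in * c_out

std = standard_conv_params(128, 128, 3)        # 3x3 standard conv: 147,456
dws = depthwise_separable_params(128, 128, 5)  # 5x5 depthwise + 1x1:  19,584
```

At 128 channels, the 5×5 depthwise-separable layer uses roughly 7.5× fewer parameters than the 3×3 standard convolution while covering a larger receptive field, which is the trade-off the C2f-UIB replacement exploits.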
Although APlight-YOLOv8n shows strong overall detection performance, its accuracy still differs across aquatic plant categories. The model performs well on emergent and floating plants but shows weaker results for floating-leaved and submerged plants, with the greatest decline occurring in submerged plant detection. This difference is mainly due to two factors. First, different categories exhibit distinct visual features. Emergent and floating plants usually have clear outlines and concentrated structures, making feature extraction easier for the model. In contrast, submerged plants occur underwater and often have fragmented leaves, weak textures, and blurred edges, leading to low target saliency and greater detection difficulty. Second, visual information from submerged plants is more easily affected by environmental factors such as light attenuation, water turbidity, suspended particles, and surface reflections. These conditions reduce image contrast and produce unstable features, making feature extraction and classification more challenging. In addition, the lightweight feature-fusion strategy in APlight-YOLOv8n tends to emphasize high-contrast and high-saliency regions, which improves performance in scenes with clear structures but limits sensitivity in weak-texture environments. As a result, detection of submerged plants, whose features are weak and strongly affected by environmental conditions, still has room for improvement. This also indicates that in more complex and dynamic aquatic environments, further optimization is needed to strengthen the model’s robustness in recognizing low-saliency aquatic plants.
In summary, while this study offers meaningful contributions, certain limitations remain. First, the aquatic plant dataset provided by Wang et al.18 contains a limited number of species, which restricts the model’s diversity and generalization ability. Future work could expand the dataset to include plants from different environments and with diverse morphologies to improve species diversity. Second, submerged plant samples are affected by reflections, turbidity, and other factors, leaving room to improve detection of weakly salient categories. Future work could explore low-level texture enhancement, deformable convolutions, or category-adaptive attention mechanisms to improve feature extraction for weakly salient targets. Additionally, APlight-YOLOv8n has so far only been deployed on smartphones and has not been tested on other mobile platforms such as drones or Jetson Nano. Extending the method to more mobile platforms could improve its robustness and applicability in diverse real-world scenarios.
Conclusions
This study introduces APlight-YOLOv8n, a lightweight model designed for real-time aquatic plant detection on smartphones in complex environments. The Faster Detect module refines the recognition of small-to-medium targets, reducing network redundancy and computational load. Meanwhile, the C2f-UIB module broadens the network’s receptive field, suppresses background noise, and enhances both high- and low-level feature fusion, all while minimizing overhead. Together, these enhancements lift APlight-YOLOv8n to an mAP50 of 74.4% and a steady 32.70 FPS, surpassing traditional methods and meeting real-time requirements. This tool provides water managers with an efficient, cost-effective solution for monitoring aquatic plants, supporting weed control, invasive species management, and water ecosystem protection. However, challenges remain. The model’s reliance on a limited-species dataset restricts its versatility across different aquatic plants. Future efforts will expand its species coverage, ensuring robustness across diverse flora. Additionally, optimizing APlight-YOLOv8n for other lightweight platforms, such as UAVs or embedded systems, will enhance its utility, driving smarter and more adaptable aquatic environmental solutions.
Data availability
The data used in this study are confidential and cannot be shared publicly. However, they are available from the first author, Daoli Wang, upon reasonable request (contact: 230301010003@hhu.edu.cn).
References
Hama Aziz, K. H. et al. Heavy metal pollution in the aquatic environment: Efficient and low-cost removal approaches to eliminate their toxicity: A review. RSC Adv. 13, 17595–17610 (2023).
Okereafor, U. et al. Toxic metal implications on agricultural soils, plants, animals, aquatic life and human health. IJERPH 17, 2204 (2020).
Vymazal, J. & Březinová, T. The use of constructed wetlands for removal of pesticides from agricultural runoff and drainage: A review. Environ. Int. 75, 11–20 (2015).
Gu, B. et al. Cost-effective mitigation of nitrogen pollution from global croplands. Nature 613, 77–84 (2023).
Huang, J. et al. Characterizing the river water quality in China: Recent progress and on-going challenges. Water Res. 201, 117309 (2021).
Mukhopadhyay, A., Duttagupta, S. & Mukherjee, A. Emerging organic contaminants in global community drinking water sources and supply: A review of occurrence, processes and remediation. J. Environ. Chem. Eng. 10, 107560 (2022).
Hao, Z., Lin, L., Post, C. J. & Mikhailova, E. A. Monitoring the spatial–temporal distribution of invasive plant in urban water using deep learning and remote sensing technology. Ecol. Ind. 162, 112061 (2024).
Palai, S. P. et al. A review on exploring pyrolysis potential of invasive aquatic plants. J. Environ. Manag. 371, 123017 (2024).
Patel, M., Jernigan, S., Richardson, R., Ferguson, S. & Buckner, G. Autonomous robotics for identification and management of invasive aquatic plant species. Appl. Sci. 9, 2410 (2019).
Wang, Z., Cui, J. & Zhu, Y. Review of plant leaf recognition. Artif. Intell. Rev. 56, 4217–4253 (2023).
Iqbal, Z. et al. An automated detection and classification of citrus plant diseases using image processing techniques: A review. Comput. Electron. Agric. 153, 12–32 (2018).
Lønborg, C. et al. Submerged aquatic vegetation: Overview of monitoring techniques used for the identification and determination of spatial distribution in European coastal waters. Integr. Envir Assess. Manag. 18, 892–908 (2022).
Xie, G. et al. FlowerMate 2.0: identifying plants in China with artificial intelligence. Innov. 5, 100636 (2024).
Kabir, H., Juthi, T., Islam, M. T., Rahman, M. W. & Khan, R. WaterHyacinth: A comprehensive image dataset of various water hyacinth species from different regions of Bangladesh. Data Brief. 52, 109872 (2024).
Garcia-Ruiz, F., Campos, J., Llop-Casamada, J. & Gil, E. Assessment of map based variable rate strategies for copper reduction in hedge vineyards. Comput. Electron. Agric. 207, 107753 (2023).
Wang, P. et al. Weed25: A deep learning dataset for weed identification. Front. Plant. Sci. 13, 1053329 (2022).
Bai, Y. & Bai, X. Deep learning-based aquatic plant recognition technique and natural ecological aesthetics conservation. Crop Prot. 184, 106765 (2024).
Wang, D. et al. APNet-YOLOv8s: A real-time automatic aquatic plants recognition algorithm for complex environments. Ecol. Ind. 167, 112597 (2024).
Howard, A. et al. Searching for MobileNetV3. Preprint at https://doi.org/10.48550/arXiv.1905.02244 (2019).
Howard, A. G. et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. Preprint at https://doi.org/10.48550/arXiv.1704.04861 (2017).
Qin, D. et al. MobileNetV4: Universal models for the mobile ecosystem. Preprint at https://doi.org/10.48550/arXiv.2404.10518 (2024).
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L. C. MobileNetV2: Inverted residuals and linear bottlenecks. Preprint at https://doi.org/10.48550/arXiv.1801.04381 (2019).
Redmon, J. & Farhadi, A. YOLOv3: An incremental improvement. Preprint at http://arxiv.org/abs/1804.02767 (2018).
Bochkovskiy, A., Wang, C. Y. & Liao, H. Y. M. YOLOv4: Optimal speed and accuracy of object detection. Preprint at http://arxiv.org/abs/2004.10934 (2020).
Glenn, J. YOLOv5 release v6.1. (2022). https://github.com/ultralytics/yolov5/releases/tag/v6.1.
Glenn, J. Ultralytics YOLOv8. (2023). https://github.com/ultralytics/ultralytics.
Ma, B. et al. Using an improved lightweight YOLOv8 model for real-time detection of multi-stage Apple fruit in complex orchard environments. Artif. Intell. Agric. 11, 70–82 (2024).
Wang, C. Y., Yeh, I. H. & Liao, H. Y. M. YOLOv9: Learning what you want to learn using programmable gradient information. Preprint at https://doi.org/10.48550/arXiv.2402.13616 (2024).
Zhang, Y. et al. Real-time strawberry detection using deep neural networks on embedded system (rtsd-net): an edge AI application. Comput. Electron. Agric. 192, 106586 (2022).
Lan, M. et al. RICE-YOLO: In-Field rice Spike detection based on improved YOLOv5 and drone images. Agronomy 14, 836 (2024).
Wan, J. et al. DSC-YOLOv8n: an advanced automatic detection algorithm for urban flood levels. J. Hydrol. 643, 132028 (2024).
Aboah, A., Wang, B., Bagci, U. & Adu-Gyamfi, Y. Real-time Multi-Class Helmet Violation Detection Using Few-Shot Data Sampling Technique and YOLOv8. in IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 5350–5358 (IEEE, Vancouver, BC, Canada, 2023). https://doi.org/10.1109/CVPRW59228.2023.00564.
Tu, W. et al. Farmed fish detection by improved YOLOv8 based on channel non-degradation with spatially coordinated attention. J. Dalian Ocean Univ. 38, 717–725 (2023).
Hu, H., Chen, M., Huang, L. & Guo, C. BHI-YOLO: A lightweight instance segmentation model for strawberry diseases. Appl. Sci. 14, 9819 (2024).
Zhang, Q. & Chen, S. Research on improved lightweight fish detection algorithm based on Yolov8n. JMSE 12, 1726 (2024).
Kaur, R. & Singh, S. A comprehensive review of object detection with deep learning. Digit. Signal Proc. 132, 103812 (2023).
Tong, K., Wu, Y. & Zhou, F. Recent advances in small object detection based on deep learning: A review. Image Vis. Comput. 97, 103910 (2020).
Pont-Tuset, J. & Gool, L. V. Boosting Object Proposals: From Pascal to COCO. in IEEE International Conference on Computer Vision (ICCV) 1546–1554 (IEEE, Santiago, Chile, 2015). https://doi.org/10.1109/ICCV.2015.181.
Durgut, O. & Ünsalan, C. A Swin Transformer, YOLO, and Weighted Boxes Fusion-Based Approach for Tree Detection in Satellite Images. in 32nd Signal Processing and Communications Applications Conference (SIU) 1–4 (IEEE, Mersin, Turkiye, 2024). https://doi.org/10.1109/SIU61531.2024.10601134.
Su, P., Han, H., Liu, M., Yang, T. & Liu, S. MOD-YOLO: rethinking the YOLO architecture at the level of feature information and applying it to crack detection. Expert Syst. Appl. 237, 121346 (2024).
Solimani, F. et al. Optimizing tomato plant phenotyping detection: boosting YOLOv8 architecture to tackle data complexity. Comput. Electron. Agric. 218, 108728 (2024).
Funding
This work was supported by the National Key Research and Development Program of China (Grant No. 2023YFC3206800).
Author information
Contributions
D.W.: Writing—original draft, Software, Methodology, Investigation, Conceptualization. Z.D.: Writing—review & editing, Validation, Supervision, Methodology, Funding acquisition. G.Y.: Writing—review & editing, Validation, Supervision. Z.Z.: Supervision. R.L.: Supervision. J.Z.: Supervision. W.W.: Writing—review & editing, Supervision. Y.Q.: Writing—review & editing, Validation, Supervision.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval and consent to participate
All authors agreed to publish this manuscript.
Consent for publication
Consent and approval for publication was obtained from all authors.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, D., Dong, Z., Yang, G. et al. A real-time mobile aquatic plant recognition algorithm based on deep learning for intelligent ecological monitoring. Sci Rep 16, 5075 (2026). https://doi.org/10.1038/s41598-026-35310-1