Abstract
Diversity in fruit posture is a key factor limiting the precision of stem recognition during cherry tomato harvesting. To improve the recognition of small-target features in cherry tomato stems, data augmentation strategies are used to expand the dataset selectively, improving the model’s adaptability to complex scenarios. The proposed model builds on YOLO11n in three steps. First, the Large Separable Kernel Attention (LSKA) mechanism is integrated into the Spatial Pyramid Pooling-Fast (SPPF) module to construct the SPPL module. This design expands the receptive field and strengthens feature extraction, improving detection accuracy and model robustness while reducing computational complexity. Second, the Spatial and Channel Reconstruction Convolution (ScConv) module is embedded into the Bottleneck architecture to replace the original C3K2 module, thereby reducing feature redundancy and improving the extraction of fine-grained features. Finally, a BAFPN module is designed that integrates the Asymptotic Feature Pyramid Network (AFPN) module to enhance the perception of small objects. Experimental results indicate that the resulting model, YOLO-LSBA, achieves a precision of 97.1% and a recall of 78.3% for fruit stem recognition, with an AP of 92.4%. These metrics show improvements of 3.9%, 0.6%, and 2.2%, respectively, compared to the baseline model. Field trials further demonstrate that the model outperforms baseline models in detecting fruit stems under real agricultural conditions. This method offers new insights for intelligent harvesting.
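The reported metrics can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes the stated gains (3.9%, 0.6%, 2.2%) are absolute percentage-point differences, which the abstract does not say explicitly, and it derives an F1 score that the abstract does not report. The names `f1_score`, `reported`, `gains`, and `implied_baseline` are hypothetical helpers introduced for this example.

```python
# Illustrative arithmetic on the figures quoted in the abstract.
# Only the three reported percentages and the three reported gains
# come from the source; everything else is an assumption.


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# YOLO-LSBA results reported in the abstract (percent).
reported = {"precision": 97.1, "recall": 78.3, "AP": 92.4}

# Gains over the baseline reported in the abstract, in the same order.
gains = {"precision": 3.9, "recall": 0.6, "AP": 2.2}

# ASSUMPTION: gains are absolute percentage points, not relative changes.
implied_baseline = {name: round(reported[name] - gains[name], 1) for name in reported}

# Derived values (not reported in the abstract).
f1_model = f1_score(reported["precision"] / 100, reported["recall"] / 100)
f1_baseline = f1_score(
    implied_baseline["precision"] / 100, implied_baseline["recall"] / 100
)

if __name__ == "__main__":
    print("Implied baseline (percent):", implied_baseline)
    print(f"Derived F1, YOLO-LSBA: {f1_model:.3f}")
    print(f"Derived F1, implied baseline: {f1_baseline:.3f}")
```

Under that assumption the implied baseline is about 93.2% precision, 77.7% recall, and 90.2% AP. The derived F1 of roughly 0.867 shows that recall, rather than precision, remains the limiting factor for stem detection.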
Data availability
The data used in this study were obtained from publicly available sources. The dataset is available via the Scientific Data Bank (ScienceDB) at https://doi.org/10.57760/sciencedb.05228. Additional data supporting this study are available from the corresponding author upon reasonable request.
Funding
This research was supported by the Science and Technology Special Envoys’ Agricultural Material Technology and Equipment Challenge Project (2022296906020001), the Anhui Provincial University Collaborative Innovation Project (GXXT-2023-110), and the University Key Discipline Development Program (XK-XJJC002).
Author information
Contributions
Conceptualization, Q.L.; methodology, Q.L. and J.Y.; software, Q.L., N.Z., Y.Q. and Z.L.; validation, Q.L. and J.Y.; data curation, Q.L., N.Z. and J.M.; writing (original draft preparation), Q.L.; writing (review and editing), F.C., B.C., M.C. and H.Z.; supervision, F.C.; project administration, H.Z.; funding acquisition, F.C. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Liu, Q., Chen, F., Zhang, H. et al. YOLO-LSBA: A high-precision model for detecting stems of small-sized cherry tomatoes. Sci Rep (2026). https://doi.org/10.1038/s41598-026-46348-6
DOI: https://doi.org/10.1038/s41598-026-46348-6