Abstract
Accurate measurement of bladder volume is essential for diagnosing urinary retention and voiding dysfunction. However, finding the optimal scanning view can be challenging for less experienced operators, potentially leading to suboptimal imaging and misdiagnosis. This study proposes an intelligent guidance system that leverages reinforcement learning (RL) to improve ultrasound image acquisition during bladder scanning. We introduce a novel pipeline built on a practical variant of Deep Q-Networks (DQN), Adam LMCDQN, which has theoretical guarantees in linear Markov Decision Processes. Our system aims to offer real-time, adaptive feedback to operators, improving image quality and consistency. We also present a novel domain-specific reward design that incorporates clinical domain knowledge to enhance RL performance. Our results demonstrate a promising \(81\%\) success rate in reaching target points along the transverse direction and \(67\%\) along the longitudinal direction, significantly outperforming supervised deep learning baselines, which achieved \(58\%\) and \(32\%\), respectively. This work is among the first to apply RL to ultrasound guidance for bladder assessment, demonstrating the technical feasibility of optimal-view localization in a simulated environment and investigating exploration strategies and reward formulations relevant to the guidance task.
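For intuition, the exploration mechanism behind Adam LMCDQN can be viewed as a noisy, adaptively preconditioned gradient step on the Q-network parameters: instead of following the temporal-difference gradient exactly, Langevin Monte Carlo injects Gaussian noise so that the parameters approximately sample from a posterior over Q-functions, which is what drives exploration. The sketch below is illustrative only and is not the authors' implementation; the network architecture, hyperparameter values (lr, inv_temp), and the stand-in TD loss are assumptions for exposition (see Ishfaq et al. for the exact algorithm).

    import torch

    def adam_lmc_update(params, loss, state, lr=1e-4, beta1=0.9, beta2=0.999,
                        inv_temp=1e8, eps=1e-8):
        """One Langevin Monte Carlo step with Adam-style preconditioning.

        Simplified sketch in the spirit of Adam LMCDQN: a preconditioned
        gradient ("drift") step plus Gaussian noise scaled by
        sqrt(2 * lr / inv_temp), yielding randomized value functions.
        """
        params = list(params)
        grads = torch.autograd.grad(loss, params)
        for i, (p, g) in enumerate(zip(params, grads)):
            m, v = state.get(i, (torch.zeros_like(p), torch.zeros_like(p)))
            m = beta1 * m + (1 - beta1) * g          # first-moment estimate
            v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
            precond = 1.0 / (v.sqrt() + eps)         # Adam-style preconditioner
            noise = torch.randn_like(p) * (2.0 * lr / inv_temp) ** 0.5
            with torch.no_grad():
                p.add_(-lr * precond * m + noise)    # drift + Langevin noise
            state[i] = (m, v)

    # Hypothetical usage with a toy Q-network (all shapes are placeholders):
    q_net = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(),
                                torch.nn.Linear(64, 4))
    opt_state = {}
    obs = torch.randn(32, 8)        # stand-in for probe-view features
    td_target = torch.randn(32, 4)  # stand-in for Bellman targets
    loss = torch.nn.functional.mse_loss(q_net(obs), td_target)
    adam_lmc_update(q_net.parameters(), loss, opt_state)

The inverse temperature inv_temp controls the exploration noise: large values make the update nearly deterministic gradient descent, while smaller values sample more aggressively, which is how LMC-based agents trade off exploiting the current Q-estimate against the exploration needed to discover the optimal probe pose.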
Data availability
The data supporting the findings of this study are available from the corresponding author, Mohsen Zahiri, upon reasonable request.
References
Kelly, C. E. Evaluation of voiding dysfunction and measurement of bladder volume. Rev. Urol. 6, S32 (2004).
Bent, A., Nahhas, D. & McLennan, M. Portable ultrasound determination of urinary residual volume. Int. Urogynecol. J. 8, 200–202 (1997).
Coombes, G. M. & Millard, R. J. The accuracy of portable ultrasound scanning in the measurement of residual urine volume. J. Urol. 152, 2083–2085 (1994).
Krogh, C. L. et al. Effect of ultrasound training of physicians working in the prehospital setting. Scand. J. Trauma Resusc. Emerg. Med. 24, 1–7 (2016).
Toporek, G. et al. User guidance for point-of-care echocardiography using a multi-task deep neural network. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part V 22, 309–317 (Springer, 2019).
Li, K., Li, A., Xu, Y., Xiong, H. & Meng, M. Q.-H. RL-TEE: Autonomous probe guidance for transesophageal echocardiography based on attention-augmented deep reinforcement learning. IEEE Trans. Autom. Sci. Eng. 21, 1526–1538 (2023).
Bi, Y., Qian, C., Zhang, Z., Navab, N. & Jiang, Z. Autonomous path planning for intercostal robotic ultrasound imaging using reinforcement learning. arXiv preprint arXiv:2404.09927 (2024).
Jarosik, P. & Lewandowski, M. Automatic ultrasound guidance based on deep reinforcement learning. In 2019 IEEE International Ultrasonics Symposium (IUS), 475–478 (IEEE, 2019).
Hase, H. et al. Ultrasound-guided robotic navigation with deep reinforcement learning. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 5534–5541 (IEEE, 2020).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Ishfaq, H. et al. Provable and practical: Efficient exploration in reinforcement learning via Langevin Monte Carlo. In The Twelfth International Conference on Learning Representations (2024).
Droste, R., Drukker, L., Papageorghiou, A. T. & Noble, J. A. Automatic probe movement guidance for freehand obstetric ultrasound. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part III 23, 583–592 (Springer, 2020).
Jiang, Z. et al. Intelligent robotic sonographer: Mutual information-based disentangled reward learning from few demonstrations. Int. J. Robot. Res. 43, 981–1002 (2024).
Milletari, F., Birodkar, V. & Sofka, M. Straight to the point: Reinforcement learning for user guidance in ultrasound. In Smart Ultrasound Imaging and Perinatal, Preterm and Paediatric Image Analysis: First International Workshop, SUSI 2019, and 4th International Workshop, PIPPI 2019, Held in Conjunction with MICCAI 2019, Shenzhen, China, October 13 and 17, 2019, Proceedings 4, 3–10 (Springer, 2019).
Li, K. et al. Image-guided navigation of a robotic ultrasound probe for autonomous spinal sonography using a shadow-aware dual-agent framework. IEEE Trans. Med. Robot. Bionics 4, 130–144 (2021).
Jin, T., Xu, P., Xiao, X. & Anandkumar, A. Finite-time regret of Thompson sampling algorithms for exponential family multi-armed bandits. Adv. Neural Inf. Process. Syst. 35, 38475–38487 (2022).
Jin, T., Yang, X., Xiao, X. & Xu, P. Thompson sampling with less exploration is fast and optimal. In International Conference on Machine Learning, 15239–15261 (PMLR, 2023).
Jin, T., Hsu, H.-L., Chang, W. & Xu, P. Finite-time frequentist regret bounds of multi-agent Thompson sampling on sparse hypergraphs. Proc. AAAI Conf. Artif. Intell. 38, 12956–12964 (2024).
Xu, P., Zheng, H., Mazumdar, E. V., Azizzadenesheli, K. & Anandkumar, A. Langevin Monte Carlo for contextual bandits. In International Conference on Machine Learning, 24830–24850 (PMLR, 2022).
Osband, I., Van Roy, B. & Wen, Z. Generalization and exploration via randomized value functions. In International Conference on Machine Learning, 2377–2386 (PMLR, 2016).
Russo, D. Worst-case regret bounds for exploration via randomized value functions. Adv. Neural Inf. Process. Syst. 32, 14410–14420 (2019).
Agrawal, P., Chen, J. & Jiang, N. Improved worst-case regret bounds for randomized least-squares value iteration. Proc. AAAI Conf. Artif. Intell. 35, 6566–6573 (2021).
Zanette, A., Brandfonbrener, D., Brunskill, E., Pirotta, M. & Lazaric, A. Frequentist regret bounds for randomized least-squares value iteration. In International Conference on Artificial Intelligence and Statistics, 1954–1964 (PMLR, 2020).
Ishfaq, H. et al. Randomized exploration in reinforcement learning with general value function approximation. In International Conference on Machine Learning, 4607–4616 (PMLR, 2021).
Osband, I., Blundell, C., Pritzel, A. & Van Roy, B. Deep exploration via bootstrapped DQN. Adv. Neural Inf. Process. Syst. 29 (2016).
Osband, I., Aslanides, J. & Cassirer, A. Randomized prior functions for deep reinforcement learning. Adv. Neural Inf. Process. Syst. 31 (2018).
Karbasi, A., Kuang, N. L., Ma, Y. & Mitra, S. Langevin Thompson sampling with logarithmic communication: Bandits and reinforcement learning. In Proceedings of the 40th International Conference on Machine Learning, vol. 202 of Proceedings of Machine Learning Research, 15828–15860 (PMLR, 2023).
Hsu, H.-L., Wang, W., Pajic, M. & Xu, P. Randomized exploration in cooperative multi-agent reinforcement learning. Adv. Neural Inf. Process. Syst. (2024).
Hsu, H.-L. & Pajic, M. Robust exploration with adversary via Langevin Monte Carlo. In 6th Annual Learning for Dynamics & Control Conference, 1592–1605 (PMLR, 2024).
Brockman, G. et al. OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016).
Bi, Y. et al. VesNet-RL: Simulation-based reinforcement learning for real-world US probe navigation. IEEE Robot. Autom. Lett. 7, 6638–6645 (2022).
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L.-C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4510–4520 (2018).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).
Funding
The study described in this article was funded in part with federal funds from the U.S. Department of Health and Human Services (HHS); Administration for Strategic Preparedness and Response (ASPR); Biomedical Advanced Research and Development Authority (BARDA), under contract number 75A50120C00097. The contract and federal funding are not an endorsement of the study results, product, or company.
Author information
Authors and Affiliations
Contributions
H.H., M.Z., G.G., R.A.M., H.L., G.Y.L., and B.R. conceived the project idea. M.Z., G.G., and J.G. designed the data collection protocol. J.G. and M.G.W. recruited the subjects for the study and managed the IRB process. M.Z., G.G., J.G., and G.Y.L. performed the ultrasound data acquisition. M.Z., H.H., and S.S. prepared the data for reinforcement learning (RL) and supervised learning training. H.H. and M.Z. designed, developed, and trained the RL models, and supervised algorithm development and data analysis. H.H. wrote the initial draft of the manuscript, and M.Z., G.Y.L., and R.A.M. critically revised it. All authors reviewed and approved the final manuscript before submission.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Hsu, HL., Zahiri, M., Li, G. et al. Active guidance in ultrasound bladder scanning using reinforcement learning. Sci Rep (2026). https://doi.org/10.1038/s41598-026-35285-z

