The effects of multitype prompt engineering for large language models in hypertension treatment decisions

Li, Zeyan; Liu, Henyang; Tan, Wuping; Tang, Du; Duan, Shoupeng; Zhou, Bowen; Tang, Long; Hu, Xuyang; Huang, Liying; Zhao, Peng; Fang, Wenqiang; Wu, Bing; Liu, Jinjun; Wang, Yijun; Wang, Jun

doi:10.1038/s41746-026-02645-y

Article
Open access
Published: 15 April 2026

Zeyan Li^1,2,na1^,
Henyang Liu^3,na1^,
Wuping Tan^4,na1^,
Du Tang^5,na1^,
Shoupeng Duan^3,na1^,
Bowen Zhou^6^,
Long Tang^7^,
Xuyang Hu^8^,
Liying Huang^9^,
Peng Zhao^1^,
Wenqiang Fang^1^,
Bing Wu^10,na2^,
Jinjun Liu^1,11,12,13,na2^,
Yijun Wang^9,na2^ &
Jun Wang^1,11,12,13,na2^

npj Digital Medicine (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

How different prompt engineering strategies affect the performance of large language models (LLMs) in hypertension treatment decision-making is not yet fully understood. We evaluated this effect in a two-stage validation study using 300 de-identified simulated hypertension cases based on real-world clinical scenarios. ChatGPT-4.1 with Guidance-Self-Consistency prompting performed best (91.3% accuracy), approaching expert-level competency, whereas zero-shot prompting performed worst (62.7% with DeepSeek-V3). Assistance from the optimal LLM configuration consistently improved physicians’ average accuracy at every hospital level (community hospital: 73.4% to 82.5%; county hospital: 84.0% to 87.9%; teaching hospital: 91.5% to 92.0%) and reduced inappropriate regimen rates. In contrast, the worst LLM configurations pushed physician performance below baseline, increasing inappropriate regimen rates from 26.6% to 35.2% across all levels. Well-designed prompt strategies enable LLMs to provide reliable hypertension treatment recommendations and thereby support physicians’ clinical decisions. The study is registered in a clinical trial registry (ChiCTR2500099307, March 21, 2025).
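The abstract does not describe how the Guidance-Self-Consistency strategy was implemented. As a rough illustration of the generic self-consistency idea only (sample several responses to the same prompt and keep the majority answer), a minimal sketch might look like the following. The names `self_consistent_answer`, `sample_fn` and `fake_sampler`, and the regimen labels, are hypothetical and are not taken from the study.

```python
from collections import Counter
from typing import Callable, List, Tuple


def self_consistent_answer(
    sample_fn: Callable[[str], str],
    prompt: str,
    n_samples: int = 5,
) -> Tuple[str, float]:
    """Return the majority answer over n_samples calls and its vote share.

    sample_fn is a hypothetical stand-in for one stochastic LLM call that
    returns a normalized regimen label; it is not an API used in the study.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    votes: List[str] = [sample_fn(prompt) for _ in range(n_samples)]
    # most_common(1) gives the label with the highest vote count.
    answer, count = Counter(votes).most_common(1)[0]
    return answer, count / n_samples


# Usage with a deterministic fake sampler (assumed labels, no real model).
_canned = iter(["regimen_A", "regimen_B", "regimen_A", "regimen_A", "regimen_B"])


def fake_sampler(prompt: str) -> str:
    # Ignores the prompt and replays canned answers for illustration.
    return next(_canned)


choice, agreement = self_consistent_answer(fake_sampler, "simulated case text", 5)
print(choice, agreement)
```

The vote share returned alongside the answer is one simple way to flag low-agreement cases for closer human review; whether the study used anything similar is not stated in the abstract.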

Data availability

The data underlying this article will be shared on reasonable request to the corresponding author.

Code availability

The code underlying this article will be shared on reasonable request to the corresponding author.

Acknowledgements

This work was supported by the Clinical and Translational Research Project of Anhui Province (202427b10020086, 202427b10020089, 202427b10020097); Research Funds of Joint Research Center for Regional Diseases of IHM (2024bydik001, 2024bydjk002, 2024bydjk005); Anhui Provincial Health and Health Commission Scientific Research Project (AHWJ2024Aa10053); Science Research Project of Bengbu Medical University (2024byfy008); National Engineering Research Center of Science and Technology Information (2025STI135); and The First Affiliated Hospital of Bengbu Medical University for Excellent Young Scholars (2025byyfyyq09).

Author information

These authors contributed equally: Zeyan Li, Henyang Liu, Wuping Tan, Du Tang, Shoupeng Duan.
These authors jointly supervised this work: Bing Wu, Jinjun Liu, Yijun Wang, Jun Wang.

Authors and Affiliations

Department of Cardiology, The First Affiliated Hospital of Bengbu Medical University, Bengbu, China
Zeyan Li, Peng Zhao, Wenqiang Fang, Jinjun Liu & Jun Wang
Department of Cardiology, Guizhou Provincial People’s Hospital, Guiyang, China
Zeyan Li
Department of Cardiology, Renmin Hospital of Wuhan University, Wuhan, China
Henyang Liu & Shoupeng Duan
Department of Cardiology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
Wuping Tan
Division of Cardiology, Department of Internal Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
Du Tang
Department of Cardiology, Suzhou First People’s Hospital, Suzhou, China
Bowen Zhou
Department of Cardiology, The Affiliated Xuancheng Hospital of Wannan Medical College, Xuancheng, China
Long Tang
Department of Cardiovascular Medicine, Jieshou City People’s Hospital, Fuyang, China
Xuyang Hu
West China School of Medicine, Sichuan University, Chengdu, China
Liying Huang & Yijun Wang
Institute of Clinical Medicine and Department of Cardiology, Renmin Hospital, Hubei University of Medicine, Shiyan, China
Bing Wu
Joint Research Center for Regional Diseases of IHM, Bengbu Medical University, Bengbu, China
Jinjun Liu & Jun Wang
Joint Research Center for Regional Diseases of IHM, The First Affiliated Hospital of Bengbu Medical University, Bengbu, China
Jinjun Liu & Jun Wang
National Comprehensive Utilization of Science and Technology Information Resources and Public Service Center, Scientific and Technical Information (STI)-Zhilian Research Institute for Innovation and Digital Health, Beijing, China
Jinjun Liu & Jun Wang

Contributions

Z.Y.L., H.Y.L., W.P.T., D.T., and S.P.D. conceived and performed the study. B.W.Z., L.T., X.Y.H., L.Y.H., P.Z. and W.Q.F. contributed to methodological optimisation, data processing, and model evaluation. Z.Y.L., H.Y.L., W.P.T., D.T. and S.P.D. performed data collection, analysis, and manuscript revision. B.W., J.J.L., Y.J.W., and J.W. supervised the clinical components of the study and drafted the article or critically revised it for important intellectual content. All authors approved the final manuscript and consented to its submission.

Corresponding authors

Correspondence to Bing Wu, Jinjun Liu, Yijun Wang or Jun Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Li, Z., Liu, H., Tan, W. et al. The effects of multitype prompt engineering for large language models in hypertension treatment decisions. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02645-y

Received: 08 December 2025
Accepted: 06 April 2026
Published: 15 April 2026
DOI: https://doi.org/10.1038/s41746-026-02645-y