npj Digital Medicine
Comparative performance of LLMs and machine learning in predicting complications after percutaneous kyphoplasty for osteoporotic vertebral compression fractures
  • Article
  • Open access
  • Published: 01 April 2026

  • Tianyi Wang1 na1,
  • Ruiyuan Chen1 na1,
  • Minghui Liang1 na1,
  • Han Ke1,
  • Baodong Wang1,
  • Ziqian Ma1,
  • Aobo Wang1,
  • Ning Fan1,
  • Shuo Yuan1 &
  • Lei Zang1 

npj Digital Medicine (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Diseases
  • Health care
  • Medical research
  • Risk factors

Abstract

Evaluating the performance of large language models (LLMs) in a specific medical domain can help clarify how well they generalize to real-world applications. We assessed the predictive and decision-support value of two state-of-the-art LLMs in predicting bone cement leakage (BCL) and new vertebral fractures (NVF) after percutaneous kyphoplasty (PKP), and compared them with traditional machine learning (TML) models and spine surgeons. This study used combined retrospective and prospective data from a single tertiary hospital. Two LLMs (GPT-5 and DeepSeek R1) with zero- and few-shot strategies, five TML models, and two spine surgeons with and without exposure to LLM responses were asked to predict complications based on demographic, perioperative baseline, and radiographic data. We also tested the LLMs’ ability to predict complication subtypes. For BCL prediction, both LLMs demonstrated acceptable performance under zero-shot conditions (F1-score, 0.857–0.871; MCC, 0.164–0.332), comparable to the TML models (F1-score, 0.758–0.867; MCC, 0.265–0.416) and slightly superior to surgeons alone (F1-score, 0.675–0.684; MCC, 0.074–0.185). Few-shot prompting enhanced specificity but yielded uncertain overall gains. For NVF prediction, zero-shot LLM performance was poor (F1-score, 0.309; MCC, 0.044) but improved with few-shot learning. The RBF-SVM model showed the best performance for NVF prediction (F1-score, 0.536; MCC, 0.414). LLM explanations enhanced surgeon performance in BCL prediction but not in NVF prediction. The LLMs predicted complication subtypes poorly. These findings suggest that the predictive performance of current LLMs varies across complications after PKP; the models remain immature for real clinical application and need further improvement.
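The abstract reports both the F1-score and the Matthews correlation coefficient (MCC), and the two can diverge sharply when one class dominates. The sketch below uses the standard definitions of both metrics to illustrate this. It is not the authors' code; the function name `f1_and_mcc` and all confusion-matrix counts are hypothetical and are not taken from the study.

```python
import math


def f1_and_mcc(tp, fp, fn, tn):
    """Return (F1, MCC) for a binary confusion matrix.

    Undefined ratios (zero denominators) are reported as 0.0,
    which is a common convention and an assumption of this sketch.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0

    # MCC uses all four cells, so it also penalizes missed negatives.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


# Hypothetical cohort (NOT study data): 80 positive and 20 negative cases.
# A predictor that finds most positives but few negatives.
f1_a, mcc_a = f1_and_mcc(tp=76, fp=16, fn=4, tn=4)

# A trivial predictor that labels every case positive.
f1_b, mcc_b = f1_and_mcc(tp=80, fp=20, fn=0, tn=0)

print(f"skewed predictor:   F1={f1_a:.3f}  MCC={mcc_a:.3f}")
print(f"always-positive:    F1={f1_b:.3f}  MCC={mcc_b:.3f}")
```

In this illustrative setting, both predictors reach a high F1-score while their MCC stays low, which is one reason a high F1 paired with a modest MCC should be read with caution.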

Data availability

The datasets generated and/or analyzed during the current study are not publicly available due to institutional and participant privacy considerations, but are available from the corresponding author on reasonable request.

Code availability

The underlying code for this study is not publicly available but may be made available to qualified researchers on reasonable request from the corresponding author.

Acknowledgements

We sincerely thank DP and LJ for their valuable investigation and evaluation in this research. No funding was received for conducting this study.

Author information

Author notes
  1. These authors contributed equally: Tianyi Wang, Ruiyuan Chen, Minghui Liang.

Authors and Affiliations

  1. Department of Orthopedics, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China

    Tianyi Wang, Ruiyuan Chen, Minghui Liang, Han Ke, Baodong Wang, Ziqian Ma, Aobo Wang, Ning Fan, Shuo Yuan & Lei Zang

Contributions

T.W., R.C., and M.L. contributed equally to this work. Conceptualization: L.Z., T.W., and R.C.; methodology: T.W., R.C., and M.L.; formal analysis and investigation: T.W., R.C., M.L., H.K., B.W., and Z.M.; writing—original draft preparation: T.W. and R.C.; writing—review and editing: M.L., H.K., B.W., Z.M., A.W., N.F., and S.Y.; resources: L.Z.; supervision: L.Z.

Corresponding author

Correspondence to Lei Zang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Wang, T., Chen, R., Liang, M. et al. Comparative performance of LLMs and machine learning in predicting complications after percutaneous kyphoplasty for osteoporotic vertebral compression fractures. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02588-4

  • Received: 21 November 2025

  • Accepted: 17 March 2026

  • Published: 01 April 2026

  • DOI: https://doi.org/10.1038/s41746-026-02588-4

Associated content

Collection

Evaluating the Real-World Clinical Performance of AI
