Performance comparison of large language models in boron neutron capture therapy knowledge assessment
  • Article
  • Open access
  • Published: 16 January 2026

  • Shumin Shen1,
  • Shanghu Wang2,3,
  • Mingzhu Gao1,
  • Yucai Yang1,
  • Xiuwei Wu1,
  • Jinjin Wu1,
  • Dachen Zhou4 &
  • Nianfei Wang1,3

Scientific Reports (2026)


We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Cancer
  • Oncology

Abstract

Accelerator-based boron neutron capture therapy (BNCT) is a binary radiation therapy that has developed rapidly in recent years. This study systematically evaluated and compared the performance of four mainstream model families [ChatGPT, Bard (Gemini), Claude, and ERNIE Bot] in answering BNCT-related knowledge questions, providing a reference for exploring their potential in BNCT professional education. Forty-seven bilingual BNCT questions covering key concepts, clinical practice, and reasoning tasks were constructed. The four model families were tested over five rounds in two languages and two question formats. Accuracy, reasoning ability, uncertainty expression, and version effects were analyzed. ChatGPT (72.8%) and Claude (70.4%) showed significantly higher overall accuracy than Bard (Gemini) (62.0%) and ERNIE Bot (55.6%) (p < 0.001). Both high-performing models answered reasoning-based questions significantly better than fact-based questions (p < 0.001). The average performance improvement from version updates (7.51 ± 8.46 percentage points) was numerically, but not significantly, higher than the change observed during same-version maintenance (0.61 ± 8.68 percentage points, p = 0.126). Although language and question format showed statistically significant effects, the effect sizes were minimal (η²p < 0.01). Uncertainty acknowledgment rates varied significantly among the model families (4.7%–23.7%, p = 0.003). ChatGPT can provide relatively accurate knowledge for popularizing BNCT. However, existing general-purpose LLMs still cannot accurately answer all BNCT questions and differ significantly in how they express uncertainty.
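As a rough illustration of how an overall accuracy gap between model families can be tested, the Python sketch below runs a chi-square test of independence on pooled correct/incorrect counts. This is not the authors' analysis pipeline: the per-family response count is a hypothetical placeholder, since the abstract reports only the accuracy percentages quoted above.

```python
# Minimal sketch (not the authors' code): chi-square test of independence on
# pooled correct/incorrect counts across the four model families.
# ASSUMPTION: n_responses is a hypothetical per-family denominator; only the
# accuracy percentages below are taken from the abstract.
from scipy.stats import chi2_contingency

n_responses = 500  # hypothetical number of scored responses per model family
overall_accuracy = {
    "ChatGPT": 0.728,
    "Claude": 0.704,
    "Bard (Gemini)": 0.620,
    "ERNIE Bot": 0.556,
}

# Rows: model families; columns: [correct, incorrect].
contingency = [
    [round(acc * n_responses), n_responses - round(acc * n_responses)]
    for acc in overall_accuracy.values()
]

chi2, p, dof, _ = chi2_contingency(contingency)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4g}")
```

With a large enough assumed denominator, the pooled accuracy differences reported in the abstract yield p values far below 0.001, consistent with the overall comparison described there; the exact statistic depends on the real response counts.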

Data availability

The datasets generated or analysed during the current study are available from the corresponding author on reasonable request.


Funding

This study was supported by the Natural Science Research Project for Anhui Universities (No. KJ2021A0311) and the Anhui Province Health Research Project (No. AHWJ2022b082). The funding sources were not involved in the research design, data collection, analysis, writing, or publication decisions.

Author information

Author notes
  1. Shumin Shen and Shanghu Wang contributed equally to this work and are co-first authors.

Authors and Affiliations

  1. Department of Oncology, The Second Hospital of Anhui Medical University, Hefei, 230601, China

    Shumin Shen, Mingzhu Gao, Yucai Yang, Xiuwei Wu, Jinjin Wu & Nianfei Wang

  2. Department of Radiotherapy, Anhui Chest Hospital, Hefei, 230026, China

    Shanghu Wang

  3. Hefei Cancer Hospital of CAS, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences (CAS), Beijing, China

    Shanghu Wang & Nianfei Wang

  4. Department of General Surgery, The Second Hospital of Anhui Medical University, Hefei, 230601, China

    Dachen Zhou


Contributions

**Shen Shumin**: Conceptualization, Data analysis, Writing – original draft, Writing – review & editing. **Wang Shanghu**: Conceptualization, Data collection, Writing – original draft, Writing – review & editing. **Gao Mingzhu**: Data curation, Methodology, Writing – review & editing. **Yang Yucai**: Methodology, Investigation, Writing – review & editing. **Wu Xiuwei**: Data curation, Investigation, Writing – review & editing. **Wu Jinjin**: Data curation, Investigation, Writing – review & editing. **Zhou Dachen**: Data curation, Methodology, Writing – review & editing. **Wang Nianfei**: Conceptualization, Project administration, Supervision, Writing – review & editing.

Corresponding authors

Correspondence to Dachen Zhou or Nianfei Wang.

Ethics declarations

Competing interests

The authors declare that there are no conflicts of interest, including between the authors and the companies whose LLMs were tested.

Ethical approval

This study used only publicly available Internet data and did not involve human subjects. Therefore, no ethical approval was required.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Supplementary Information 2.

Supplementary Information 3.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Shen, S., Wang, S., Gao, M. et al. Performance comparison of large language models in boron neutron capture therapy knowledge assessment. Sci Rep (2026). https://doi.org/10.1038/s41598-026-36322-7


  • Received: 21 August 2024

  • Accepted: 12 January 2026

  • Published: 16 January 2026

  • DOI: https://doi.org/10.1038/s41598-026-36322-7


Keywords

  • Boron neutron capture therapy
  • Large language model
  • ChatGPT
  • Bard (Gemini)
  • Claude
  • ERNIE Bot