Reasoning red teaming in healthcare not all paths to a desired outcome are desirable

Sorin, Vera; Korfiatis, Panagiotis; Nadkarni, Girish N.; Klang, Eyal

doi:10.1038/s41746-025-02104-0

Download PDF

Matters Arising
Open access
Published: 12 November 2025

Reasoning red teaming in healthcare not all paths to a desired outcome are desirable

Vera Sorin¹^na1,
Panagiotis Korfiatis¹^na1,
Girish N. Nadkarni^2,3^na2 &
…
Eyal Klang^2,3^na2

npj Digital Medicine volume 8, Article number: 649 (2025) Cite this article

1891 Accesses
2 Altmetric
Metrics details

Subjects

Matters Arising to this article was published on 12 November 2025

The Original Article was published on 07 March 2025

Abstract

Chang et al. showed that large language models can produce unsafe or biased outputs even when superficially accurate. We highlight that LLMs can hide harmful reasoning if only final responses are red-teamed. Monitoring intermediate inference steps, especially in ethically charged clinical scenarios, can reveal manipulative or unethical thought processes. We propose systematic testing of ethically sensitive prompts and thorough chain-of-thought analysis to ensure safe, trustworthy deployment in healthcare.

We read with interest Chang et al.’s “Red teaming ChatGPT in medicine to yield real-world insights on model behavior” in npj Digital Medicine¹. The paper provides a thoughtful, multidisciplinary framework for detecting harmful or inaccurate outputs from large language models (LLMs) in healthcare. Their red teaming analysis revealed that about 20% of all LLM responses were unsafe or contained bias. It highlights the need for deeper oversight before deploying LLMs for clinical use. We suggest that scrutiny of the models’ internal reasoning, beyond final answers, remains necessary.

Recent work by Baker et al.² shows that advanced LLMs may appear superficially compliant, while generating harmful or manipulative reasoning. Their study shows how monitoring intermediate inference steps, also called “chain-of-thought monitoring”, can detect misalignments in the model’s thought process. In some cases, the LLM might “hide” unethical intentions if it has been strongly optimized to avoid detection, suggesting that red teaming of final outputs alone is insufficient. In complex healthcare scenarios, such as end-of-life or resource allocation decisions, a single-step review may overlook harmful reasoning. While the final output might appear reasonable, the model’s underlying rationale could reflect ethically problematic assumptions. For example, a premature recommendation of palliative care in a resource-constrained setting, based on a patient’s age, disability, or socioeconomic status, as revealed by auditing intermediate reasoning steps³.

Modern “reasoning LLMs” rely on multi-step inference^4,5,6,7. OpenAI’s o1 model and DeepSeek’s R1 break problems into smaller tasks and refine partial solutions step by step^4,5. Another approach, the open-source S1 model⁸, uses a small set of carefully curated “reasoning data” for supervised fine-tuning for Chain-of-Thought, which can be length-controlled by a “wait” token. This method has been shown to boost performance on intricate reasoning problems⁸. As these approaches gain traction, we will likely see more inference-time compute solutions for healthcare tasks⁹. However, we may also encounter more subtle or sophisticated failure modes if unsafe reasoning is left unchecked¹⁰.

To address this, we propose two measures. First, similar to Baker et al. approach², systematically vary ethically charged variables (e.g., patient prognosis, resource scarcity) to pinpoint where the model’s reasoning suggests impermissible actions. Second, adopt thorough chain-of-thought analysis to identify manipulative or unethical rationales before they influence the final output. Because LLMs evolve so rapidly, institutions deploying them should formalize ongoing audits of these risk areas. However, there is an open question about who will conduct these audits. Likely, multidisciplinary teams involving clinicians, ethicists, developers, regulatory compliance experts, and patient advocates, supported by automated evaluation and explainability tools. With increased model complexity and rapid iteration, scalable and reliable post-deployment monitoring solutions will be crucial. Additionally, accountability structures are needed, assigning clear oversight and regular reporting responsibilities to maintain ongoing ethical and safety standards.

Chang et al.¹ rightly note that each new model version may regress or introduce new errors, necessitating continuous re-evaluation. We commend their approach and hope future efforts will systematically probe a model’s deeper reasoning. In healthcare, we must not trust the veneer of a final answer alone, rather we must ensure that the process generating that answer is sound.

Data availability

No datasets were generated or analyzed during the current study.

References

Chang, C. T. et al. Red teaming ChatGPT in medicine to yield real-world insights on model behavior. npj Digit. Med. 8, 149 (2025).
Article PubMed PubMed Central Google Scholar
Baker, B. et al. Monitoring reasoning models for misbehavior and the risks of promoting obfuscation. https://doi.org/10.48550/arXiv.2503.11926 (2025).
Sorin, V. et al. Socio-demographic modifiers shape large language models’ ethical decisions. J. Healthc. Inform. Res. https://doi.org/10.1007/s41666-025-00211- (2025).
OpenAI o1 System Card. https://openai.com/index/openai-o1-system-card/.
DeepSeek-AI, et al. DeepSeek-R1: incentivizing reasoning capability in LLMs via reinforcement learning. https://doi.org/10.48550/arXiv.2501.12948 (2025).
Google. Gemini 2.5 Pro. https://deepmind.google/technologies/gemini/pro/.
Anthropic. Claude 3.7 Sonnet. https://www.anthropic.com/claude/sonnet.
Muennighoff, N. et al. s1: Simple test-time scaling. Preprint at https://doi.org/10.48550/arXiv.2501.19393 (2025).
Snell C., Lee J., Xu K. & Kumar A. Scaling llm test-time compute optimally can be more effective than scaling model parameters. Preprint at https://doi.org/10.48550/arXiv.2408.03314.
Klang, E., Tessler, I., Freeman, R., Sorin, V. & Nadkarni, G. N. If Machines Exceed Us: Health Care at an Inflection Point. NEJM AI 1, AIP2400559 (2024).
Article Google Scholar

Download references

Author information

These authors contributed equally: Vera Sorin, Panagiotis Korfiatis.
These authors jointly supervised this work: Girish N. Nadkarni, Eyal Klang.

Authors and Affiliations

Department of Radiology, Mayo Clinic, Rochester, MN, USA
Vera Sorin & Panagiotis Korfiatis
The Windreich Department of Artificial Intelligence and Human Health, Mount Sinai Medical Center, New York, NY, USA
Girish N. Nadkarni & Eyal Klang
The Division of Data-Driven and Digital Medicine (D3M), Icahn School of Medicine at Mount Sinai, New York, NY, USA
Girish N. Nadkarni & Eyal Klang

Authors

Vera Sorin
View author publications
Search author on:PubMed Google Scholar
Panagiotis Korfiatis
View author publications
Search author on:PubMed Google Scholar
Girish N. Nadkarni
View author publications
Search author on:PubMed Google Scholar
Eyal Klang
View author publications
Search author on:PubMed Google Scholar

Contributions

V.S., P.K., G.N.N., and E.K. conceived and designed the letter. V.S. and P.K. drafted the initial version. G.N.N. and E.K. critically revised it. All authors approved the final manuscript.

Corresponding authors

Correspondence to Vera Sorin or Eyal Klang.

Ethics declarations

Competing interests

V.S., P.K., and E.K. declare no financial or non-financial competing interests. G.N.N. serves as associate editor of this journal and had no role in the peer review or decision to publish this manuscript.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Sorin, V., Korfiatis, P., Nadkarni, G.N. et al. Reasoning red teaming in healthcare not all paths to a desired outcome are desirable. npj Digit. Med. 8, 649 (2025). https://doi.org/10.1038/s41746-025-02104-0

Download citation

Received: 02 April 2025
Accepted: 21 October 2025
Published: 12 November 2025
Version of record: 12 November 2025
DOI: https://doi.org/10.1038/s41746-025-02104-0