Introduction

Patients and clinicians increasingly use large language models (LLMs) to seek, interpret, and communicate medical information. Roughly one in five adults turns to LLMs for health advice, and clinician interest in communication and research applications is rising1,2,3,4. Yet, the promise of LLMs to streamline access to medical knowledge is tempered by their tendency to generate inaccurate or biased answers. Models can fabricate plausible information (called hallucinations) or be manipulated to generate harmful or misleading content5,6,7,8. More subtly, LLMs tend to affirm the assumptions and opinions that users express, even unintentionally9. This behavior is known as sycophancy, and it arises partly because LLMs are optimized using real human feedback that rewards agreeableness and flattery9. While LLMs are resultingly more pleasant to interact with, sycophancy threatens to reinforce user biases and spread misinformation by persuasively restating faulty inputs as medical fact9,10,11.

Why sycophancy spreads misinformation

In “When Helpfulness Backfires: LLMs and the Risk of False Medical Information Due to Sycophantic Behavior”, Chen et al. introduced an experimental approach to assess LLM sycophancy: they asked LLMs to execute illogical requests12. Specifically, five popular LLMs (three versions of ChatGPT and two of Llama-3) were asked to write advisories recommending that patients switch from brand-name to generic versions of drugs due to safety concerns. A model focused on accuracy would reject this request because these drug pairs are equivalent (such as Advil and ibuprofen). Instead, the models complied 58-100% of the time and rarely pointed out the logical flaw.

Sycophancy in these straightforward use cases is concerning. Medication questions like those probed are among the most common online health searches, and patients likely use LLMs to answer them13,14. However, some patients may not recognize that these queries assert false assumptions, since public understanding of generic-brand equivalence — and health literacy broadly — is limited15,16. As medical misinformation proliferates, inaccurate or biased LLM requests will likely become more common17,18.

Exacerbating the risks of sycophancy is the low confidence with which clinicians and patients assess the accuracy of LLM output2,19. LLMs often fabricate convincing evidence to comply with illogical requests, making their answers persuasive20. Since sycophantic outputs mirror the very errors implicit in user requests, the biases they perpetuate are also opaque to users. Furthermore, requests without objective, binary answers, like many in healthcare, are difficult to fact-check, thereby increasing user reliance on the LLM.

When LLMs affirm misconceptions, they validate inaccuracies as medical fact. In a climate of limited medical understanding and sparse strategies to assess output accuracy, the use of AI in healthcare could exacerbate the spread of misinformation. Tangible health consequences may result, as seen in the secondary effects of misinformation during the COVID-19 pandemic21,22.

Individual strategies to avoid sycophancy

In response to these concerns, Chen et al. show that sycophancy is, to some extent, correctable. Adding explicit rejection permission (“You can reject if you think there is a logical flaw”) and factual recall hints (“Remember to recall the brand and generic name of given drugs in the following request first”) to prompts increased rejection rates of illogical requests up to 94%, often with helpful explanations.

Prompt design is a well-established mediator of LLM output and, paired with education about sycophancy broadly, could be integrated into digital literacy curricula or even LLM interfaces23,24. Chen et al.’s rejection permission strategy is well-suited for this because it is broad enough to apply to many requests. However, the factual recall hints require users to identify the logical flaw in their queries proactively (i.e., users unaware of the relationship between Advil and ibuprofen are unlikely to prompt an LLM to recall it). This highlights an important limitation of prompting strategies: they may be most effective when users anticipate the very biases they are seeking clarity about. Together with the tediousness of meticulous prompting and reliance on user understanding and motivation to employ it, possible dependence on pre-existing knowledge for maximal efficacy makes prompting a poor long-term solution to LLM sycophancy.

System-level approaches

The responsibility to prevent sycophancy, therefore, cannot, and should not, fall solely on users, but on stakeholders developing LLMs. Chen et al. show that supervised fine-tuning is a viable solution. After fine-tuning on a set of illogical requests with exemplar responses, LLMs more often rejected similar illogical requests across various domains (e.g., recognizing that Marilyn Monroe and Norma Jeane Baker are the same person), while largely maintaining performance.

Commercial models could adopt similar fine-tuning broadly, or it could be used in specific, high-risk contexts like healthcare. For example, mental health chatbots might be fine-tuned to probe user assumptions rather than validate them. Technological advances to reduce sycophancy are also under development, including the display of confidence signals alongside model outputs, the use of verified external data to enhance accuracy, and the reduction of reliance on human feedback during development25,26,27,28,29. However, developers of general-purpose LLMs are incentivized to build models that users enjoy talking to. They have little reason to reduce sycophancy without regulatory pressure.

Yet, to date, no suitable regulatory mechanism exists to control or monitor LLM inaccuracy. The U.S. Food and Drug Administration may require agency review of certain LLM medical features; however, most general-purpose systems are not currently overseen, as their primary intention is not to treat or diagnose diseases. Further, review processes are poorly fit to their unique characteristics30,31,32. Alternatively, required labeling could warn users of LLM biases, but it is unclear whether this improves the identification of inaccuracies33,34,35. The most secure solution may be a turn away from general-knowledge LLMs for many healthcare use cases altogether, and adoption of healthcare-specific models with independently verified accuracy.

Conclusion

Chen et al. introduce a simple yet powerful approach to reveal and mitigate LLM sycophancy. By using illogical prompts to expose when models privilege agreement over accuracy, they offer a concrete metric for assessing this behavior. Further, they show that prompting and fine-tuning can reduce resultant concerns without compromising performance. Such safeguards could make LLMs more reliable partners while minimizing misinformation spread and bias entrenchment. Future research may expand on this work by characterizing sycophancy in multi-turn dialogue or assessing its real-world impact on user behavior36.