In a prior edition of this journal, Ryan and colleagues conducted an experiment examining the influence of AI on physician decision-making. They observed that 67% of physicians who initially recommended against treatment, but later viewed AI output recommending treatment, changed their decision. The study is important because it shows that AI can dramatically influence physicians' decisions. Medico-legal consequences are considered.
In their excellent article, Ryan et al.1 examined how artificial intelligence (AI) can affect decision-making by Family Medicine and Internal Medicine physicians. Physicians read a vignette about a patient with either attention or depressive symptoms and were asked whether she would be a good candidate for an ADHD referral or antidepressant medication (henceforth “treatment”), respectively. Next, participants viewed additional information from an ostensible AI program that either did or did not recommend treatment. Physicians were then asked a second time, after viewing the AI output, whether they would recommend treatment. The authors found that 67% of physicians who initially indicated the patient was not a good candidate for treatment, but then viewed AI output indicating she was, changed their decision. Conversely, 21.5% of physicians who initially indicated the patient was a good candidate for treatment and then viewed AI output indicating the opposite changed their mind.
This important study highlights one critical way in which AI can influence healthcare professionals. Moreover, the findings are consistent with a nascent but growing body of work demonstrating that physicians are influenced by AI results2,3. In fact, some research has observed that physicians seemingly trust AI more than their medical colleagues4,5, consistent with the notion of “automation bias”6.
Trust in AI is not bad per se, but it comes with unintended consequences. In one study7, radiologists interpreted 90 chest X-rays without AI and then with an ostensible AI system that provided 8 false-positive (FP) and 4 false-negative (FN) results (ROC-AUC = 0.87). In the no-AI condition, the FP and FN rates for these cases were 2.7% and 51.4%, respectively; these rates increased to 20.7–33.0% and 80.5–86.0%, respectively, when the cases were interpreted with the support of incorrect AI results. Thus, even though AI may generally improve physicians’ performance8, it can still lead doctors astray in the inevitable circumstances where it is incorrect. This makes studying the influence of AI on physicians especially pressing.
Several elements of the manuscript are worthy of further elaboration. First, physicians greatly underestimated the extent to which AI influenced their decision-making. One clever feature of this experiment is that physicians were asked whether they would have made a different decision without AI. In the aforementioned circumstance, in which physicians were given AI feedback to treat after they initially (without AI) said the patient was not a good candidate for treatment, only 26.7% said that they would have made a different decision without AI; however, as noted above, 66.7% actually did change their assessment after seeing an AI result that was discrepant with their initial determination. By comparison, only 18.2% changed their mind when the AI feedback was consistent with their initial determination.
While the study was very informative, some features could be improved in future work. First, in the real world, a physician’s decision to prescribe an SSRI or refer a patient for an ADHD diagnosis is dichotomous. However, Ryan et al. measured this (primary) outcome on a 5-point Likert scale and note that the “change in decision” variable “is a binary variable obtained by taking the absolute value of the difference between the physician assessment at [the two time points].” Presumably, this means that any change on the 5-point scale counted as a change in the physician’s assessment. If so, the percentage of physicians who changed their determination is likely an artificially inflated estimate. For instance, consider a physician who initially responded 1 (strongly disagree) to the primary outcome question “[The patient] would be a good candidate for a prescription for a [SSRI]” and, after viewing AI feedback stating the patient would be a good candidate, modified their response to 2 (disagree). In the real world, this physician would probably not change their behavior, but Ryan and colleagues appear to have classified such a scenario as a change.
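To make this concrete, the minimal sketch below (in Python, using hypothetical ratings rather than the authors’ data) contrasts the “any change” rule described in the paper with a stricter rule that counts a change only when the rating crosses the scale midpoint, i.e., when the physician moves from the disagree side to the agree side or vice versa; the stricter rule would not count the 1-to-2 shift above as a changed decision.

```python
# Hypothetical illustration only; the ratings below are invented, not the authors' data.
# Ratings are on a 1-5 Likert scale (1 = strongly disagree ... 5 = strongly agree).

def changed_any(pre: int, post: int) -> bool:
    """'Any change' rule: a nonzero absolute difference counts as a changed decision."""
    return abs(post - pre) > 0

def changed_crossed_midpoint(pre: int, post: int, midpoint: float = 3.0) -> bool:
    """Stricter rule: count a change only if the rating crosses the scale midpoint."""
    return (pre - midpoint) * (post - midpoint) < 0

# Example pre/post pairs; (1, 2) is the scenario described above.
pairs = [(1, 2), (2, 4), (4, 4), (5, 2)]
for pre, post in pairs:
    print(f"{pre}->{post}: any-change={changed_any(pre, post)}, "
          f"crossed-midpoint={changed_crossed_midpoint(pre, post)}")

# Only (2, 4) and (5, 2) cross the midpoint, whereas the 'any change' rule also
# counts (1, 2); the latter will therefore yield a higher estimated change rate.
```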
Second, the authors combined the results of both vignettes when conducting the data analysis. This could obscure differences between the two vignettes if the findings were not uniform across them.
Other aspects of the study may actually bias the results towards blunting the effect of the manipulations. The authors correctly note that one limitation is the reliance on hypothetical vignettes. While this indeed compromises ecological validity, in real-world medical decisions physicians will probably be more, not less, influenced by AI, because they will be highly incentivized to take in all sources of information.
Chief among those incentives are medico-legal consequences9. Our group10 asked mock jurors to read vignettes describing hypothetical medical malpractice lawsuits in which a radiologist failed to find an abnormality and was being sued for damages. Across two vignettes with different pathologies, when told that the radiologist’s FN interpretation occurred despite the fact that AI caught the abnormality, participants were more likely to side with the plaintiff than when the radiologist’s FN interpretation occurred without the support of AI. In the study by Ryan et al., the effect of a conflicting AI recommendation was considerably more pronounced when that recommendation was in favor of treatment than when it was not (67% vs. 22% changed their initial decision, respectively). This apparent pro-treatment bias might be an instantiation of defensive medicine stemming from fear of legal liability.
Additionally, we point readers towards the AI output shown when the AI does not recommend treatment (see supplement). The output states (using only the SSRI vignette for simplicity): “Patient Jess Y. is in the 52nd percentile of all patients in terms of SSRI Compatibility. Based on this score, Patient Jess Y. is likely an AVERAGE candidate for an SSRI prescription. It is recommended that Patient Jess Y. is NOT prescribed an SSRI.” Technically, this output is logical: the 52nd percentile certainly indicates Jess is average, and SSRIs should only be given to patients whose mood is much worse than average. Nonetheless, we worry that participants may misinterpret “average” to indicate that an SSRI prescription would be reasonable, even if not ultimately recommended. Relatedly, it is not clear what “all patients” means: it could be interpreted as all patients in the practice or all patients being seen for depressive symptoms.
Interestingly, this points to another important topic of investigation that has received far less attention than warranted: the human factors of how AI outputs should be presented. Even seemingly subtle differences can change behavior. For instance, in the aforementioned chest X-ray experiment7, radiologists were more likely to incorrectly disagree with AI when told the AI results would be saved in (versus deleted from) a patient’s file. The human factors of AI implementation are also relevant to legal liability. In the mock juror study referenced above10, providing jurors with the error rates of AI generally decreased the radiologist’s legal liability compared to not providing error rates. Future studies should explore how best to frame AI results for physicians.
In summary, the experiment by Ryan et al. provides a much-needed contribution to the AI literature. The field must grapple with the fact that AI can change physicians’ perceptions and decision-making. Understanding the magnitude and moderators of this effect will be critical as AI becomes increasingly utilized across medical specialties. Future research building upon this study could be strengthened by addressing some methodological shortcomings discussed above.
Data availability
No datasets were generated or analysed during the current study.
References
Ryan, K., Yang, H.-J., Kim, B. & Kim, J. P. Assessing the impact of AI on physician decision-making for mental health treatment in primary care. npj Ment. Health Res. 4, 8 (2025).
Brown, R. & Lee, A. The role of AI in emergency medicine decision-making. Emerg. Med. J. 39, 221–227 (2022).
Davis, M. & Patel, R. AI-assisted decision-making in oncology. J. Clin. Oncol. 41, 89–95 (2023).
Smith, J. & Johnson, L. Trust in artificial intelligence vs. human physicians: a comparative study. J. Med. Ethics 47, 150–158 (2021).
Davis, M. & Patel, R. Physicians: a comparative study. J. Med. Ethics 47, 150–158 (2023).
Goddard, K., Roudsari, A. V. & Wyatt, J. C. Decision support and automation bias: methodology and preliminary results of a systematic review. Stud. Health Technol. Inform. 164, 3–7 (2011).
Bernstein, M. H. et al. A multi-reader pilot study of lung cancer detection with chest radiography. Eur. Radiol. 33, 8263–8269 (2023).
Beam, A. L. et al. Artificial intelligence in medicine. N. Engl. J. Med. 388, 1220–1221 (2023).
Banja, J. D. et al. Medical malpractice liability in an era of advanced artificial intelligence. J. Am. Coll. Radiol. 19, 816–820 (2022).
Bernstein, M. H. et al. Randomized study of the impact of AI on perceived legal liability for radiologists. NPJ Digit. Med. 2, 2400785 (2025).
Author information
Contributions
The project was conceived by G.L.B. M.H.B. drafted the first version of the manuscript. B.S., M.A.B., and G.L.B. reviewed and provided feedback on the paper. All authors have read and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Bernstein, M.H., Sheppard, B., Bruno, M.A. et al. Matters Arising: The importance of understanding AI’s impact on physician behavior. npj Mental Health Res 4, 68 (2025). https://doi.org/10.1038/s44184-025-00180-4