Abstract
Although artificial intelligence (AI) tools are increasingly used in breast imaging, little research has examined how to communicate AI findings in patient portals. English-speaking US women with ≥1 prior mammogram (n = 1623) were randomized to one of thirteen conditions. All participants viewed a radiologist report negative for breast cancer; twelve conditions also included an AI report with one of four AI scores (0 or 29 [no suspicion of cancer]; 31 or 50 [suspicion of cancer]), presented alone, with an abnormality cutoff threshold, or with both the threshold and the AI’s False Discovery Rate (FDR) or False Omission Rate (FOR). Participants reported whether they would consult an attorney about litigation if advanced cancer were detected one year later (primary outcome). Secondary outcomes included follow-up decisions, concern about cancer, and trust. Consideration of litigation was higher when AI was included, especially when the AI indicated suspicion of cancer. Providing the FDR/FOR reduced litigation; similar effects were observed for follow-up behaviors.
Introduction
Artificial intelligence (AI) tools in breast imaging have rapidly increased, offering significant potential to improve breast cancer detection and management. The Food and Drug Administration (FDA) has approved several AI tools for screening mammography with applications in triaging exams, detecting and classifying lesions, and assessing breast density1. With increasing imaging volume and continued demand for breast cancer screening, radiomics and AI studies have demonstrated that AI-enhanced workflows hold promise for improving diagnostic accuracy and efficiency2.
Nonetheless, successfully implementing AI tools in clinical practice requires participation from all stakeholders, including radiologists, developers, policymakers, and patients3. Given that screening examinations can cause anxiety, understanding patients’ reactions to the incorporation of AI in this space is crucial4. Patient surveys reveal various levels of trust and acceptance toward AI tools in breast imaging, which are impacted by factors such as education level and prior knowledge about AI5,6. While many express positive attitudes toward radiologist and AI collaboration, concerns surrounding trust, transparency, and accountability persist5,6,7,8. Critically, little research has examined patients’ perceptions toward AI in breast imaging—or imaging, more broadly—with respect to how it is communicated in a patient portal setting.
With the movement for transparency and patient-centered care, patient portals have played an increasingly important role in healthcare delivery, enabling patients to send messages, schedule appointments, and access lab and imaging results9. As more AI tools are implemented, it is important to establish best practices for communicating these results to patients. For a traditional radiologist report, writing with patients as the audience—such as by reducing medical jargon—can facilitate communication and comprehension10. For an AI-interpreted imaging report, however, there are additional considerations about effectively presenting information and ensuring patient autonomy11. Put differently, while AI disclosure is ethically essential, inadequate guidance on AI interpretation may inadvertently lead to further patient anxiety and confusion.
The purpose of this study was to illustrate how contextual information can modify patient attitudes; our hope is that these results help inform what information is provided to patients in their portal regarding their screening mammogram. In the US, screening mammograms for average-risk women are conducted annually starting at age 40, with some variations among societal guidelines. We compare participants’ hypothetical decision-making and attitudes when presented with a Breast Imaging Reporting and Data System (BI-RADS) 1 radiologist report (a report concluding no breast cancer was found) and variations of an AI-interpreted report. Our specific hypotheses are listed in Supplemental Table 1.
Results
Primary outcome
Litigation
As seen in Fig. 1 and Supplementary Tables 3 and 4, consideration of a lawsuit after a BI-RADS 1 determination, upon discovering breast cancer the following year, increased when AI results were included relative to No AI, p = 0.04977. Consideration of a lawsuit increased from 39.7% with No AI (control) to 46.9% when the AI score of 0 was presented alone, but returned to the control level (40.0%) when both the cutoff threshold of 30 and the FOR of 0.2% were provided; all other conditions in which AI agreed with the radiologist had higher percentages pursuing litigation than control, p = 0.3734. Once AI flagged the case as abnormal, consideration of a lawsuit increased to 60.7% when an AI score of 31 was provided alone, decreasing to 56.1% when paired with the threshold and further to 31.6% when the FDR of 95% was provided, p = 0.0005. Moreover, when the AI score of 50 was presented alone, consideration of a lawsuit was 48.3%; this increased to 51.9% when paired with the cutoff threshold but decreased to 36.8% when the FDR was provided, p = 0.014.
Fig. 1: X-axis is experimental condition, broken into three column panels (No AI, Not Flagged for Abnormality, Flagged for Abnormality). The 12 AI conditions comprise four abnormality scores (0, 29, 31, 50) crossed with: 1) the abnormality cutoff threshold of “30” (Cut), and 2) the error rates (FOR, false omission rate, 0.2%; FDR, false discovery rate, 95%). Y-axis is the percentage of participants who indicated they would consult an attorney to consider suing the radiologist over the BI-RADS 1 interpretation. The red dashed line is the control reference. Black dots are means with 95% confidence intervals.
Secondary outcomes
Second opinion
As seen in Fig. 2A and Supplementary Tables 3 and 5, desire to follow up with a different radiologist (second opinion) after a BI-RADS 1 determination increased when AI results were included relative to No AI, p = 0.0003. Desire for a second opinion rose from 8.3% with No AI (control) to 27.2% when the AI score of 29 was presented alone, climbed to 68.0% when paired with the cutoff threshold of 30, and decreased slightly to 62.5% when the FOR of 0.2% was included; notably, the latter two were relatively high even though AI agreed with the radiologist. Once AI flagged the case as abnormal, desire for a second opinion increased to 70.2% when an AI score of 31 was provided alone and to 81.3% when paired with the threshold, p = 0.0003; it decreased to 61% when the FDR of 95% was provided, p = 0.0008. Likewise, when the AI score of 50 was presented alone, desire for a second opinion was 68.9%, increasing to 74.0% when paired with the threshold and decreasing to 60.9% when the FDR was provided, p = 0.0215.
Fig. 2: A, B Follow-up-seeking behaviors. X-axis is experimental condition, broken into three column panels (No AI, Not Flagged for Abnormality, Flagged for Abnormality). The 12 AI conditions comprise four abnormality scores (0, 29, 31, 50) crossed with: 1) the abnormality cutoff threshold of “30” (Cut), and 2) the error rates (FOR, false omission rate, 0.2%; FDR, false discovery rate, 95%). Y-axis is the percentage of participants who indicated they would pursue the follow-up behavior (0–100%). Red dashed line is the control reference. Black dots are means with 95% confidence intervals.
Second look
As seen in Fig. 2B and Supplementary Tables 3 and 5, desire to follow up with the same radiologist (second look) after a BI-RADS 1 determination increased when AI results were included relative to No AI, p = 0.0003. Desire for a second look increased from 14.3% with No AI (control) to 30.7% when the AI score of 29 was presented alone, rose to 62.1% when paired with the cutoff threshold of 30, and decreased slightly to 57.5% when the FOR of 0.2% was included; notably, the latter two were relatively high even though AI agreed with the radiologist. Once AI flagged the case as abnormal, desire for a second look increased to 73.2% when an AI score of 31 was provided alone and to 81.3% when paired with the threshold, p = 0.0003; it decreased to 54.1% when the FDR of 95% was provided, p = 0.0002. Likewise, when the AI score of 50 was presented alone, desire for a second look was 65.8%, increasing to 71.8% when paired with the threshold and decreasing to 63.6% when the FDR was provided, though this failed to reach statistical significance, p = 0.09889.
Primary care physician (PCP) follow-up
As seen in Fig. 3A and Supplementary Tables 3 and 5, desire to follow up with the PCP after a BI-RADS 1 determination increased when AI results were included relative to No AI, p = 0.0002. Desire for PCP follow-up rose from 24.5% with No AI (control) to 42.9% when the AI score of 29 was presented alone, climbed to 80.4% when paired with the cutoff threshold of 30, and decreased slightly to 71.1% when the FOR of 0.2% was included; notably, all three were relatively high even though AI agreed with the radiologist. Once AI flagged the case as abnormal, desire for PCP follow-up increased to 82.8% when an AI score of 31 was provided alone and to 84.9% when paired with the cutoff threshold of 30, p = 0.0002; it decreased to 68.4% when the FDR of 95% was provided, p = 0.0027. Likewise, when the AI score of 50 was presented alone, desire for PCP follow-up was 79.2%, increasing to 80.8% when paired with the cutoff threshold and decreasing to 73.7% when the FDR was provided, though this failed to reach statistical significance, p = 0.104.
Fig. 3: A, B Follow-up-seeking behaviors. X-axis is experimental condition, broken into three column panels (No AI, Not Flagged for Abnormality, Flagged for Abnormality). The 12 AI conditions comprise four abnormality scores (0, 29, 31, 50) crossed with: 1) the abnormality cutoff threshold of “30” (Cut), and 2) the error rates (FOR, false omission rate, 0.2%; FDR, false discovery rate, 95%). Y-axis is the percentage of participants who indicated they would pursue the follow-up behavior (0–100%). Red dashed line is the control reference. Black dots are means with 95% confidence intervals.
Request for more accurate follow-up imaging (MAFI)
As seen in Fig. 3B and Supplementary Tables 3 and 5, desire for MAFI after a BI-RADS 1 determination increased when AI results were included relative to No AI, p = 0.0002. Desire for MAFI was 10.8% with No AI (control). This increased to 21.6% when the AI score of 29 was presented alone, 45.6% when paired with the cutoff threshold of 30, and 52.3% when the FOR of 0.2% was included, despite AI agreeing with the radiologist. Once AI flagged the case as abnormal, desire for MAFI increased to 58.9% when an AI score of 31 was provided alone; this rose to 67.3% when paired with the threshold of 30, p = 0.0002, but decreased to 44.7% when the FDR of 95% was provided, p = 0.0005. Likewise, when the AI score of 50 was presented alone, desire for MAFI was 46.3%, which increased to 64.4% when the threshold was provided but decreased to 53.4% when the FDR was provided, p = 0.0497.
Trust
As seen in Fig. 4A and Supplementary Tables 3 and 6, trust in the radiologist and AI after a BI-RADS 1 determination varied depending on whether AI flagged the case and what corresponding information was provided, p = 0.0002. Trust in both the radiologist and AI generally decreased as AI scores increased, p = 0.0002. When the FDR of 95% was provided for a score of 31 or 50, trust in the radiologist was elevated relative to providing only a score and threshold (by 12.6 points, p = 0.000167, and 5.35 points, p = 0.0498, respectively). Conversely, trust in the AI decreased relative to providing only a score and threshold (−24.8 points, p = 0.000167, and −23.7 points, p = 0.000176, respectively). Trust in the radiologist was higher than trust in the AI in every experimental condition, p = 0.0002.
Fig. 4: A Trust in Radiologist and AI, B Concern about Breast Cancer. X-axis is experimental condition, broken into column panels (No AI (4B only), Not Flagged for Abnormality, Flagged for Abnormality). The 12 AI conditions comprise four abnormality scores (0, 29, 31, 50) crossed with: 1) the abnormality cutoff threshold of “30” (Cut), and 2) the error rates (FOR, false omission rate, 0.2%; FDR, false discovery rate, 95%). Y-axis is A a magnitude scale from 0 (Not at all) to 100 (Very Much) and B a Likert scale (−50 greatly decrease, 0 no change, 50 greatly increase). A Blue dots are means with 95% confidence intervals denoting trust in the radiologist; red dots are means with 95% confidence intervals denoting trust in AI. B Black dots are means with 95% confidence intervals. Red dashed line is the control reference.
Concern about breast cancer
As seen in Fig. 4B and Supplementary Tables 3 and 6, concern about breast cancer after a BI-RADS 1 determination increased when AI results were included relative to No AI, p = 0.0002. Concern increased from −20.9 with No AI (control) to −9.6 when the AI score of 29 was presented alone, rose to 5.1 when paired with the cutoff threshold of 30, and was similar (3.5) when the FOR of 0.2% was included; notably, all three were elevated relative to control despite AI agreeing with the radiologist. Once AI flagged the case as abnormal, concern increased to 13.8 when an AI score of 31 was provided alone and to 17.4 when paired with the threshold, p = 0.0002; it decreased to 1.8 when the FDR of 95% was provided, p = 0.0002. A similar pattern was observed for an AI score of 50: concern was 12.1 with the score alone, increased to 14.0 when paired with the threshold, and decreased to 7.3 when the FDR was provided, p = 0.0146.
Financial Reasoning
As seen in Supplemental Table 7, willingness to get a second opinion, a follow-up with the same radiologist, a follow-up with the ordering physician, and additional imaging varied by AI information (as noted in the sections above), as did willingness to pay out of pocket for these services. Interestingly, approximately two-thirds of respondents who wanted these follow-up services, but only if they did not have to pay for them, indicated unaffordability as the reason.
Discussion
In this experiment, we demonstrate that participants’ hypothetical decision-making behaviors when presented with a BI-RADS 1 radiologist report alongside an AI report vary based on the AI score, flagging status, and contextual information for interpreting AI triage. Specifically, participants’ consideration of a lawsuit increased with higher AI scores, and this was mitigated when the FDR was provided. Furthermore, participants’ requests for additional imaging and follow-up (from the same radiologist, a different radiologist, and the ordering physician) increased when AI was included relative to no AI. Desire for follow-up also increased as the AI score increased, though this was often mitigated by providing the FDR (and sometimes the FOR). Of note, the FDR used in this study was very high, likely exceeding what participants would expect, whereas the FOR was low; this difference may explain why the FDR had a greater effect on participants’ responses, as its magnitude was more salient. Trust in the radiologist and the AI tool also varied depending on the contextual information provided, though it was always higher for the radiologist across all conditions, likely reflecting algorithm aversion as described by Dietvorst et al.12. The majority of participants who indicated they would request follow-up only if they were not responsible for the costs cited inability, rather than unwillingness, to pay out of pocket.
These findings are important to consider in the context of prior studies probing patient attitudes toward AI in breast imaging. For example, semi-structured interviews showed that trust in AI relied on clear communication about its application and assurance that a radiologist was involved in the assessment process; participants further expressed they would rather tolerate anxiety from callbacks than risk a missed cancer diagnosis7. Abnormal mammography findings can already be a source of anxiety13 and the integration of AI findings into the patient portal—which 95% of the public reportedly desires14—may heighten concern if the radiologist and AI reports conflict. Indeed, a recent patient survey found that 91.5% were concerned about AI misdiagnosis15, while another study observed that health communication is seen as more reliable when told that the information comes from a physician versus AI16. Another survey found that patients preferred AI to be used as a second reader rather than as a standalone6. Our study demonstrates that contextualizing findings by providing the FDR (and sometimes FOR) can reduce the inclination to pursue follow-up care or a lawsuit. Providing contextual information to interpret AI findings can be one way to reduce uncertainty and concern.
Additionally, Pesapane et al. found that most respondents approved of AI as a supplementary tool but held both software developers and radiologists accountable for AI errors17. Emerging guidelines emphasize the importance of appropriate disclosure: Good Machine Learning Practice (GMLP) principles by the FDA and collaborating agencies encourage that “users are provided ready access to clear, contextually relevant information that is appropriate for the intended audience.”18
Human factors of AI implementation can impact patient outcomes and legal liability. One study demonstrated that incorrect AI results could worsen radiologist performance; critically, however, implementation strategies mitigated those effects19. A similar consideration translates to the interaction between patient and AI, recognizing that how AI information is presented in patient portals holds downstream implications for imaging volume, healthcare expenditure, and—as shown in the current study—litigation. Indeed, another vignette experiment demonstrated that when a radiologist provided a false-negative report despite AI correctly observing a pathology, mock jurors were more likely to side with the radiologist-defendant if AI accuracy data was provided20. Implementation strategies addressing human factors could reduce existing concerns surrounding trust, transparency, accountability, and legal liability5,6,7,8.
Lastly, our study highlights that financial considerations shape participants’ decision-making. This finding is important to contextualize against disparities in imaging acquisition and interpretation21. In breast imaging, marginalized populations—including racial and ethnic minorities and those uninsured or of lower socioeconomic status—bear a disproportionate burden of cancer morbidity and mortality due to social determinants22. For instance, newer advances in breast imaging, such as digital breast tomosynthesis, are not universally available and often have higher costs23. Similarly, incorporating AI tools may introduce new financial barriers and exacerbate existing disparities, as our study demonstrates the mismatch between the desire for follow-up and the inability to afford it.
This study assessed participants’ decision-making behaviors based on hypothetical imaging results bearing no actual consequence for or relationship to the participant, which limits ecological validity. Real-world constraints such as time, financial barriers, and other logistical considerations affect patients’ ability to seek second opinions and further imaging, and participants may not accurately predict their own behavior. By collapsing the inherently complex process of litigation into a single yes/no item about attorney consultation, the overall rates may not be generalizable, although the comparisons between conditions likely still apply to real-world medical malpractice litigation; the purpose of this study was to examine how attitudes toward litigation changed across conditions, not to estimate how many people would pursue legal action. The demographic composition of the study population, as well as the inclusion criteria focused on English-speaking women living in the US, limits the generalizability of the study’s findings; it is also unclear to what extent they would generalize to legal systems outside of the US. In addition, financial reasoning could only be described here without inference, as these results were conditioned on responses to a prior item.
Additionally, the experimental conditions only included a BI-RADS 1 radiologist report with two subthreshold and two suprathreshold AI scores. Other combinations of concordance and discordance between radiologist and AI were not assessed. There was also no condition in which the radiologist was given additional information from a source other than AI. Future studies may benefit from including a condition in which a second radiologist provides feedback to the interpreting radiologist, and power analyses for determining the sample size of such research could be based on the effect size(s) observed here. Also, additional comprehension check items to ensure that participants understand the interpretation of FDR and FOR would be helpful, as would examining participants’ attitudes toward and experience with AI as potential moderators. Finally, this study was designed only to examine overall effects, rather than to explore potential mechanisms, which would be an avenue for future research.
Our study demonstrates that disclosing AI mammography results will impact patients’ interest in pursuing a medical malpractice claim against a radiologist. However, this can be mitigated by providing greater contextual information about an AI system. Moreover, access to AI results impacts patients’ behaviors pertaining to follow-up and checking. Best practices are needed to engage and inform patients about the application of AI tools in their care24. AI results that include “flags” and specific AI scores in their output should be accompanied by their corresponding error rates so patients can better understand that these scores come with inherent uncertainty.
Materials and methods
Participants and setting
Participants were recruited online through Prolific and were eligible if they identified as a female, English-speaking adult living in the United States (US) with at least one prior mammogram. In total, n = 1623 participants were included (n = 361 and n = 802 failed a comprehension and manipulation check, respectively). Demographic information is shown in Table 1. A target of approximately 80 or more completed observations per condition was based on practical (i.e., budget) constraints. This would allow an effect size of approximately d = 0.65 to be detected, assuming 0.80 power and an alpha of 0.05 with 28 comparisons using the Šidák correction.
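As a rough check on this calculation (our sketch, not the authors’ code; it assumes a two-sided, two-sample t-test approximation with n = 80 per condition), the detectable effect size under a Šidák-adjusted alpha can be reproduced in Python with statsmodels:

```python
# Sketch of the power calculation described above (our assumptions: a
# two-sided, two-sample t-test approximation, 80 participants per condition).
from statsmodels.stats.power import TTestIndPower

m = 28                                   # planned comparisons
alpha_sidak = 1 - (1 - 0.05) ** (1 / m)  # per-comparison alpha under Sidak, ~0.0018
d = TTestIndPower().solve_power(effect_size=None, nobs1=80,
                                alpha=alpha_sidak, power=0.80,
                                ratio=1.0, alternative='two-sided')
print(f"alpha per comparison: {alpha_sidak:.4f}; detectable d: {d:.2f}")
# Yields a detectable d of roughly 0.63, consistent with the approximate
# d = 0.65 reported above.
```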
Procedure
Online participants who passed screening and provided consent were directed to a Qualtrics survey, where they were randomized to one of 13 conditions. Participants were compensated $1.25 for completing the survey and $0.14 if they screened out for never having had a mammogram. Participants were prevented from participating more than once. The protocol was approved by the Brown University Health Institutional Review Board (IRB009824).
Vignette design and experimental conditions
All participants were shown a vignette asking them to imagine they had recently received their annual screening mammogram and logged into their patient portal to access the results. In all conditions, the hypothetical results consisted of a standard radiologist’s report with a BI-RADS 1 (Negative) determination and a 2D mammogram. No additional information was provided in the control condition.
In the remaining twelve conditions, participants were provided an additional AI-interpreted report with one of four AI scores: 0 (Not flagged for abnormality), 29 (Not flagged for abnormality), 31 (Flagged for abnormality), or 50 (Flagged for abnormality). Moreover, there were three conditions for each AI score: (1) only the AI score was presented; (2) the AI score was presented along with the abnormality cutoff threshold of 30 (on a 0–100 scale); or (3) the score and threshold were presented along with either a) the False Discovery Rate (FDR, the complement of the positive predictive value, 1 − PPV) for the AI score = 31 or 50 conditions or b) the False Omission Rate (FOR, the complement of the negative predictive value, 1 − NPV) for the AI score = 0 or 29 conditions (Supplemental Table 2). For all 12 AI conditions, the 2D mammogram included the corresponding AI information (see Appendix II for full vignettes).
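As a compact illustration of this 4 × 3 + 1 design (our sketch; the labels are ours, not from the study materials), the thirteen cells can be enumerated as follows:

```python
# Illustrative enumeration of the 13 experimental cells (labels are ours).
CUTOFF = 30  # abnormality cutoff threshold

conditions = [{"condition": "No AI (control)"}]
for score in (0, 29, 31, 50):
    flagged = score > CUTOFF
    # The error rate shown depends on flagging status: FDR if flagged, FOR if not.
    error_info = "FDR 95%" if flagged else "FOR 0.2%"
    for info in ("score only", "score + cutoff", f"score + cutoff + {error_info}"):
        conditions.append({"condition": info, "score": score, "flagged": flagged})

assert len(conditions) == 13
```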
To maximize ecological validity, the hypothetical AI’s performance was based on a commercially available AI mammography interpretation software, Lunit INSIGHT MMG version 1.1.7.1 (Lunit, Seoul, South Korea)25. Using the optimal threshold of 30 with 72.38% sensitivity and 92.86% specificity described by Seker et al.—and assuming a breast cancer prevalence of 0.57%—the corresponding FDR and FOR were 95% and 0.2%, respectively. These were the FDR and FOR values given and explained to participants25.
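These error rates follow from Bayes’ rule applied to the reported sensitivity (Se), specificity (Sp), and prevalence (π); as an arithmetic check (our calculation, not reproduced from the source):

```latex
\mathrm{PPV} = \frac{\pi\,\mathrm{Se}}{\pi\,\mathrm{Se} + (1-\pi)(1-\mathrm{Sp})}
= \frac{0.0057 \times 0.7238}{0.0057 \times 0.7238 + 0.9943 \times 0.0714} \approx 0.055,
\qquad \mathrm{FDR} = 1 - \mathrm{PPV} \approx 95\%
```

```latex
\mathrm{NPV} = \frac{(1-\pi)\,\mathrm{Sp}}{(1-\pi)\,\mathrm{Sp} + \pi\,(1-\mathrm{Se})}
= \frac{0.9943 \times 0.9286}{0.9943 \times 0.9286 + 0.0057 \times 0.2762} \approx 0.998,
\qquad \mathrm{FOR} = 1 - \mathrm{NPV} \approx 0.2\%
```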
Measures
Checking behavior (i.e., behavior to reduce anxiety)
Four questions probed hypothetical decisions participants would make after receiving the mammogram report: Would you request 1) “a second opinion from another radiologist?” 2) “a follow-up meeting with the radiologist who wrote your original report?” 3) “a follow-up consult with your primary care physician (including OB/GYN)?” and 4) “follow-up imaging that is more accurate even if a physician says it is not necessary for you?” For each question, participants could choose between: 1) “No”, 2) “Yes, even if I have to pay for it out of pocket”, or 3) “Yes, but only if I don’t pay for it (if, for example, insurance pays)”. If they chose option 3, they were asked why: 1) “I cannot afford to pay” or 2) “I do not want to pay.”
Litigation
As the primary outcome, to measure participants’ inclination to sue the radiologist, participants were told, “Imagine one year later you receive a new mammogram with a new report from a different radiologist that indicates evidence of potential breast cancer. A biopsy confirms that breast cancer is present and is stage 3 (which indicates advanced cancer). Would you consult an attorney to explore suing the original radiologist for not having found the cancer in the previous year?” (yes/no).
Trust
Two questions measured participants’ trust in the decisions of the radiologist and AI system depicted in the vignette, from 0 (not at all) to 100 (very much): “Do you ultimately trust the…” 1) “radiologist’s decision?” 2) “AI’s decision?”
Concern
One question measured concern for breast cancer: “How, if at all, would these results change your concern about having breast cancer?” (−50 = Greatly Decrease; 0 = No Change; 50 = Greatly Increase).
Comprehension and manipulation check
All participants had to correctly identify “No mammographic evidence of malignancy” as a comprehension check. All participants in AI conditions had to correctly recall the AI score (0, 29, 31, 50) and the “(Not) flagged for abnormality” determination. Those in the cutoff conditions had to recall the correct cutoff of “30.” Those provided an FOR or FDR had to correctly recall “The AI had a False Omission Rate of 0.2%” or “The AI had a False Discovery Rate of 95%”, respectively.
Statistical analyses
All analyses were conducted using SAS Software 9.4 (SAS Institute Inc., Cary, NC). Attitudes (the dependent variables) were modeled by experimental condition (the independent variable) using generalized linear models, assuming a normal or binary distribution as appropriate, with the GLIMMIX procedure and contrast comparisons reflecting the hypotheses. Trust in the radiologist and AI, an exploratory aim, was modeled as an interaction with condition (i.e., condition × AI vs. radiologist) using generalized estimating equations with sandwich estimation. All interval estimates were calculated for 95% confidence, and alpha was established at the 0.05 level. P-values were adjusted for multiple comparisons using the Benjamini-Hochberg false discovery rate method. Tests of superiority corresponding with hypotheses were one-tailed; otherwise, two-tailed tests were used.
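For readers outside SAS, the modeling approach can be approximated in Python with statsmodels; the sketch below is ours (the file and all column names are hypothetical), not the authors’ code:

```python
# Rough Python analogue of the SAS analyses described above (our sketch;
# "responses.csv" and all column names are hypothetical).
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

df = pd.read_csv("responses.csv")  # one row per participant

# Binary outcome (e.g., would consult an attorney) by condition: a
# generalized linear model with a binomial distribution.
litigation = smf.glm("litigate ~ C(condition)", data=df,
                     family=sm.families.Binomial()).fit()

# Trust as a condition x rater (radiologist vs. AI) interaction: reshape to
# two rows per participant and fit GEE; its default "robust" covariance is
# the sandwich estimator.
long = df.melt(id_vars=["pid", "condition"],
               value_vars=["trust_radiologist", "trust_ai"],
               var_name="rater", value_name="trust")
trust = smf.gee("trust ~ C(condition) * C(rater)", groups="pid",
                data=long, cov_struct=sm.cov_struct.Exchangeable()).fit()

# Benjamini-Hochberg false discovery rate adjustment across raw p-values
# from the planned contrasts.
raw_p = list(litigation.pvalues[1:])  # stand-in for the planned contrasts
reject, p_adjusted, _, _ = multipletests(raw_p, alpha=0.05, method="fdr_bh")
```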
Data availability
The datasets generated and/or analyzed during the current study are not publicly available due to the IRB application not requesting data be made public, but the data are available from the corresponding author on reasonable request.
Code availability
The underlying code for this study is not publicly available but may be made available to qualified researchers on reasonable request from the corresponding author.
References
Bahl, M. Updates in artificial intelligence for breast imaging. Semin Roentgenol. 57, 160–167 (2022).
Bitencourt, A., Daimiel Naranjo, I., Lo Gullo, R., Rossi Saccarelli, C. & Pinker, K. AI-enhanced breast imaging: Where are we and where are we heading? Eur. J. Radiol. 142, 109882 (2021).
Lokaj, B., Pugliese, M. T., Kinkel, K., Lovis, C. & Schmid, J. Barriers and facilitators of artificial intelligence conception and implementation for breast imaging diagnosis in clinical practice: a scoping review. Eur. Radiol. 34, 2096–2109 (2023).
Pesapane, F. et al. Evolving paradigms in breast cancer screening: balancing efficacy, personalization, and equity. Eur. J. Radiol. 172, 111321 (2024).
Pesapane, F. et al. Patients’ perceptions and attitudes to the use of artificial intelligence in breast cancer diagnosis: a narrative review. Life 14, 454 (2024).
Ozcan, B. B., Dogan, B. E., Xi, Y. & Knippa, E. E. Patient perception of artificial intelligence use in interpretation of screening mammograms: a survey study. Radiol. Imaging Cancer 7, e240290 (2025).
Johansson, J., Dembrower, K., Strand, F. & Grauman, Å. Women’s perceptions and attitudes towards the use of AI in mammography in Sweden: a qualitative interview study. BMJ Open 14, e084014 (2024).
Holen, Å. S. et al. Women’s attitudes and perspectives on the use of artificial intelligence in the assessment of screening mammograms. Eur. J. Radiol. 175, 111431 (2024).
Garry, K. et al. Patient experience with notification of radiology results: a comparison of direct communication and patient portal use. J. Am. Coll. Radiol. 17, 1130–1138 (2020).
Lourenco, A. P. & Baird, G. L. Optimizing radiology reports for patients and referring physicians: Mitigating the curse of knowledge. Acad. Radiol. 27, 436–439 (2020).
Beauchamp, T. L. & Childress, J. F. Principles of Biomedical Ethics (Oxford University Press, 1992).
Dietvorst, B. J., Simmons, J. P. & Massey, C. Algorithm aversion: people erroneously avoid algorithms after seeing them err. J. Exp. Psychol. Gen. 144, 114 (2015).
Loving, V. A., Aminololama-Shakeri, S. & Leung, J. W. T. Anxiety and its association with screening mammography. J. Breast Imaging 3, 266–272 (2021).
Platt, J., Nong, P., Carmona, G. & Kardia, S. Public attitudes toward notification of use of artificial intelligence in health care. JAMA Netw. Open 7, e2450102 (2024).
Khullar, D. et al. Perspectives of patients about artificial intelligence in health care. JAMA Netw. Open 5, e2210309 (2022).
Reis, M., Reis, F. & Kunde, W. Influence of believed AI involvement on the perception of digital medical advice. Nat. Med. 1–3 (2024).
Pesapane, F. et al. Women’s perceptions and attitudes to the use of AI in breast cancer screening: a survey in a cancer referral centre. Br. J. Radiol. 96, 20220569 (2023).
Good machine learning practice for medical device development: Guiding principles. U.S. Food and Drug Administration. https://www.fda.gov/medical-devices/software-medical-device-samd/good-machine-learning-practice-medical-device-development-guiding-principles (2023).
Bernstein, M. H. et al. Can incorrect artificial intelligence (AI) results impact radiologists, and if so, what can we do about it? A multi-reader pilot study of lung cancer detection with chest radiography. Eur. Radiol. 33, 8263–8269 (2023).
Bernstein, M. H., Sheppard, B., Bruno, M. A., Lay, P. S. & Baird, G. L. Randomized study of the impact of AI on perceived legal liability for radiologists. NEJM AI 2, AIoa2400785 (2025).
Waite, S., Scott, J. & Colombo, D. Narrowing the gap: Imaging disparities in radiology. Radiology 299, 27–35 (2021).
DeBenedectis, C. M. et al. Health care disparities in radiology—a review of the current literature. J. Am. Coll. Radiol. 19, 101–111 (2022).
Miles, R. C., Onega, T. & Lee, C. I. Addressing potential health disparities in the adoption of advanced breast imaging technologies. Acad. Radiol. 25, 547–551 (2018).
Millen, M., Tai-Seale, M. & Longhurst, C. A. A call for disclosure when using AI for patient communications. NEJM AI 2, AIp2401167 (2025).
Seker, M. E. et al. Diagnostic capabilities of artificial intelligence as an additional reader in a breast cancer screening program. Eur. Radiol. 34, 6145–6157 (2024).
Author information
Contributions
The study was conceived of by E.C.S. and G.L.B. The experimental design was created primarily by E.C.S., M.H.B., and G.L.B. with assistance from P.S.L. and L.D. E.H.D. and A.P.L. refined the wording of the vignette to reflect real-world clinical practice. E.C.S. drafted the introduction, methods, and discussion with assistance from M.H.B. and G.L.B. G.L.B. drafted the results section and conducted all analyses. P.S.L., L.D., E.H.D., and A.P.L. reviewed the manuscript and provided feedback. All authors approve of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Song, E.C., Bernstein, M.H., Lay, P.S. et al. Accessing AI mammography reports impacts patient follow-up behaviors: the unintended consequences of including AI in patient portals. npj Digit. Med. 8, 761 (2025). https://doi.org/10.1038/s41746-025-02201-0