Introduction

The use of artificial intelligence (AI) tools in breast imaging has increased rapidly, offering significant potential to improve breast cancer detection and management. The Food and Drug Administration (FDA) has approved several AI tools for screening mammography, with applications in triaging exams, detecting and classifying lesions, and assessing breast density1. With increasing imaging volume and continued demand for breast cancer screening, radiomics and AI studies have demonstrated that AI-enhanced workflows hold promise for improving diagnostic accuracy and efficiency2.

Nonetheless, successfully implementing AI tools in clinical practice requires participation from all stakeholders, including radiologists, developers, policymakers, and patients3. Given that screening examinations can cause anxiety, understanding patients’ reactions to the incorporation of AI in this space is crucial4. Patient surveys reveal varying levels of trust in and acceptance of AI tools in breast imaging, shaped by factors such as education level and prior knowledge about AI5,6. While many patients express positive attitudes toward radiologist-AI collaboration, concerns surrounding trust, transparency, and accountability persist5,6,7,8. Critically, little research has examined patients’ perceptions of AI in breast imaging, or imaging more broadly, with respect to how results are communicated in a patient portal setting.

With the movement toward transparency and patient-centered care, patient portals have played an increasingly important role in healthcare delivery, enabling patients to send messages, schedule appointments, and access lab and imaging results9. As more AI tools are implemented, it is important to establish best practices for communicating these results to patients. For a traditional radiologist report, writing with patients as the audience, such as by reducing medical jargon, can facilitate communication and comprehension10. For an AI-interpreted imaging report, however, there are additional considerations about how to present information effectively and ensure patient autonomy11. In short, while AI disclosure is ethically essential, inadequate guidance on interpreting AI results may inadvertently lead to further patient anxiety and confusion.

The purpose of this study was to illustrate how contextual information can modify patient attitudes; our hope is that these results help inform what information is provided to patients in their portals regarding their screening mammograms. In the US, screening mammograms for average-risk women are conducted annually starting at age 40, with some variation among societal guidelines. We compare participants’ hypothetical decision-making and attitudes when presented with a Breast Imaging Reporting and Data System (BI-RADS) 1 radiologist report (a report concluding no breast cancer was found) and variations of an AI-interpreted report. Our specific hypotheses are listed in Supplemental Table 1.

Table 1: Demographic characteristics

Results

Primary outcome

Litigation

As seen in Fig. 1 and Supplementary Tables 3 and 4, consideration of a lawsuit after a BI-RADS 1 determination, upon discovering breast cancer the following year, increased when AI results were included relative to No AI, p = 0.04977. Consideration of a lawsuit increased from 39.7% with No AI (control) to 46.9% when the AI score of 0 was presented alone, but returned to the control level (40.0%) when both the cutoff threshold of 30 and the FOR of 0.2% were provided; all other conditions in which AI agreed with the radiologist showed higher percentages pursuing litigation compared to control, p = 0.3734. Once AI flagged the case as abnormal, consideration of a lawsuit increased to 60.7% when an AI score of 31 was provided alone, decreasing to 56.1% when paired with the threshold and further to 31.6% when the FDR of 95% was provided, p = 0.0005. Similarly, when the AI score of 50 was presented alone, consideration of a lawsuit was 48.3%; this increased to 51.9% when paired with the cutoff threshold but decreased to 36.8% when the FDR was provided, p = 0.014.

Fig. 1: Considering a Lawsuit after a BI-RADS 1 determination upon discovering breast cancer the following year.

X-axis is experimental condition, broken down by three column panels (No AI, Not Flagged for Abnormality, Flagged for Abnormality). The 12 AI conditions comprise four abnormality scores (0, 29, 31, 50) crossed with: 1) the abnormality cutoff threshold of “30” (Cut), and 2) the error rates (FOR, false omission rate, 0.2%; FDR, false discovery rate, 95%). Y-axis is the percentage of participants who indicated they would consult an attorney to consider suing the radiologist for the BI-RADS 1 interpretation. The red dashed line is the control reference. Black dots are means with 95% confidence intervals.

Secondary outcomes

Second opinion

As seen in Fig. 2A and Supplementary Tables 3 and 5, desire to follow up with a different radiologist (second opinion) after a BI-RADS 1 determination increased when AI results were included relative to No AI, p = 0.0003. Desire for a second opinion increased from 8.3% with No AI (control) to 27.2% when the AI score of 29 was presented alone, rose to 68.0% when paired with the cutoff threshold of 30, and decreased slightly to 62.5% when the FOR of 0.2% was included; notably, the latter two were relatively high even though AI agreed with the radiologist. Once AI flagged the case as abnormal, desire for a second opinion increased to 70.2% when an AI score of 31 was provided alone and to 81.3% when paired with the threshold, p = 0.0003; it decreased to 61% when the FDR of 95% was provided, p = 0.0008. Likewise, when the AI score of 50 was presented alone, desire for a second opinion was 68.9%, which increased to 74.0% when paired with the threshold and decreased to 60.9% when the FDR was provided, p = 0.0215.

Fig. 2: Participant Follow-up Behaviors: Second Opinion and Second Look.

A, B Follow-up seeking behaviors. X-axis is experimental condition, broken down by three column panels (No AI, Not Flagged for Abnormality, Flagged for Abnormality). The 12 AI conditions comprise four abnormality scores (0, 29, 31, 50) crossed with: 1) the abnormality cutoff threshold of “30” (Cut), and 2) the error rates (FOR, false omission rate, 0.2%; FDR, false discovery rate, 95%). Y-axis is the percentage of participants who indicated they would pursue the follow-up behavior (0–100%). Red dashed line is the control reference. Black dots are means with 95% confidence intervals.

Second look

As seen in Fig. 2B and Supplementary Tables 3 and 5, desire to follow up with the same radiologist (second look) after a BI-RADS 1 determination increased when AI results were included relative to No AI, p = 0.0003. Desire for a second look increased from 14.3% with No AI (control) to 30.7% when the AI score of 29 was presented alone, rose to 62.1% when paired with the cutoff threshold of 30, and decreased slightly to 57.5% when the FOR of 0.2% was included; notably, the latter two were relatively high even though AI agreed with the radiologist. Once AI flagged the case as abnormal, desire for a second look increased to 73.2% when an AI score of 31 was provided alone and to 81.3% when paired with the threshold, p = 0.0003; it decreased to 54.1% when the FDR of 95% was provided, p = 0.0002. Likewise, when the AI score of 50 was presented alone, desire for a second look was 65.8%, which increased to 71.8% when paired with the threshold and decreased to 63.6% when the FDR was provided, but this failed to reach statistical significance, p = 0.09889.

Primary care physician (PCP) follow-up

As seen in Fig. 3A and Supplementary Tables 3 and 5, desire to follow up with the PCP after a BI-RADS 1 determination increased when AI results were included relative to No AI, p = 0.0002. Desire for PCP follow-up increased from 24.5% with No AI (control) to 42.9% when the AI score of 29 was presented alone, rose to 80.4% when paired with the cutoff threshold of 30, and decreased slightly to 71.1% when the FOR of 0.2% was included; notably, all three were relatively high even though AI agreed with the radiologist. Once AI flagged the case as abnormal, desire for PCP follow-up increased to 82.8% when an AI score of 31 was provided alone and to 84.9% when paired with the cutoff threshold of 30, p = 0.0002; it decreased to 68.4% when the FDR of 95% was provided, p = 0.0027. Likewise, when the AI score of 50 was presented alone, desire for PCP follow-up was 79.2%, which increased to 80.8% when paired with the cutoff threshold and decreased to 73.7% when the FDR was provided, but this failed to reach statistical significance, p = 0.104.

Fig. 3: Participant Follow-up Behaviors: Primary Care Physician and More Accurate Imaging.

A, B Follow-up seeking behaviors. X-axis is experimental condition, broken down by three column panels (No AI, Not Flagged for Abnormality, Flagged for Abnormality). The 12 AI conditions comprise four abnormality scores (0, 29, 31, 50) crossed with: 1) the abnormality cutoff threshold of “30” (Cut), and 2) the error rates (FOR, false omission rate, 0.2%; FDR, false discovery rate, 95%). Y-axis is the percentage of participants who indicated they would pursue the follow-up behavior (0–100%). Red dashed line is the control reference. Black dots are means with 95% confidence intervals.

Request for more accurate follow-up imaging (MAFI)

As seen in Fig. 3B and Supplementary Tables 3 and 5, desire for MAFI after a BI-RADS 1 determination increased when AI results were included relative to No AI, p = 0.0002. Desire for MAFI was 10.8% with No AI (control); it increased to 21.6% when the AI score of 29 was presented alone, 45.6% when paired with the cutoff threshold of 30, and 52.3% when the FOR of 0.2% was included, despite AI agreeing with the radiologist. Once AI flagged the case as abnormal, desire for MAFI increased to 58.9% when an AI score of 31 was provided alone; this rose to 67.3% when paired with the threshold of 30, p = 0.0002, but decreased to 44.7% when the FDR of 95% was provided, p = 0.0005. Likewise, when the AI score of 50 was presented alone, desire for MAFI was 46.3%, which increased to 64.4% when the threshold was provided but decreased to 53.4% when the FDR was provided, p = 0.0497.

Trust

As seen in Fig. 4A and Supplementary Tables 3 and 6, trust in the radiologist and in AI after a BI-RADS 1 determination varied depending on whether AI flagged a case and what corresponding information was provided, p = 0.0002. Trust in both the radiologist and AI generally decreased as AI scores increased, p = 0.0002. When the FDR of 95% was provided for scores of 31 and 50, trust in the radiologist was elevated relative to providing only the score and threshold (+12.6, p = 0.000167 and +5.35, p = 0.0498, respectively). Conversely, trust in AI decreased relative to providing only the score and threshold (−24.8, p = 0.000167 and −23.7, p = 0.000176, respectively). Trust in the radiologist was higher than trust in AI across all experimental conditions, p = 0.0002.

Fig. 4: Trust in Radiologist and AI & Concern about Breast Cancer.

A Trust in radiologist and AI. B Concern about breast cancer. X-axis is experimental condition, broken down by column panels (No AI (B only), Not Flagged for Abnormality, Flagged for Abnormality). The 12 AI conditions comprise four abnormality scores (0, 29, 31, 50) crossed with: 1) the abnormality cutoff threshold of “30” (Cut), and 2) the error rates (FOR, false omission rate, 0.2%; FDR, false discovery rate, 95%). Y-axis is (A) a magnitude scale from 0 (not at all) to 100 (very much) and (B) a Likert scale (−50, greatly decrease; 0, no change; 50, greatly increase). A Blue dots are means with 95% confidence intervals denoting trust in the radiologist; red dots are means with 95% confidence intervals denoting trust in AI. B Black dots are means with 95% confidence intervals. Red dashed line is the control reference.

Concern about breast cancer

As seen in Fig. 4B and Supplementary Tables 3 and 6, concern about breast cancer after a BI-RADS 1 determination increased when AI results were included relative to No AI, p = 0.0002. Concern increased from −20.9 with No AI (control) to −9.6 when the AI score of 29 was presented alone, rose to 5.1 when paired with the cutoff threshold of 30, and was similar at 3.5 when the FOR of 0.2% was included; notably, all three were elevated relative to control despite AI agreeing with the radiologist. Once AI flagged the case as abnormal, concern increased to 13.8 when an AI score of 31 was provided alone and to 17.4 when paired with the threshold, p = 0.0002; it decreased to 1.8 when the FDR of 95% was provided, p = 0.0002. A similar pattern was observed for an AI score of 50: concern was 12.1 with the score alone, increased to 14.0 when paired with the threshold, and decreased to 7.3 when the FDR was provided, p = 0.0146.

Financial Reasoning

As seen in Supplemental Table 7, willingness to get a second opinion, a follow-up with the same radiologist, a follow-up with the ordering physician, and additional imaging varied by AI information (as noted in the sections above). Willingness to pay out of pocket for these services also varied. Interestingly, approximately two-thirds of respondents who wanted these follow-up services, but only if they did not have to pay for them, indicated unaffordability as the reason.

Discussion

In this experiment, we demonstrate that participants’ hypothetical decision-making behaviors when presented with a BI-RADS 1 radiologist report alongside an AI report vary based on the AI score, flagging status, and contextual information for interpreting AI triage. Specifically, participants’ consideration of a lawsuit increased with higher AI scores, and this was mitigated when the FDR was provided. Furthermore, participants’ requests for additional imaging and follow-up (from the same radiologist, a different radiologist, and the ordering physician) increased when AI was included relative to no AI. Desire for follow-up also increased as the AI score increased, though this was often mitigated by providing the FDR (and sometimes the FOR). Of note, the FDR used in this study was very high, likely exceeding what participants would expect, whereas the FOR was low. This difference may explain why the FDR had a greater effect on participants’ responses, as its magnitude was more salient. Trust in the radiologist and the AI tool also varied depending on the contextual information provided, though trust was always higher for the radiologist across all conditions, likely related to algorithm aversion as described by Dietvorst et al.12. The majority of participants who indicated they would request follow-up only if they were not responsible for the costs cited an inability, rather than an unwillingness, to pay out of pocket.

These findings are important to consider in the context of prior studies probing patient attitudes toward AI in breast imaging. For example, semi-structured interviews showed that trust in AI relied on clear communication about its application and assurance that a radiologist was involved in the assessment process; participants further expressed that they would rather tolerate anxiety from callbacks than risk a missed cancer diagnosis7. Abnormal mammography findings can already be a source of anxiety13, and the integration of AI findings into the patient portal, which 95% of the public reportedly desires14, may heighten concern if the radiologist and AI reports conflict. Indeed, a recent patient survey found that 91.5% of respondents were concerned about AI misdiagnosis15, while another study observed that health communication is seen as more reliable when recipients are told the information comes from a physician rather than from AI16. Another survey found that patients preferred AI to be used as a second reader rather than as a standalone interpreter6. Our study demonstrates that contextualizing findings by providing the FDR (and sometimes the FOR) can reduce the inclination to pursue follow-up care or a lawsuit. Providing contextual information to interpret AI findings can be one way to reduce uncertainty and concern.

Additionally, Pesapane et al. found that most respondents approved of AI as a supplementary tool but held both software developers and radiologists accountable for AI errors17. Emerging guidelines emphasize the importance of appropriate disclosure: the Good Machine Learning Practice (GMLP) principles from the FDA and collaborating agencies recommend that “users are provided ready access to clear, contextually relevant information that is appropriate for the intended audience.”18

Human factors of AI implementation can impact patient outcomes and legal liability. One study demonstrated that incorrect AI results could worsen radiologist performance; critically, however, implementation strategies mitigated those effects19. A similar consideration translates to the interaction between patient and AI, recognizing that how AI information is presented in patient portals holds downstream implications for imaging volume, healthcare expenditure, and—as shown in the current study—litigation. Indeed, another vignette experiment demonstrated that when a radiologist provided a false-negative report despite AI correctly observing a pathology, mock jurors were more likely to side with the radiologist-defendant if AI accuracy data was provided20. Implementation strategies addressing human factors could reduce existing concerns surrounding trust, transparency, accountability, and legal liability5,6,7,8.

Lastly, our study highlights that financial considerations shape participants’ decision-making. This finding is important to contextualize against disparities in imaging acquisition and interpretation21. In breast imaging, marginalized populations—including racial and ethnic minorities and those uninsured or of lower socioeconomic status—bear a disproportionate burden of cancer morbidity and mortality due to social determinants22. For instance, newer advances in breast imaging, such as digital breast tomosynthesis, are not universally available and often have higher costs23. Similarly, incorporating AI tools may introduce new financial barriers and exacerbate existing disparities, as our study demonstrates the mismatch between the desire for follow-up and the inability to afford it.

This study assessed participants’ decision-making behaviors based on hypothetical imaging results bearing no actual consequence on or relationship to the participant, which limits ecological validity. Real-world constraints such as time, financial barriers, and other logistical considerations affect patients’ ability to seek second opinions and further imaging, and participants may not accurately predict their own behavior. By collapsing the inherently complex process of litigation into a single yes/no item regarding attorney consultation, the overall rates may not be generalizable, although the comparisons between conditions still likely apply to real-world medical malpractice litigation; the purpose of this study was to examine how attitudes toward litigation changed across conditions, not to estimate how many people would pursue legal action. The demographic composition of the study population, as well as the inclusion criteria focused on English-speaking women living in the US, limits the generalizability of the study’s findings; it is also unclear to what extent they would generalize to legal systems outside the US. In addition, financial reasoning could only be described here without inference, as these results were conditioned on responses to a prior item.

Additionally, the experimental conditions included only a BI-RADS 1 radiologist report with two subthreshold and two suprathreshold AI scores. Other combinations of concordance and discordance between radiologist and AI were not assessed. There was also no condition in which the radiologist was given additional information from a source other than AI. Future studies may benefit from including a condition in which a second radiologist provides feedback to the interpreting radiologist, and power analyses for determining the sample size of such research could be based on the effect sizes observed here. Inclusion of additional comprehension check items to ensure that participants understand the interpretation of the FDR and FOR would also be helpful, as would examining participants’ attitudes toward and experience with AI as potential moderators. Finally, this study was designed only to examine overall effects, rather than to explore potential mechanisms, which remain an avenue for future research.

Our study demonstrates that disclosing AI mammography results can impact patients’ interest in pursuing a medical malpractice claim against a radiologist; however, this can be mitigated by providing greater contextual information about the AI system. Moreover, access to AI results impacts patients’ behaviors pertaining to follow-up and checking. Best practices are needed to engage and inform patients about the application of AI tools in their care24. AI outputs that include “flags” and specific AI scores should be accompanied by their corresponding error rates so patients can better understand that these scores carry inherent uncertainty.

Materials and methods

Participants and setting

Participants were recruited online through Prolific and were eligible if they identified as a female, English-speaking adult living in the United States (US) with at least one prior mammogram. In total, n = 1623 participants were included (n = 361 and n = 802 were excluded for failing a comprehension check and a manipulation check, respectively). Demographic information is shown in Table 1. A target of approximately 80 or more completed observations per condition was set based on practical (i.e., budget) constraints. This would allow an effect size of approximately d = 0.65 to be detected, assuming 0.80 power and an alpha of 0.05 with 28 comparisons using the Šidák correction.
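As a rough check of this logic, the per-comparison alpha and implied sample size can be reproduced with a standard power calculation. The sketch below is illustrative only; it assumes a two-sample, two-tailed comparison and uses statsmodels, neither of which the paper specifies for this step.

```python
# Illustrative sketch only (assumption: a two-sample, two-tailed t-test power
# model; the paper does not report the tool used for this calculation).
from statsmodels.stats.power import TTestIndPower

m = 28                                   # number of planned comparisons
alpha_sidak = 1 - (1 - 0.05) ** (1 / m)  # Sidak-corrected per-comparison alpha

n_per_group = TTestIndPower().solve_power(
    effect_size=0.65,        # target detectable Cohen's d
    alpha=alpha_sidak,
    power=0.80,
    alternative="two-sided",
)
print(f"alpha = {alpha_sidak:.5f}, n per group ~ {n_per_group:.0f}")
# alpha ~ 0.00183, n ~ 75 per group, consistent with the target of
# roughly 80 completed observations per condition
```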

Procedure

Online participants who passed screening and provided consent were directed to a Qualtrics survey, where they were randomized to one of 13 conditions. Participants were compensated $1.25 for completing the survey and $0.14 if they screened out for never having had a mammogram. Participants were prevented from participating more than once. The protocol was approved by the Brown University Health Institutional Review Board (IRB009824).

Vignette design and experimental conditions

All participants were shown a vignette asking them to imagine they had recently received their annual screening mammogram and logged into their patient portal to access the results. In all conditions, the hypothetical results consisted of a standard radiologist’s report with a BI-RADS 1 (Negative) determination and a 2D mammogram. No additional information was provided in the control condition.

In the remaining twelve conditions, participants were provided an additional AI-interpreted report with one of four AI scores: 0 (Not flagged for abnormality), 29 (Not flagged for abnormality), 31 (Flagged for abnormality), or 50 (Flagged for abnormality). For each AI score there were three information conditions: (1) the AI score alone; (2) the AI score plus the abnormality cutoff threshold of 30 on a 0–100 scale; or (3) the AI score, the cutoff threshold, and an error rate: the False Discovery Rate (FDR, the complement of the positive predictive value, 1 − PPV) for the AI score = 31 or 50 conditions, or the False Omission Rate (FOR, the complement of the negative predictive value, 1 − NPV) for the AI score = 0 or 29 conditions (Supplemental Table 2). For all 12 AI conditions, the 2D mammogram included the corresponding AI information (see Appendix II for full vignettes). The resulting 13-cell design is sketched below.
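A minimal sketch of the condition structure (illustrative pseudocode, not the study's assignment software) makes the design concrete: one control cell plus 4 AI scores crossed with 3 levels of contextual information.

```python
# Illustrative sketch of the 13-condition design: control plus
# 4 AI scores x 3 levels of contextual information.
CUTOFF = 30

conditions = [{"name": "No AI (control)"}]
for score in (0, 29, 31, 50):
    flagged = score > CUTOFF                  # 31 and 50 are flagged; 0 and 29 are not
    rate = ("FDR", "95%") if flagged else ("FOR", "0.2%")
    conditions.append({"score": score, "flagged": flagged})                    # score alone
    conditions.append({"score": score, "flagged": flagged, "cutoff": CUTOFF})  # + cutoff
    conditions.append({"score": score, "flagged": flagged, "cutoff": CUTOFF,
                       "error_rate": rate})                                    # + error rate

assert len(conditions) == 13
```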

To maximize ecological validity, the hypothetical AI’s performance was based on a commercially available AI mammography interpretation software, Lunit INSIGHT MMG version 1.1.7.1 (Lunit, Seoul, South Korea)25. Using the optimal threshold of 30 with 72.38% sensitivity and 92.86% specificity described by Seker et al., and assuming a breast cancer prevalence of 0.57%, the corresponding FDR and FOR were 95% and 0.2%, respectively. These were the FDR and FOR values given and explained to participants25.
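For reference, these error rates follow directly from Bayes’ rule applied to the reported operating point (sensitivity Se = 0.7238, specificity Sp = 0.9286, prevalence p = 0.0057):

$$\mathrm{FDR} = 1-\mathrm{PPV} = 1-\frac{\mathrm{Se}\,p}{\mathrm{Se}\,p+(1-\mathrm{Sp})(1-p)} = 1-\frac{0.7238\times 0.0057}{0.7238\times 0.0057+0.0714\times 0.9943}\approx 0.95,$$

$$\mathrm{FOR} = 1-\mathrm{NPV} = 1-\frac{\mathrm{Sp}(1-p)}{\mathrm{Sp}(1-p)+(1-\mathrm{Se})\,p} = 1-\frac{0.9286\times 0.9943}{0.9286\times 0.9943+0.2762\times 0.0057}\approx 0.002.$$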

Measures

Checking behavior (i.e., behavior to reduce anxiety)

Four questions probed hypothetical decisions participants would make after receiving the mammogram report: Would you request 1) “a second opinion from another radiologist?” 2) “a follow-up meeting with the radiologist who wrote your original report?” 3) “a follow-up consult with your primary care physician (including OB/GYN)?” and 4) “follow-up imaging that is more accurate even if a physician says it is not necessary for you?” For each question, participants could choose between: 1) “No”, 2) “Yes, even if I have to pay for it out of pocket”, or 3) “Yes, but only if I don’t pay for it (if, for example, insurance pays)”. If they chose option 3, they were asked why: 1) “I cannot afford to pay” or 2) “I do not want to pay.”

Litigation

As the primary outcome, to measure participants’ inclination to sue the radiologist, participants were told, “Imagine one year later you receive a new mammogram with a new report from a different radiologist that indicates evidence of potential breast cancer. A biopsy confirms that breast cancer is present and is stage 3 (which indicates advanced cancer). Would you consult an attorney to explore suing the original radiologist for not having found the cancer in the previous year?” (yes/no).

Trust

Two questions measured participants’ trust in the decisions of the radiologist and AI system depicted in the vignette, from 0 (not at all) to 100 (very much): “Do you ultimately trust the…” 1) “radiologist’s decision?” 2) “AI’s decision?”

Concern

One question measured concern about breast cancer: “How, if at all, would these results change your concern about having breast cancer?” (−50, greatly decrease; 0, no change; 50, greatly increase).

Comprehension and manipulation check

All participants had to correctly identify “No mammographic evidence of malignancy” as a comprehension check. All participants in AI conditions had to correctly recall the AI score (0, 29, 31, or 50) and the “Flagged”/“Not flagged” for abnormality determination. Those in the cutoff conditions had to recall the correct cutoff of “30.” Those provided a FOR or FDR had to correctly recall “The AI had a False Omission Rate of 0.2%” or “The AI had a False Discovery Rate of 95%”, respectively.

Statistical analyses

All analyses were conducted using SAS software 9.4 (SAS Institute, Cary, NC). Attitudes (the dependent variables) were modeled by experimental condition (the independent variable) using generalized linear models assuming a normal or binary distribution, as appropriate, with the GLIMMIX procedure, using contrast comparisons reflecting the hypotheses. Trust in the radiologist and AI, an exploratory aim, was modeled as an interaction with condition (i.e., condition × rater, AI vs. radiologist) using generalized estimating equations with sandwich estimation. All interval estimates were calculated at 95% confidence, and alpha was set at the 0.05 level. P-values were adjusted for multiple comparisons using the Benjamini–Hochberg false discovery rate method. Tests of superiority corresponding with hypotheses were one-tailed; otherwise, two-tailed tests were used.
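For readers replicating the multiplicity correction outside SAS, the following is a minimal sketch of the Benjamini–Hochberg adjustment, shown here in Python via statsmodels with hypothetical raw p-values; the study’s analyses themselves were run in SAS 9.4.

```python
# Minimal sketch of Benjamini-Hochberg FDR adjustment (hypothetical raw
# p-values for illustration only; the study's analyses were run in SAS 9.4).
from statsmodels.stats.multitest import multipletests

p_raw = [0.0001, 0.004, 0.021, 0.30]  # hypothetical raw contrast p-values
reject, p_adj, _, _ = multipletests(p_raw, alpha=0.05, method="fdr_bh")

for raw, adj, sig in zip(p_raw, p_adj, reject):
    print(f"raw p = {raw:.4f}  BH-adjusted p = {adj:.4f}  reject H0: {sig}")
```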