Introduction

More than 1.3 billion individuals are projected to be living with diabetes mellitus by 2050 globally1,2. Diabetes is a risk factor for numerous other diseases and for increased morbidity and mortality3,4,5. Diabetes has been linked to socioeconomic status (SES), dietary risk, and physical activity, and it is crucial to provide tailored support for each patient subgroup to help them manage this lifelong condition6,7. Patient-centered care (PCC), which emphasizes care addressing each patient’s particular needs and preferences8,9, has become an essential approach for the self-management of diabetes10. PCC was found to improve patient engagement in care and enhance self-care skills and confidence while reducing disease-related distress11,12. Moreover, PCC was associated with better quality of life and health outcomes10,13.

Secure messaging through the patient portal has become an essential tool for patients to share concerns with clinicians and ask questions about managing their medical conditions14. Driven by the pandemic, the volume of patient messages increased by 50% in the past few years, reflecting patients’ high demand for this secure communication channel15,16. Reportedly, diabetes was a strong predictor of increased COVID-19 severity and mortality17. Hence, patients with diabetes may have had substantial clinical issues or specific concerns during the pandemic. A systematic review found that patients with diabetes were among the most active patient groups in the patient portal18. Therefore, secure patient messages may contain key information for enhancing PCC, and this information may differ from the clinician’s perspective captured in clinical notes19. Although patient messages reflect patients’ concerns, few prior studies have examined the range of content and inquiries beyond broad process categorizations20. One possible reason is that analyzing text data at this scale was not feasible until the recent emergence of innovative artificial intelligence (AI).

Natural language processing (NLP)-based topic modeling can extract key information from large volumes of text data. Recently introduced AI-enabled NLP models (e.g., bidirectional encoder representations from transformers (BERT)) have been widely used to analyze patient-generated health data, including social media or online forum text21,22. Applying this AI/NLP-based approach to secure patient messages allows us to gain insight into the clinical issues that patients with diabetes bring to their clinicians. Furthermore, generative AI, whose most popular form is ChatGPT-4, has rapidly gained popularity and has been actively tested for potential applications in medicine23. While its expert-level medical knowledge and advanced clinical reasoning have been reported24,25,26, its potential use as a clinical assistant in tailored diabetes care has rarely been explored despite its promising capacity.

Thus, we aimed to draft potential AI tools that can assist clinicians in providing tailored diabetes care that responds to patients’ expressed needs, and to assess the usefulness of those AI tools. To illuminate patients’ needs, we analyzed electronic messages from patients with diabetes using NLP. We used a prompt-engineered large language model (LLM) to comprehensively draft AI tools that reflect the issues patients raised. Finally, we performed a validation study with five endocrinologists to quantify clinicians’ perceived usefulness of the suggested AI tools. The findings provide novel insights from the perspectives of both beneficiaries (patients and clinicians) into formulating AI tools as clinical assistants for tailored patient support, which could contribute to empowering patients and PCC.

Results

We identified a total of 11,151,561 unique message threads in 2013–2024 and limited them to clinical questions (patient medical advice requests (PMAR); 66.9%, n = 7,456,800/11,151,561), excluding patient scheduling, patient medication renewal requests, and general questionnaire submissions. Of those, this study included unique message threads that were routed to Endocrinology divisions (7.1%, n = 528,199/7,456,800) (Fig. 1).

Fig. 1: Data source and study design.

a Individuals with diabetes mellitus: Determined by ICD-10 codes: E08 (underlying DM), E09 (induced DM), E10 (Type 1 DM), E11 (Type 2 DM), E13 (Secondary DM); b Number of unique message threads.

LLM-defined clinical issues

LLM-interpreted primary issues of patients with diabetes are in Table 1: Topic 1) Dietary concerns and weight control; Topic 2) Interpreting lab results (e.g., blood, urine, and A1C); Topic 3) Thyroid management (e.g., medication, thyroid-stimulating hormone (TSH) level and test); Topic 4) Administrative challenges (e.g., paperwork and authorizations); Topic 5) Bone health (e.g., imaging and surgery for bone disease related to diabetes); Topic 6) Navigating lab orders and results; Topic 7) Appointment scheduling; Topic 8) Medication dosage management; Topic 9) Prescription and supplies refills and insurance; Topic 10) Data education needs for Dexcom and pump; Topic 11) Patient-reported hypoglycemia concerns; Topic 12) Blood glucose management in diabetes. In terms of the top 12 topics, the two time periods shared the same primary concerns. All word clouds and LLM-interpreted topics and titles are available in Supplementary Table 1.

Table 1 Primary issues patients with diabetes addressed through patient portal messagesa

AI tools to assist clinicians in tailored patient support

Table 2 presents LLM-drafted AI tools for supporting clinicians in patient care. The overall mean of clinicians’ perceived usefulness of AI assistance was 4.30/5.00 (SD = 0.38; where 5 = Useful). Clinicians perceived some AI tools as highly useful, including 1) Provide evidence-based answers to frequently asked questions, reducing response time (m = 4.8, SD = 0.45); 2) Summarize policy changes and provide real-time updates on covered medications under popular insurance (m = 4.8, SD = 0.45); 3) Automate patient education on hypoglycemia management and prevention strategies (m = 4.6, SD = 0.89); 4) Offer templated but customizable responses for common lab-related inquiries (m = 4.8, SD = 0.45); and 5) Create templates for authorization letters to expedite processing (m = 5.0, SD = 0.00). While the overall perceived risk was low to moderate, m = 3.68/5.00 (SD = 0.42; where 5 = Strongly disagree with risk), a few AI tools were perceived as relatively risky, including 1) Synthesizing patient data to help prioritize urgent requests (m = 2.80, SD = 0.84); 2) AI-driven message triage system by urgency and topic (m = 2.80, SD = 0.84); and 3) Real-time glucose data interpretation and adjustment suggestions (m = 2.80, SD = 1.10). Commonly suggested mechanisms were automating feedback, creating educational content, and crafting templated responses on nutrition, TSH and other lab tests, and bone issues, so that clinicians can respond personally to each patient with a reduced workload. See Supplementary Tables 2–4 for full information.

Table 2 AI assistance perceived as highly useful, with its perceived risk, in tailored patient support in diabetes care

Patient characteristics

A total of 11,123 patients with diabetes were included in this study: 4530 individuals in the pre-COVID-19 group and 6593 individuals in the COVID-19 group. In both the pre-COVID-19 and COVID-19 groups, our study population was mostly of non-Hispanic ethnicity (82–83%) and married (65%), and about half were female (53–54%) and White (43–50%) (Supplementary Table 5).

Message characteristics

A total of 324,109 messages from patients with diabetes were included, and pre-defined topics captured 80.9% (n = 262,249) of messages (Table 3). During COVID-19 (n = 200,657), about 1.6 times as many messages were included as in the pre-COVID-19 group (n = 123,452). Messages from patients with type 2 diabetes predominated (84.3%, n = 273,241).

Table 3 The number of patient portal messages by topics among patients with diabetes mellitus

The most common primary topics were scheduling (20.7%, n = 54,401), meal/diet (19.2%, n = 50,471), pharmacy/refill (14.0%, n = 36,758), lab order/result (11.2%, n = 29,428), medication (9.4%, n = 24,605), and thyroid/hormone (8.6%, n = 22,560). After excluding scheduling and pharmacy refills, which already have separate dedicated channels, the top 4 topics accounted for about half of the total messages (Table 3). During COVID-19, meal/diet-related messages increased (from 17.5% to 20.3%, p < 0.0001) while lab order/result and thyroid/hormone-related messages decreased (from 13.5% to 9.9% and from 10.4% to 7.5%, respectively, both p < 0.0001) compared to pre-COVID-19 (Supplementary Table 6). Figure 2 presents topic clusters that show the size and distribution of neighboring topics. In this visualization, the thyroid/hormone topic was distinct and did not overlap with other topics.

Fig. 2: Low-dimensional representation of patient messages from individuals with diabetes mellitus, 2013–2024.

12,000 messages were randomly selected for each diagnosis (type II and type I DM) for visualization. Message topics: Scheduling/canceling, Pharmacy/refill, Medication, Symptom (Pain/fell), Thyroid/hormone, Lab/test order, Treatment/surgery, Mental Health/sleep, Insurance/coverage, Device/Sensor/supplies, Meal/diet.

The rate of sending messages on specific topics differed by patient characteristics among those with diabetes (Fig. 3). White patients had more messages on scheduling, lab order/result, pharmacy refill, thyroid/hormone, medication, device/sensor, insurance/coverage, symptom, imaging/surgery, and mental health/sleep (all, p < 0.0001) than all other racial groups (Fig. 3a). Compared to Hispanic patients, non-Hispanic patients had more messages on meal/diet (p = 0.035), lab order/result (p = 0.004), insurance/coverage (p = 0.007), symptom (p = 0.001), and imaging/surgery (p < 0.0001) (Fig. 3b). Female patients had more messages on scheduling, lab order/result, thyroid/hormone, medication, symptom, imaging/surgery, and mental health/sleep than male patients (all, p < 0.0001). Males had more messages on meal/diet and device/sensor than females (both, p < 0.0001) (Fig. 3c). Unmarried patients sought more advice on insurance/coverage (p = 0.001) and device/sensor (p < 0.0001) topics than their married counterparts (Fig. 3d).

Fig. 3: Rate^a comparisons of sending messages on specific topics by demographic characteristics.

a RateA = The number of patients in subgroup A who had messages on the topic X /Total number of patients in subgroup A; a Race: non-White includes Asians, Blacks, Native Americans, Other, and Pacific Islanders; b Ethnicity (Hispanic vs. non-Hispanic); c Sex (Female vs. Male); d Marital status: Unmarried includes divorced, life partner, other, separated, single, and widowed; * When the rate difference is significant (p < 0.05).

Discussion

We showcased potential AI tools that can support clinicians in providing tailored patient care and assessed clinicians’ perceptions of them. The tools reflect patients’ pressing issues identified from 528,199 patient portal messages from those with diabetes. AI tools perceived as highly useful included those that expedite administrative processes (e.g., drafting templates of authorization letters) and support patient education (e.g., creating educational materials for common lab inquiries and customizable responses for glucose monitoring and pump usage). These AI tools may help clinicians provide timely interaction and streamlined support efficiently. Meanwhile, AI tools that directly handle patient data were perceived as risky (e.g., synthesizing patient data for message triage by urgency and topic). This study proposes assorted AI applications as clinical assistance tailored to patients’ needs, substantiated by clinicians’ evaluations. Our work offers important implications for the development and advancement of potential AI tools for precision diabetes care.

Meal and dietary concerns related to blood glucose and insulin were the most actively discussed clinical topics, indicating that patients with diabetes have an ongoing desire for support. As continuous glucose monitoring (CGM) devices have been widely adopted in the past few years, personalized nutrient suggestions based on real-time blood glucose levels may improve PCC27. Moreover, digital platform-based dietary suggestions were found to be acceptable and effective for glycemic control and weight loss among patients with diabetes28. Aligned with existing knowledge, clinicians in our study perceived AI-assisted carb-counting tools and personalized nutrition advice tailored to CGM and patient-reported data as highly useful. A holistic approach that accounts for all the essential data (e.g., CGM, dietary habits and lifestyles, medications, and insulin use) could ultimately be optimal. A starting point could be AI tools that prepare evidence-based yet easy-to-understand educational materials on the impact of meal timing and nutrient composition, with real-life examples, to empower patients. This may help them understand those important relationships and have more autonomy in dietary management. Additionally, future studies may want to further develop AI tools that can craft meal plans for individuals with specific dietary restrictions to enhance tailored patient care and embrace diversity.

Not surprisingly, AI’s administrative and operational assistance was perceived as useful. Despite the existence of a separate scheduling channel, appointment-related issues were still the most common in the medical advice request channel. This highlights the need to reinforce the existing scheduling system. Perhaps a real-time interactive assistant could soon triage scheduling queries and efficiently schedule patient visits in the patient portal. Moreover, such an AI-enabled conversational agent may especially help those with limited proficiency in direct scheduling, including older adults and non-English speakers, which could narrow the digital divide and possibly reduce related disparities in health access29. In addition, referral requests to other specialists, including ophthalmologists, podiatrists, or rheumatologists, for complications associated with diabetes may be streamlined. Lastly, real-time resource updates, including policy changes and medication supplies, could also be worth pursuing further. During the widespread shortage of glucagon-like peptide-1 receptor agonist (GLP-1RA) medications in the past few years, patients were exposed to the risks of falsified medications and unreasonable consequences30. AI-enhanced prediction models based on past medication use and supply could help avoid future imbalances of supply and demand, creating an environment for equitable access to high-demand medications like GLP-1RA products.

Notably, thyroid hormone messages were common and distinct from other topics in the visualization, which suggests that thyroid-related messages were likely siloed and independent. This could indicate that patients with diabetes may benefit from a separate communication channel specialized for thyroid hormone in the patient portal. Caring for thyroid dysfunction is important to prevent further complications associated with diabetes. Uncontrolled TSH levels were related to an increased risk of developing other complications in the eyes and kidneys31. We hypothesize that AI assistance could help triage messages, interpret TSH test results, and educate patients on normal ranges and levothyroxine dosages. Given that patients with diabetes have serious concerns about managing their hormone levels, particularly female and White individuals in this study, further efforts to empower their self-care with high-quality information and guidance would be worth pursuing. Furthermore, evaluating patients’ feedback and perspectives on the specific AI tool will also be essential.

Furthermore, patients’ primary concerns differed by race, ethnicity, sex, and marital status. White or female patients raised more concerns about bone-related issues than all other racial groups or males. Our findings align with previously reported risks of bone fracture among patients with diabetes, with higher risk in White patients32 and women33. Through topic visualization, we observed that bone health issues neighbored the pain, medication, and scheduling topics. Given that the combination of good glycemic control, medications for diabetes and osteoporosis, and a dietary and lifestyle-based approach is optimal for bone disease care in patients with diabetes, an interdisciplinary educational approach is important34. AI-driven automated Q&A for common queries about bone imaging and surgery and a decision-support tool for clinicians to optimize MRI referrals could be developed further.

White and female patients also raised mental distress (e.g., depression, anxiety, and sleep problems) as a serious concern more often than other racial groups or males. Given that mental health inequities by race and gender widened during the COVID-19 pandemic35,36, proactive mental health screening or telepsychiatry-based support could be helpful, where AI assistance may step in as a frontline symptom screener along with contemporary efforts37. Future efforts could focus on targeted mental health support by harnessing AI tools for women as a start because women were found to use telehealth for mental health care more than men38.

This study has several limitations. First, patient message data came from one academic institution. This single source of the study population may limit the generalizability of the findings. However, we included messages from 22 affiliated health centers in northern California, and the included patients were racially and ethnically diverse, with more than 50% of non-White race and more than 11% of Hispanic ethnicity. Second, we were unable to account for additional SES measures in the message analysis, including household income, education, and insurance, which are important elements of social determinants of health (SDOH). However, this study provides a foundation for others to learn from and apply our results even with the minimal SDOH information available at the time of the study. One future direction is to assess patient messages from diverse data sources (e.g., other healthcare systems or geolocations, community clinics, or online forums) or longitudinally with comprehensive sociodemographic characteristics to enhance our understanding of patients’ needs and issues that might differ by these elements. Third, we only captured the needs of patients communicating through secure messaging, while patients not using the patient portal might have other needs, which can add to the digital divide. Yet this study may have covered the majority of issues because patients with diabetes are highly active patient portal users. For instance, more than 70% already used secure messaging a decade ago20. In the future, assessing the needs of patients who do not use secure messaging should be considered to fill this knowledge gap.

In conclusion, AI-powered analyses were able to comprehensively surface patients’ needs and concerns, and we suggested various potential AI tools to assist clinicians in responding to such needs. Tailored patient support based on an enhanced understanding of patients’ concerns may facilitate patient engagement and care, which is essential to achieving improved outcomes and lifelong management of diseases like diabetes. The demonstrated AI tools could expand the scope of AI’s use in tailored patient care beyond diabetes.

Methods

Data source and study design

We obtained patient portal messages of individuals with diabetes from a large academic hospital (Stanford Health Care) and 22 affiliated centers in the Bay Area, from July 2013 to April 2024. We defined patients with diabetes, including type I and type II diabetes, using the ICD-10 codes. We excluded patients who had more than one ICD-10 code to focus on specifying patient groups with diabetes as a primary condition. Through the portal, patients can specify the purpose of a message as scheduling, insurance, general questions, or medical advice. PMAR allows patients to bring up health issues to discuss with clinicians and obtain optimal medical advice from them. Because this was a secondary analysis in which no human participants were involved, no informed consent was needed. The Institutional Review Board at Stanford University approved this study.

Topic modeling

We performed topic modeling to identify the primary concerns and issues addressed through the patient portal messages. By applying two widely used NLP approaches, we intended to extrinsically validate the primary model’s performance and generate different types of knowledge from the two models. Additionally, we stratified the messages by time, 2013–2020 and 2020–2024, to explore if there were unique topics during the pandemic.

First, to obtain an overview of the patient message topics, we conducted unsupervised topic modeling. For text analysis, we first removed unwanted patterns and excessive whitespace and cleaned up irrelevant characters and formatting codes from the raw message data, then tokenized the cleaned text and converted the words to their base forms. We then converted the text into sentence embeddings using a pre-trained embedding model (all-MiniLM-L6-v2)39. This model was pre-trained with over 600 million social media posts and 12 million medical journals. We used uniform manifold approximation and projection (UMAP) to reduce the dimensionality of the embeddings and removed stop words and infrequent words through the CountVectorizer model. We applied class-based TF-IDF (c-TF-IDF) to weigh the words by frequency and significance, and the K-means model to cluster the messages, selecting the optimal number of clusters based on the silhouette coefficient and Davies–Bouldin index, computational measures of clustering quality based on within-cluster similarity and between-cluster overlap40,41. To visualize these topics, we created word clouds consisting of the top ten keywords for each cluster, in which larger keywords represent greater frequency.
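The keyword-weighting step can be illustrated with a minimal sketch. It assumes the simplified class-based TF-IDF form described in the BERTopic documentation, weight(t, c) = tf(t, c) × log(1 + A / f(t)), applied to raw counts; the function names and toy messages are hypothetical, and stop-word removal and lemmatization are omitted for brevity. It is not the study's pipeline.

```python
import math
from collections import Counter


def class_tfidf(docs_by_cluster):
    """Return {cluster: {term: weight}} with a class-based TF-IDF weighting.

    Assumed formula: weight(t, c) = tf(t, c) * log(1 + A / f(t)), where
    tf(t, c) is the count of term t in cluster c, f(t) is the count of t
    across all clusters, and A is the average number of tokens per cluster.
    """
    # Treat each cluster as one long document of lower-cased tokens.
    counts = {
        cluster: Counter(tok for doc in docs for tok in doc.lower().split())
        for cluster, docs in docs_by_cluster.items()
    }
    total = Counter()
    for ctr in counts.values():
        total.update(ctr)
    avg_tokens = sum(total.values()) / len(counts)
    return {
        cluster: {t: tf * math.log(1 + avg_tokens / total[t]) for t, tf in ctr.items()}
        for cluster, ctr in counts.items()
    }


def top_keywords(weights, n=3):
    """Top-n terms by weight; ties are broken alphabetically."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


# Hypothetical toy clusters, not study data.
toy = {
    "scheduling": ["please schedule appointment", "cancel my appointment please"],
    "device": ["dexcom sensor failed", "pump sensor alarm please"],
}
weights = class_tfidf(toy)
print(top_keywords(weights["scheduling"], 1), top_keywords(weights["device"], 1))
```

In this toy example, a word shared across clusters ("please") receives a lower weight than an equally frequent word concentrated in one cluster ("appointment"), which is the behavior that makes the top keywords of each cluster distinctive.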

Second, we performed semi-supervised topic modeling using the BERTopic package, which leverages BERT techniques42. In this model, another pre-trained embedding model was used for sentence transformation (thenlper/gte-small), which was pre-trained for general purposes of text analysis43. This model encodes each message as a 384-dimensional vector. We also applied UMAP and CountVectorizer models to lower the dimensionality of embeddings and remove stop words. For clustering, we took a zero-shot approach by providing pre-defined keywords so that the topic model clusters messages similar to the given keywords. We set the semantic similarity threshold to 0.82, and messages meeting this threshold, computed by cosine similarity, were assigned to the corresponding topic. The benefit of this approach is the ability to capture messages of interest even when they do not contain the exact keywords that we provided. To visualize the distribution of representative messages, we plotted low-dimensional UMAP by topics, in which we randomly sampled 12,000 messages from patients with diabetes. Additionally, the remaining messages that were not categorized into the pre-determined topics formed their own clusters of new topics based on similarity. This enabled us to identify novel topics that we may need to pay attention to. Pre-defined topics and keywords were obtained from our first unsupervised topic modeling (e.g., Scheduling [“appointment,” “schedule,” “canceling,” “available”]; Device/sensor [“freestyle,” “sensor,” “dexcom,” “data,” “CGM,” “pump,” “meter”]). When a keyword appeared in more than one cluster, we assigned it to the cluster in which it was most significant, as determined by the size of the word in the word cloud. A full list of pre-defined topics and keywords is available in Supplementary Note 1.
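The threshold-based assignment can be sketched as follows. This is a minimal illustration that uses toy 3-dimensional vectors in place of 384-dimensional embeddings; the function names and vectors are hypothetical, and BERTopic's internal implementation may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_topic(message_vec, topic_vecs, threshold=0.82):
    """Return the most similar pre-defined topic, or None if below threshold."""
    best_topic, best_sim = None, -1.0
    for topic, vec in topic_vecs.items():
        sim = cosine(message_vec, vec)
        if sim > best_sim:
            best_topic, best_sim = topic, sim
    return best_topic if best_sim >= threshold else None


# Hypothetical toy vectors standing in for real embeddings.
topic_vecs = {"Scheduling": [1.0, 0.1, 0.0], "Device/sensor": [0.0, 0.2, 1.0]}
close_msg = [0.9, 0.2, 0.1]  # points almost in the "Scheduling" direction
far_msg = [0.5, 0.6, 0.5]    # not close to either pre-defined topic

print(assign_topic(close_msg, topic_vecs))
print(assign_topic(far_msg, topic_vecs))
```

Messages for which the function returns None correspond to the uncategorized remainder, which in the approach described above is clustered separately into new topics.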

Generative AI for topic interpretation and potential roles as a clinical assistant

To enhance the interpretability of extracted topics, we applied a widely used large language model/generative AI (ChatGPT-4, OpenAI Inc.). We input the word clouds from our first NLP model into generative AI and directed it to perform two tasks: first, summarize the main issues and provide a title for each word cloud; second, offer suggestions on how AI can assist healthcare professionals in providing personalized patient support. To improve the information quality, we used a prompt engineering strategy, which consists of multiple techniques44,45: 1) role prompting (e.g., as Dr. GPT, a professional endocrinologist), 2) directive commanding (e.g., First, summarize the primary issues; Second, suggest the title; Third, provide two or three suggestions), 3) expertise simulation (e.g., I myself am an endocrinologist in the hospital), 4) zero-shot chain of thought (e.g., take time to think deeply and step-by-step to be sure). The full engineered prompts are available in Supplementary Note 2.
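For illustration only, the four techniques could be combined into a single prompt along the following lines. The wording reuses the short examples quoted above; the function name, the ordering of the parts, and the keyword line are assumptions, and the actual prompts are those in Supplementary Note 2.

```python
def build_prompt(keywords):
    """Assemble one prompt from the four techniques named in the text."""
    parts = [
        # 1) Role prompting
        "You are Dr. GPT, a professional endocrinologist.",
        # 3) Expertise simulation
        "I myself am an endocrinologist in the hospital.",
        # 2) Directive commanding
        "First, summarize the primary issues. Second, suggest the title. "
        "Third, provide two or three suggestions on how AI can assist "
        "healthcare professionals in providing personalized patient support.",
        # 4) Zero-shot chain of thought
        "Take time to think deeply and step-by-step to be sure.",
        # Word-cloud keywords for one topic (hypothetical input format)
        "Keywords: " + ", ".join(keywords),
    ]
    return "\n".join(parts)


example_prompt = build_prompt(["appointment", "schedule", "canceling", "available"])
print(example_prompt)
```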

Evaluation of generative AI-suggested information

Five endocrinologists with various levels of experience and sub-specialty areas (C.D., S.H.K., R.A.L., S.M.S, and T.A.) independently evaluated the information that the LLM provided: 1) their agreement with the AI’s interpretation of patients’ primary concerns, 2) the perceived usefulness, and 3) the perceived risk of the suggested AI roles in assisting healthcare professionals in providing tailored patient support. We required the assessors to read the given instructions and assessment protocol thoroughly and follow them strictly. For perceived risk assessment, we adapted the AI Risk Management Framework from the National Institute of Standards and Technology46. Endocrinologists used a 5-point Likert scale for agreement (1-disagree; 5-agree), perceived usefulness (1-not useful; 5-useful), and perceived risk (1-Strongly agree with the risk of harm; 5-Strongly disagree with the risk of harm). To compute the mean and standard deviation (SD) for each topic, we averaged the five endocrinologists’ scores, following an ensemble approach used when there is no ground truth for comparison47. In this approach, the mean represents the degree of agreement among the assessors, and the SD shows the uncertainty48. The guided protocols for clinicians’ assessment and scores are available in Supplementary Tables 1–4, respectively.
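As a minimal sketch of this summary step, the mean and SD of five ratings can be computed as below. The ratings are hypothetical, and the use of the sample SD (n - 1 denominator) is an assumption, since the text does not state which SD was used.

```python
from statistics import mean, stdev


def summarize(scores):
    """Mean and sample SD (n - 1 denominator) of one item's ratings."""
    return round(mean(scores), 2), round(stdev(scores), 2)


# Hypothetical ratings from five raters on a 1-5 Likert scale.
ratings = {
    "tool_a": [5, 5, 5, 5, 4],
    "tool_b": [2, 2, 3, 3, 4],
}
summary = {tool: summarize(scores) for tool, scores in ratings.items()}
print(summary)
```

With five raters, a single dissenting score moves the SD noticeably, so the SD is best read as a coarse indicator of disagreement rather than a precise estimate.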

Patient messages by demographic characteristics

To investigate whether the primary concerns and issues differed by patient demographics, we stratified each message topic by patient characteristics, including sex (female vs male), race (White vs non-White, including Asian, Black, Native American/Pacific Islander, and Other), ethnicity (Hispanic vs non-Hispanic), and marital status (married vs unmarried, including single, divorced, separated, widowed, life partner, and other). Then, we calculated the rates of sending messages on specific topics by patient characteristics and compared them to identify demographic subgroups that raised specific concerns more often (e.g., RateA = The number of patients in subgroup A who had messages on the topic X / Total number of patients in subgroup A). Moreover, to investigate whether the concerns raised differed during the COVID-19 pandemic, we also stratified the analysis by time, before (2013–2020) and during COVID-19 (2020–2024). We applied two-proportion z-tests for quantitative comparisons of rates by demographics and obtained 95% confidence intervals (95% CIs) and p-values. Statistical significance was determined at a p < 0.05 level using Python 3.10 in Google Colab (Mountain View, CA, USA).
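The rate comparison can be sketched with a standard two-sided two-proportion z-test. The counts below are hypothetical, the function name is illustrative, and the choice of a pooled standard error for the test with an unpooled (Wald) interval for the rate difference is an assumption about details the text does not specify.

```python
import math


def two_proportion_ztest(x1, n1, x2, n2, z_crit=1.959964):
    """Two-sided z-test for rate1 - rate2, with a 95% Wald CI for the difference.

    x1, x2: patients in each subgroup with messages on the topic
    n1, n2: total patients in each subgroup
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Pooled standard error under the null hypothesis of equal rates.
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Unpooled standard error for the confidence interval.
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return z, p_value, ci


# Hypothetical counts: 300 of 1000 patients in subgroup A and
# 240 of 1000 patients in subgroup B sent a message on topic X.
z, p_value, ci = two_proportion_ztest(300, 1000, 240, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```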