Introduction

Diabetes mellitus, a chronic disease characterized by hyperglycemia, altered metabolism, and complications affecting both microvascular and macrovascular systems, represents a significant and escalating public health challenge1,2. Current estimates place global prevalence at over 800 million people, with projections suggesting this figure could rise to 1.3 billion by 20503,4,5. More than 90% of these cases are attributed to type 2 diabetes (T2D), emphasizing the urgent need for innovative, effective management strategies that can meet the diverse and evolving needs of this population4,6.

The emergence of artificial intelligence (AI) and wearable technology has revolutionized healthcare, offering innovative solutions for diabetes management7. AI, broadly defined as the ability of computer systems to perform tasks typically requiring human intelligence, is playing an important role in healthcare8. AI-powered systems can analyze vast amounts of data, identify patterns, and make predictions, enabling them to support clinical decision-making9, predict outcomes, and personalize treatment approaches10. This ability to process and interpret complex datasets is particularly valuable in diabetes management, where individual patient needs and responses to treatment vary widely9.

Wearable devices, such as continuous glucose monitors (CGMs), smartwatches, and other sensors, are becoming increasingly important in capturing real-time data for individuals living with diabetes8. These devices provide continuous monitoring of physiological parameters8, allowing individuals to gain insights into their interstitial glucose levels and make informed decisions about their lifestyle choices11. CGMs have revolutionized diabetes management for individuals with type 1 diabetes and are increasingly being adopted by people with type 2 diabetes as an effective tool for real-time glucose monitoring12,13,14,15. They provide real-time interstitial glucose readings, enabling individuals to adjust insulin doses, dietary intake, physical activity, and other lifestyle factors to prevent hypoglycemia and hyperglycemia16.

While the use of wearable technology in diabetes management is expanding17, and numerous studies have explored the use of AI and wearable devices in healthcare9,18,19,20, the integration of AI with these devices, particularly for T2D, remains relatively unexplored8,9. This gap matters because AI has the potential to substantially enhance the effectiveness of wearable devices8,13. By analyzing data from wearable sensors, AI algorithms can provide personalized insights, predict interstitial glucose fluctuations, and even suggest dietary and lifestyle adjustments11,21. AI-powered systems can also be used to automate insulin delivery, reducing the burden on individuals with diabetes and improving the effectiveness of treatment plans10,22.

While prior reviews have examined either AI applications in diabetes care or the role of wearable technologies independently, this review is, to our knowledge, one of the first to systematically evaluate the intersection of artificial intelligence and wearable technology specifically for T2D management. Our review uniquely focuses on AI models that operate on physiological data collected from wearable devices, such as CGMs, fitness trackers, and smartwatches, to improve glycemic prediction, clinical decision support, and self-management outcomes. We also employed a novel synthesis framework to extract and analyze key dimensions across included studies: study population characteristics (including demographic diversity and metabolic profile), AI model type and interpretability, sensor modality and fusion, and clinical endpoints. This enables a more granular and equity-focused evaluation than prior reviews. Only empirical studies in which AI models were applied to such wearable-derived data for clinical or self-management outcomes in individuals with T2D were considered; studies that involved mobile health tools or digital coaching platforms without a direct AI modeling component were not included.

This systematic review aims to assess the current state of research on the integration of AI and wearable technology in T2D management, highlighting potential benefits and challenges while identifying future directions for research and development.

Results

Study selection

Figure 1 illustrates publications by year, indicating a steady increase in AI-related manuscripts primarily focused on T2D over the past decade, with marked growth beginning in 2022. Early years featured sparse publications, predominantly centered on interstitial glucose prediction. In later years the scope broadened: interstitial glucose prediction remained dominant, accompanied by growing contributions from classification tasks and insulin management. Other objectives included detecting and classifying physical activity, evaluating diabetic retinopathy using CGM data, estimating stress levels based on several physiological parameters, and assessing the impact of CGM sensor location on glucose forecasting errors. Overall, AI applications in diabetes research grew rapidly and diversified, reflecting a shift toward more balanced and varied study objectives in recent years.

Fig. 1

Trends in study objective across publication years.

Study characteristics

Studies included in the review demonstrated a range of wearable technologies and diverse applications of AI models for managing T2D. Key attributes such as study design, geographic location, population demographics, wearable device types, AI architectures, model performance, and interpretability measures are summarized in Tables 1 and 2. Sixty-seven percent (40 of 60) of the studies were observational or experimental in nature, with 20% (12 of 60) consisting of prospective observational studies focused on real-time data from wearable devices in naturalistic settings. Thirteen percent (8 of 60) of studies employed randomized controlled trials or non-randomized experimental designs to evaluate interventions involving CGMs and wearable activity trackers.

Table 1 Summary of study characteristics
Table 2 Summary of studies integrating artificial intelligence and wearable technology for diabetes management

Studies were conducted across diverse geographical regions, with 45% (27 of 60) conducted in North America, primarily in the United States; 30% (18 of 60) in Asian countries, notably China and South Korea; 20% (12 of 60) in Europe; and 5% (3 of 60) in other regions, such as Australia and the Middle East. Notably, regions like Africa and South America were underrepresented. Sample sizes varied from five to over 1000 participants, with a median size of 150 participants. Forty percent (24 of 60) of studies included fewer than 100 participants, potentially limiting generalizability. Most studies focused on adults with T2D, with an average participant age of 55 years. Gender distribution was relatively balanced, with 48% female participants. However, only 7% (4 of 60) of studies reported racial and ethnic demographics, with low representation of minority populations.

Wearable devices used in the studies included CGMs in 70% (42 of 60) of studies to provide real-time glucose monitoring. Other devices, such as fitness trackers and smartwatches, were utilized in 20% (12 of 60) of studies to capture physical activity, heart rate, and other metrics. Less common wearables, including photoplethysmography (PPG) sensors and electrodermal activity monitors, accounted for 10% (6 of 60) of studies. Data collected typically included minute-by-minute glucose readings, heart rate variability, physical activity, and sleep patterns, often integrated into AI models for predictive analysis.

AI architectures varied significantly across the studies, reflecting a shift toward advanced modeling techniques. Deep learning models, particularly recurrent neural networks (RNNs) and long short-term memory (LSTM) networks, were employed in 45% (27 of 60) of studies, owing to their ability to process time-series data from wearables. Traditional machine learning models, such as random forests and support vector machines (SVMs), were used in 30% (18 of 60) of studies for their interpretability. Emerging architectures like temporal fusion transformers and hybrid models constituted the remaining 25% (15 of 60), highlighting a trend toward sophisticated AI solutions.

Model performance and predictive accuracy

Most studies reported performance metrics, including root mean squared error (RMSE), mean absolute error (MAE), and area under the receiver operating characteristic curve (AUC), with prediction accuracies varying based on model complexity and data input quality. Sixty percent (36 of 60) of the studies achieved RMSE values within the clinically acceptable range for glucose prediction, typically under 15 mg/dL. Predictive accuracy was generally higher in studies utilizing large, high-frequency datasets from CGMs, with some advanced models achieving accuracy rates above 85% in predicting glucose levels within a 1–2-hour window.
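
As a minimal sketch of how the two error metrics named above are computed, the following Python snippet evaluates a short series of forecasts against observed values. The readings and forecasts are hypothetical numbers chosen for illustration and are not drawn from any included study; the 15 mg/dL threshold is the figure quoted in the text.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical CGM readings and model forecasts in mg/dL (illustration only).
observed = [110.0, 125.0, 140.0, 160.0, 150.0]
forecast = [105.0, 130.0, 150.0, 150.0, 155.0]

RMSE_THRESHOLD_MG_DL = 15.0  # threshold quoted in the text above

print(f"RMSE = {rmse(observed, forecast):.2f} mg/dL")
print(f"MAE  = {mae(observed, forecast):.2f} mg/dL")
print("below threshold:", rmse(observed, forecast) < RMSE_THRESHOLD_MG_DL)
```

Because RMSE squares each error before averaging, it penalizes occasional large misses more heavily than MAE does, which is one reason the two are usually reported together.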

Forty percent (24 of 60) of studies incorporated interpretability measures, such as SHapley Additive exPlanations (SHAP) and feature importance analysis, to improve clinician trust and model transparency. Despite this, 60% (36 of 60) of studies used complex “black-box” models, posing barriers to clinical adoption due to limited transparency.
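
To illustrate the idea behind Shapley-based attributions, the sketch below computes exact Shapley values for a toy model by averaging each feature's marginal contribution over every feature ordering. It does not use the SHAP library, which relies on faster approximations, and it is not the method of any included study. The model, feature names, weights, baseline, and patient values are all hypothetical.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley attributions relative to a baseline input.

    Averages marginal contributions over all feature orderings, so it is
    only feasible for a handful of features.
    """
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)      # start with every feature at baseline
        previous = model(current)
        for i in order:
            current[i] = instance[i]  # switch feature i to its actual value
            value = model(current)
            totals[i] += value - previous
            previous = value
    return [t / len(orderings) for t in totals]


# Hypothetical toy model: a linear score over three wearable-derived features.
FEATURES = ["mean_glucose", "heart_rate", "steps_thousands"]
WEIGHTS = [0.5, 0.2, -1.0]


def toy_model(x):
    return sum(w * v for w, v in zip(WEIGHTS, x))


baseline = [120.0, 70.0, 5.0]  # assumed reference input, e.g. cohort averages
patient = [160.0, 80.0, 2.0]   # assumed individual input

phi = shapley_values(toy_model, patient, baseline)
for name, value in zip(FEATURES, phi):
    print(f"{name}: {value:+.2f}")
```

The attributions sum to the difference between the model output for the individual and for the baseline, which is the property that makes this kind of explanation straightforward to present to clinicians.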

Results of individual studies

The 60 manuscripts reviewed explored a range of AI models with applications to data from wearable devices in T2D, encompassing early detection, diagnosis, real-time glucose monitoring, lifestyle interventions, and personalized insulin management. Several studies, such as those by Avram et al.23 and Yin et al.24, developed digital biomarkers using smartphone-based PPG and other wearable sensors to enable non-invasive diabetes detection. Deep neural networks and machine learning frameworks were leveraged to improve diagnostic accuracy, but limitations in data diversity and model generalizability were noted. Similarly, CGM devices integrated with AI models, such as the recurrent neural networks and transfer learning approaches demonstrated by Deng et al.25 and He et al.26, showcased the capability to predict interstitial glucose trends; however, challenges related to data imbalance and patient-specific adaptation persisted. Other approaches, including ensemble models and temporal convolution networks, improved prediction accuracy, though the need for broader clinical validation remained a significant concern.

Beyond glucose monitoring, several studies explored real-time intervention strategies, including the use of reinforcement learning and fuzzy logic models for optimizing insulin dosing. For example, studies by Zhu et al.27 and Sun et al.28 demonstrated promising results in managing glycemic variability through AI-guided insulin adjustments. Integrating lifestyle data (e.g., sleep, exercise) from wearables, studies such as those by Kim et al.29 and Ramazi et al.30 provided insights into individualized glycemic management, underscoring the value of multimodal data in tailoring diabetes management.

Key limitations noted in the studies included interpretability of complex AI models and sources of bias. Studies focused on clustering algorithms and decision support systems highlighted the need for clearer model explanations to facilitate clinical adoption. Furthermore, gaps in real-world validation and the lack of broader demographic representation were consistently cited, underscoring the need for future research to address these limitations and improve the equity and clinical applicability of AI-driven solutions for T2D management.

Various biases observed in the manuscripts reviewed included selection bias, racial and ethnic bias, and other sources of bias. Selection bias was the most prevalent, stemming from homogeneous study populations often skewed toward certain racial or ethnic groups. This limited generalizability and resulted in the underrepresentation of minority populations, which impacted AI model performance across subpopulations and raised equity concerns. Additional biases, such as data source integrity, technological limitations (e.g., sensor accuracy), and outcome reporting bias, affected intervention consistency and result accuracy. While some studies employed robust methodologies, many lacked external validation and did not adequately address demographic imbalances, underscoring key areas for improvement.

Discussion

This systematic review highlights the increasing role of AI and wearable devices in managing T2D. The majority of studies utilized CGMs to provide granular, real-time glucose data, enabling AI models to predict glycemic variability and detect early hypo- or hyperglycemic events. Other wearables, such as fitness trackers and smartwatches, were used to monitor physiological parameters, including physical activity and heart rate, broadening the data scope available for AI-driven insights. AI models predominantly used deep learning architectures, including recurrent neural networks and long short-term memory networks, which excelled at capturing temporal data patterns from wearables. Traditional machine learning models, such as random forests, remained prevalent due to their interpretability, while emerging architectures demonstrated the potential for integrating high-dimensional data but often lacked transparency. Efforts to enhance model interpretability included tools like SHAP values, which improved transparency but were applied inconsistently across studies. However, several key gaps were identified. Only 7% of studies reported racial and ethnic demographics, with limited representation of racial and ethnic minority populations, particularly in U.S.-based studies. While the geographic distribution of studies was broad, including several from Asian countries, most did not report disaggregated demographic data, making it difficult to assess inclusion of underrepresented populations in any given context. Many studies did not include external validation, and smaller sample sizes limited the generalizability of their findings. None of the studies conducted long-term follow-ups, leaving gaps in understanding the prolonged impact of AI-driven interventions on T2D outcomes. While these advancements underscore the potential of AI and wearable devices to revolutionize diabetes management, they also reveal critical gaps in the existing body of research. Addressing these limitations is essential to ensure the equitable and effective application of AI-driven interventions in diverse clinical populations.

This systematic review fills an important gap in the literature by being the first to comprehensively synthesize studies that combine artificial intelligence with wearable-derived physiological data for the management of T2D. While many prior reviews have focused broadly on digital health or AI in diabetes care, they have not systematically assessed the combined application of wearable technology and AI modeling in T2D populations. In addition to mapping this intersection, our review introduces a framework that explicitly addresses issues of demographic inclusivity, model interpretability, and sensor fusion—areas that are often discussed in isolation but rarely integrated in prior literature reviews. These features differentiate our review and provide specific guidance for future development of equitable, explainable, and data-rich AI systems for diabetes care.

Broader limitations in the studies reviewed have significant implications for the development and application of AI models with wearable devices for T2D management. One prominent limitation is the small sample sizes used in many studies, which can lead to model overfitting and limit the generalizability of findings to broader populations31,32. Small datasets may fail to capture the full range of physiological variability seen in individuals with T2D, reducing the robustness and reliability of AI predictions in clinical settings. Additionally, many studies demonstrated demographic homogeneity, with a significant proportion of research conducted on specific regional populations, such as predominantly Chinese or white non-Hispanic cohorts. This lack of diversity may result in biased AI models that underperform in underrepresented groups, raising concerns about equity in diabetes care.

Another notable limitation relates to data quality and imbalance. Wearable data, while rich in granularity, often suffers from issues such as missing data, noise, and inconsistencies due to patient adherence or device fatigue, sensor malfunctions, or differing data collection protocols. Such data quality issues can undermine model accuracy and skew predictions, particularly for rare events like hypoglycemia33,34. The wearable devices themselves present additional challenges, as non-invasive sensors can produce less reliable data under specific conditions, such as varying skin tones, high physical activity, or temperature fluctuations. This impacts the real-time accuracy and reliability of AI models dependent on wearable inputs thereby reducing their overall utility.
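
One common way to handle the missing-data problem described above is to interpolate only short gaps and leave longer dropouts unfilled, so that a model is not trained on invented values. The sketch below illustrates that idea under stated assumptions: the function name, the three-sample gap limit, and the readings are hypothetical and are not taken from any included study.

```python
def fill_short_gaps(readings, max_gap=3):
    """Linearly interpolate runs of None that are at most max_gap samples long.

    Longer runs, and gaps touching either end of the series, stay as None.
    """
    filled = list(readings)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first valid sample after the gap
        gap_len = end - start
        if start == 0 or end == n or gap_len > max_gap:
            continue  # no anchor on one side, or gap too long to trust
        left, right = filled[start - 1], filled[end]
        step = (right - left) / (gap_len + 1)
        for k in range(gap_len):
            filled[start + k] = left + step * (k + 1)
    return filled


# Hypothetical CGM series in mg/dL; None marks a missing sample.
readings = [100, None, None, 130, None, None, None, None, 150, None]
filled_series = fill_short_gaps(readings, max_gap=3)
print(filled_series)
```

In this example the two-sample gap is filled, while the four-sample gap and the trailing missing value are left untouched for downstream code to handle explicitly.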

Limited external validation of AI models is another critical constraint observed in the reviewed studies. Many models were trained and tested on the same datasets without independent validation, reducing their applicability in real-world settings. The lack of external validation poses a significant challenge for translating AI-driven interventions into routine clinical practice35. Furthermore, the complex “black-box” nature of many AI models, such as deep neural networks, often leads to limited interpretability, hindering clinician trust and adoption36,37. While efforts such as SHAP values38 and feature importance ranking have been employed to improve model transparency, these methods were applied inconsistently across studies.

Finally, although few diabetes-specific studies have directly assessed differential AI model performance across diverse populations, prior research in other health domains such as imaging, genomics, and electronic health record (EHR)-based prediction has demonstrated disparities by race, sex, and age39,40,41,42,43,44. As such, ensuring diversity in data and transparency in validation remains an important research priority to avoid inequities as AI-driven wearables advance. Inclusive models developed along these lines aim to minimize performance disparities across subgroups and reduce algorithmic bias in healthcare settings. In the short term, inclusivity can be improved by mandating standardized reporting of participant demographics and evaluating model performance across subgroups. In the long term, efforts should focus on the development of large, federated, and representative datasets; incorporation of fairness-aware modeling techniques; and the establishment of regulatory standards for equity auditing. These strategies are critical to ensuring that AI-driven wearable solutions are equitable, generalizable, and beneficial to all individuals with T2D, especially those historically underrepresented in research and clinical care.
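
Evaluating model performance across subgroups, as recommended above, can be as simple as reporting the same error metric separately for each group. The following sketch shows one possible form of that check; the subgroup labels and values are hypothetical placeholders, and a real audit would also need adequate sample sizes and confidence intervals.

```python
import math
from collections import defaultdict


def rmse_by_subgroup(records):
    """Return {subgroup: RMSE} from (subgroup, actual, predicted) tuples."""
    squared_errors = defaultdict(list)
    for group, actual, predicted in records:
        squared_errors[group].append((actual - predicted) ** 2)
    return {g: math.sqrt(sum(v) / len(v)) for g, v in squared_errors.items()}


# Hypothetical glucose predictions in mg/dL, tagged with a placeholder subgroup.
records = [
    ("group_a", 120.0, 118.0),
    ("group_a", 150.0, 154.0),
    ("group_b", 130.0, 118.0),
    ("group_b", 170.0, 186.0),
]

results = rmse_by_subgroup(records)
gap = max(results.values()) - min(results.values())

for group, value in sorted(results.items()):
    print(f"{group}: RMSE = {value:.2f} mg/dL")
print(f"largest between-group gap: {gap:.2f} mg/dL")
```

A pooled RMSE over all four records would mask the fact that errors in the second group are several times larger, which is the kind of disparity that subgroup reporting is meant to expose.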

Historical clinical practices and demographic imbalances in data can lead to biased model predictions, with potentially harmful consequences for specific patient subgroups45,46,47,48,49. Additionally, integrating data from multiple wearable sensors presents challenges, as differences in data formats, sampling rates, and reliability complicate fusion and interpretation. Addressing these challenges is critical to ensuring equitable and effective application of AI-driven wearable technologies in clinical practice.
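
The sampling-rate mismatch mentioned above can be illustrated with a small alignment step: a higher-frequency signal is averaged into bins that match the slower sensor before the two are joined. The sketch below assumes, purely for illustration, a CGM reporting every 5 minutes and a heart-rate sensor reporting roughly every minute; the function names and data are hypothetical.

```python
from collections import defaultdict


def bin_mean(samples, bin_minutes):
    """Average (minute_offset, value) samples into fixed-width time bins."""
    buckets = defaultdict(list)
    for minute, value in samples:
        buckets[(minute // bin_minutes) * bin_minutes].append(value)
    return {start: sum(v) / len(v) for start, v in buckets.items()}


def align(cgm, heart_rate, bin_minutes=5):
    """Pair each CGM reading with the mean heart rate of its time bin.

    Bins with no heart-rate samples yield None instead of a guessed value.
    """
    hr_bins = bin_mean(heart_rate, bin_minutes)
    return [
        (t, glucose, hr_bins.get((t // bin_minutes) * bin_minutes))
        for t, glucose in cgm
    ]


# Hypothetical streams: (minutes since start, value).
cgm = [(0, 110.0), (5, 115.0), (10, 121.0)]
heart_rate = [(0, 70), (1, 72), (2, 74), (3, 72), (4, 72), (5, 80), (7, 84)]

rows = align(cgm, heart_rate)
for row in rows:
    print(row)
```

Keeping the missing bin as None makes the reliability problem visible to whatever fusion model consumes the aligned rows, rather than hiding it behind an imputed number.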

The integration of AI-driven wearable technologies into clinical practice for T2D management offers significant potential for improving patient outcomes but also poses practical challenges. AI-enabled wearables facilitate real-time glucose tracking and predictive intervention, reducing glycemic variability and preventing acute complications such as hypoglycemia or hyperglycemia. By offering timely, personalized insights, these technologies support proactive diabetes management, minimizing the need for emergency care and enhancing long-term outcomes. Wearables also empower individuals with diabetes to take an active role in their care, fostering better adherence and self-management by providing real-time feedback on how daily lifestyle factors impact glucose levels50. However, these benefits are tempered by implementation challenges, including barriers to clinical adoption that must be addressed to maximize the potential of AI-driven wearable devices.

A major challenge is the limited interpretability of complex AI models. Clinicians may be reluctant to rely on AI-driven recommendations when they cannot fully understand or explain the model’s decision-making process to individuals with diabetes. Improving transparency through tools like SHAP values38 and feature importance analysis is essential to gaining clinician trust and ensuring effective integration into practice.

In addition to interpretability, logistical challenges, such as infrastructure development and clinician training, must also be overcome to support the adoption of AI-enabled wearables. Moreover, interpreting and responding to AI-generated outputs requires additional clinician time, which may not be reimbursed under current healthcare models. Without appropriate policy changes, this may inadvertently increase clinician workload rather than alleviate it. Implementing AI-driven wearables requires robust infrastructure and targeted clinician training programs. Many healthcare professionals are unfamiliar with AI’s technical aspects, which may hinder effective use and interpretation of wearable data. Comprehensive training focused on AI fundamentals, data interpretation, and clinical implications will be crucial. Cost also remains a significant barrier, as high-quality wearables may be prohibitively expensive for underserved populations. Expanding insurance coverage and improving accessibility are necessary for equitable adoption. Lastly, over-reliance on wearable-generated data poses risks, as AI insights cannot replace holistic clinical assessments or personalized patient care plans. Clinicians must balance technology use with traditional care to ensure comprehensive diabetes management. Addressing these challenges will require targeted research to close existing gaps and optimize AI-driven solutions.

Future research should prioritize several areas to enhance the effectiveness, equity, and reliability of AI-driven wearable solutions. Developing benchmark datasets with diverse patient data from multiple sources is essential to improve model validation and reproducibility51. Increasing diversity in patient populations included in AI model development will ensure that models better reflect the physiological and lifestyle factors relevant to T2D. Additionally, expanding multimodal AI techniques to integrate data from CGMs, heart rate monitors, and activity trackers can provide richer, context-aware predictions. Such advancements would enable more personalized and equitable care.

Emerging AI models, such as liquid neural networks (LNNs), exemplify the potential for innovation in this field52. By dynamically adapting to real-time data, LNNs excel at capturing rapid changes in physiological signals, such as fluctuating glucose levels, making them highly effective for wearable device applications. Their resilience against noisy or incomplete data and low computational overhead enables efficient on-device processing46. However, challenges such as complex training requirements and regulatory concerns must be addressed to fully realize their potential. Future efforts should focus on balancing these limitations with their adaptability to ensure practical applications in healthcare.

Cost-effectiveness analyses comparing AI-driven interventions to traditional care approaches are also needed, as they provide evidence on the economic benefits and efficiency of AI technologies, helping to inform healthcare policy and drive broader adoption of these innovative solutions. Finally, developing practical interpretability tools for complex models and investigating how clinician trust in AI can be enhanced will ensure AI-driven wearables are effectively and equitably integrated into clinical practice. Research in these areas will be critical for transforming AI-driven wearables from innovative concepts into trusted, impactful tools in T2D care.

There is also a pressing need to move beyond simplistic metrics such as time-in-range, which often fail to capture the rich variability in physiological patterns. Leveraging the entirety of time-series data generated by wearables can provide deeper insights into individual health, enabling more precise anomaly detection and trend prediction. To achieve this, researchers must develop scalable algorithms capable of integrating contextual data while addressing critical issues such as data privacy and ethical considerations. These efforts will be essential to unlocking the full potential of wearable technologies in diabetes care.
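
The limitation of summary metrics can be seen in a small example: two glucose traces with identical time-in-range but very different variability. The sketch below assumes the commonly used 70–180 mg/dL target range, which is not stated in this review, and uses hypothetical readings; the coefficient of variation is shown only as one example of a complementary measure.

```python
import statistics


def time_in_range(readings, low=70.0, high=180.0):
    """Percentage of readings with low <= value <= high."""
    if not readings:
        raise ValueError("readings must be non-empty")
    inside = sum(1 for r in readings if low <= r <= high)
    return 100.0 * inside / len(readings)


def coefficient_of_variation(readings):
    """Glycemic variability as percent CV (sample SD divided by mean)."""
    return 100.0 * statistics.stdev(readings) / statistics.mean(readings)


# Hypothetical traces in mg/dL: one flat, one swinging between the range edges.
stable_day = [125.0] * 8
swinging_day = [75.0, 175.0] * 4

for label, trace in (("stable", stable_day), ("swinging", swinging_day)):
    print(
        f"{label}: TIR = {time_in_range(trace):.0f}%, "
        f"CV = {coefficient_of_variation(trace):.1f}%"
    )
```

Both traces score 100% time-in-range, yet only the full time series reveals that the second one oscillates across nearly the entire target band.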

This systematic review has limitations that should be acknowledged. First, the exclusion of gray literature and non-English studies may have resulted in the omission of relevant research, potentially introducing selection bias. Additionally, the review relied on studies with variable methodologies, data quality, and reporting standards, which may have introduced heterogeneity that was challenging to fully address in the synthesis. Lastly, the absence of a meta-analysis, due to study heterogeneity, restricted the quantitative integration of results, which could have strengthened the overall conclusions. These limitations highlight the need for more comprehensive and standardized research in this field.

This systematic review emphasizes the transformative potential of AI and wearable devices in managing T2D. Despite challenges, these technologies have shown significant promise in enabling real-time glucose monitoring, early intervention, and personalized feedback, offering new opportunities to improve glycemic management and patient self-management. However, several critical gaps must be addressed to ensure their equitable and effective implementation in clinical practice. These include addressing limited demographic diversity in study populations, use of benchmark datasets for validating and comparing AI models, and improving model interpretability and transparency. Future research should prioritize the development of inclusive AI models that account for diverse patient populations, the enhancement of complex algorithm explainability, and the exploration of multimodal AI approaches for integrating data from various wearable devices. Addressing these gaps is vital to unlocking the full potential of AI-driven wearables and transforming diabetes care into a more personalized, data-driven, and effective approach for diverse populations.

Methods

Information sources and search strategy

This systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines53. The review protocol was registered with PROSPERO under registration number 1009318 before conducting the review. A comprehensive search strategy was developed to identify relevant studies on the application of AI models in wearable devices for T2D management. The strategy included controlled vocabulary (e.g., MeSH terms) and free-text keywords, combined using Boolean operators. Search concepts included:

  1. Type 2 Diabetes Mellitus: Terms related to type 2 diabetes, including common synonyms such as “Non-Insulin Dependent Diabetes Mellitus” and “Maturity-Onset Diabetes,” were used54.

  2. Wearable Devices: Terms related to wearable technology, including “wearable devices,” “smartwatches,” “continuous glucose monitors,” and “fitness trackers” were used17,55,56.

  3. Artificial Intelligence: A comprehensive list of AI-related terms was used, including “machine learning,” “deep learning,” “neural networks,” and “predictive analytics”8,9.

Database-specific syntax and filters were employed to maximize retrieval efficiency, with filters specifically used to exclude gray literature. The last comprehensive search was conducted on September 30, 2024, covering PubMed, IEEE Xplore, ACM Digital Library, and Embase. No additional studies were identified through manual reference searches. Specific keyword searches for each database are provided in Supplementary Tables 1–4.

Manuscript eligibility criteria

The eligibility criteria for manuscripts included were defined based on language, study type, population, intervention, and outcomes. Specific inclusion and exclusion criteria were established to ensure relevance and focus on the application of AI models in wearable devices for diabetes management. Although our primary focus was on AI models and wearable devices in T2D, we also considered studies involving other diabetes populations (e.g., T1D, gestational diabetes mellitus, prediabetes) under specific conditions. These studies were included if (1) they had mixed cohorts that included individuals with T2D, (2) they evaluated AI algorithms or wearable-device applications that are translatable or generalizable to T2D care (e.g., glucose prediction models based on CGM data, which are physiologically relevant across diabetes types), or (3) they provided methodological insights or technological innovations with clear implications for advancing T2D management. For example, AI techniques developed in T1D populations for hypoglycemia prediction may be adaptable to T2D, particularly as CGM use expands among people with T2D. Similarly, studies in prediabetes populations were included if they addressed preventive strategies directly applicable to delaying T2D onset. Conversely, studies that focused exclusively on populations or outcomes without relevance to T2D—such as GDM interventions unrelated to longer-term metabolic risk or T1D algorithms designed solely for closed-loop insulin delivery without broader implications—were excluded.

Studies focused solely on mobile health interventions, digital coaching platforms, or app-based behavioral prompts—without application of AI algorithms to wearable-derived physiological data—were excluded. This review emphasizes AI-powered models that utilize data streams from wearable sensors, such as CGMs or smartwatches, to support tasks like prediction, classification, detection, and clinical decision-making relevant to Type 2 Diabetes management. Inclusion criteria were: (1) Peer-reviewed articles published in English from January 2014 to September 2024, (2) Manuscripts involving pre-diabetes, Type 1 and Type 2 diabetes mellitus, and relevant applications in healthy individuals, artificial skin phantoms, or in silico models, (3) Use of wearable devices, including CGMs, smartwatches, fitness trackers, or health-tracking wearables, to capture physiological data relevant to diabetes management (e.g., glucose levels, heart rate, physical activity), (4) Application of AI techniques (e.g., machine learning, deep learning, predictive analytics) on wearable data, (5) Reporting of diabetes-related clinical or self-management outcomes (e.g., glycemic management, glycemic events, medication adherence, or prevention of complications), and (6) Inclusion of experimental (e.g., randomized controlled trials) and observational study designs (e.g., cohort, cross-sectional, case-control).

Exclusion criteria included: (1) Manuscripts not in English or published prior to January 2014, (2) Non-peer-reviewed literature (e.g., conference abstracts, opinion pieces, gray literature), (3) Manuscripts unrelated to diabetes management or lacking relevant outcomes, (4) Manuscripts without wearable devices or using stationary monitors, (5) Manuscripts not applying AI or machine learning to wearable data, and (6) Non-empirical studies (e.g., reviews, theoretical papers, meta-analyses).

Study selection

The study selection process is illustrated in a PRISMA flow diagram in Fig. 2, detailing the number of studies screened, retrieved, and included. A systematic search of PubMed, IEEE Xplore, ACM Digital Library, and Embase yielded a total of 5152 records. Following the removal of 214 duplicates and 527 ineligible records, 1111 records underwent title and abstract screening. Of these, 1042 records were excluded based on the predefined criteria, leaving 69 reports for retrieval. Six manuscripts could not be retrieved due to inaccessible links or unavailable full texts from the publisher despite repeated efforts, and three were excluded because they did not use AI models. After full-text review, 60 studies met the inclusion criteria and were included in the final analysis8,11,22,23,24,25,26,27,28,29,30,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106.
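The counts from title and abstract screening onward can be checked with simple arithmetic. The sketch below is illustrative only; the function name and dictionary keys are assumptions, and the input numbers are the ones reported in the paragraph above.

```python
def prisma_tally(screened: int, excluded_at_screening: int,
                 not_retrieved: int, excluded_at_full_text: int) -> dict[str, int]:
    """Derive downstream PRISMA counts from the screening stage onward."""
    sought = screened - excluded_at_screening      # reports sought for retrieval
    assessed = sought - not_retrieved              # full texts actually assessed
    included = assessed - excluded_at_full_text    # studies in the final analysis
    if min(sought, assessed, included) < 0:
        raise ValueError("exclusions exceed the number of available records")
    return {
        "sought_for_retrieval": sought,
        "assessed_full_text": assessed,
        "included": included,
    }


# Counts reported in the text: 1111 screened, 1042 excluded,
# 6 not retrieved, 3 excluded for not using AI models.
flow = prisma_tally(1111, 1042, 6, 3)
print(flow)
# {'sought_for_retrieval': 69, 'assessed_full_text': 63, 'included': 60}
```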

Fig. 2

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram.

The selection of studies followed a two-phase process. In the first phase, titles and abstracts of retrieved articles were screened independently by two reviewers using the predefined inclusion and exclusion criteria listed above. Articles that did not meet the eligibility criteria were excluded at this stage. Any discrepancies between reviewers were resolved through discussion or, if necessary, by consulting a third reviewer. In the second phase, one reviewer conducted a detailed assessment of the full texts of articles that passed the initial screening, applying the same inclusion and exclusion criteria so that only relevant studies were included. Fig. 2 provides a PRISMA flow diagram documenting the number of studies screened, assessed for eligibility, and included, along with the specific reasons for exclusion.
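The first-phase reconciliation step can be sketched as follows. This is an illustration under assumptions, not the tooling the reviewers used: the function name, the record identifiers, and the representation of each decision as a boolean (True = include) are all hypothetical.

```python
def reconcile(reviewer_a: dict[str, bool],
              reviewer_b: dict[str, bool]) -> tuple[dict[str, bool], list[str]]:
    """Compare two reviewers' include/exclude decisions per record.

    Returns the decisions both reviewers agree on, plus the IDs of records
    with conflicting decisions, which go to discussion and, if still
    unresolved, to a third reviewer.
    """
    if reviewer_a.keys() != reviewer_b.keys():
        raise ValueError("both reviewers must screen the same set of records")
    agreed: dict[str, bool] = {}
    conflicts: list[str] = []
    for record_id in sorted(reviewer_a):
        if reviewer_a[record_id] == reviewer_b[record_id]:
            agreed[record_id] = reviewer_a[record_id]
        else:
            conflicts.append(record_id)
    return agreed, conflicts


# Hypothetical decisions for three records.
agreed, conflicts = reconcile(
    {"rec1": True, "rec2": False, "rec3": True},
    {"rec1": True, "rec2": True, "rec3": True},
)
print(agreed)     # {'rec1': True, 'rec3': True}
print(conflicts)  # ['rec2']
```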

Data extraction and data synthesis

Data extraction was conducted manually by one reviewer in Microsoft Excel, using a predefined template to ensure consistency. From each manuscript, we extracted 25 structured fields, including study metadata (author, year, title, country), study objectives, AI/ML model types, model inputs and outputs, prediction horizon, population size, glycemic and metabolic profiles, demographics (age, sex, race/ethnicity), comorbidities, study setting and duration, CGM devices and other modalities used, data sources, dataset type, modality integration and impact, model performance metrics, and whether interpretability was addressed. To support accuracy and consistency, the extraction process was piloted, source material was revisited for verification, and all decisions were documented.

Data from the included studies were synthesized using both qualitative and quantitative approaches, with a primary emphasis on narrative synthesis to accommodate the substantial variability in study designs, AI model architectures, and wearable device characteristics. Thematic grouping of studies facilitated the identification of trends and patterns without relying on meta-analysis, which was precluded by the heterogeneity in methodologies and populations. Quantitative results, including key performance metrics such as accuracy and sensitivity, were summarized descriptively to provide context to the narrative findings. Variations in AI models, such as neural networks versus regression-based approaches, and differences in wearable device functionalities were qualitatively explored to ensure a comprehensive understanding of study outcomes. This approach emphasized transparency and rigor in addressing and reporting the methodological and contextual diversity of the included studies.
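A descriptive summary of this kind can be sketched as below. This is a minimal illustration, not the authors' analysis: the row layout, the `model_family` key, and the function name are assumptions, and the values in the example are placeholders rather than results from any included study.

```python
from collections import defaultdict
from statistics import median


def summarize_metric(rows: list[dict], metric: str) -> dict[str, dict]:
    """Summarize one performance metric per model family (n, median, range).

    Rows that do not report the metric (missing key or None) are skipped,
    reflecting that not every study reports every metric.
    """
    groups: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        value = row.get(metric)
        if value is not None:
            groups[row["model_family"]].append(value)
    return {
        family: {
            "n": len(values),
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }
        for family, values in groups.items()
    }


# Placeholder values for illustration only; not extracted data.
example_rows = [
    {"model_family": "neural network", "accuracy": 0.5},
    {"model_family": "neural network", "accuracy": 0.75},
    {"model_family": "regression", "accuracy": 0.25},
    {"model_family": "regression", "accuracy": None},  # metric not reported
]
summary = summarize_metric(example_rows, "accuracy")
```

Reporting the median and range per group, rather than a pooled estimate, is consistent with a narrative synthesis in which heterogeneity rules out meta-analysis.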