Introduction

Glycogen storage disease (GSD) is a rare inherited metabolic disorder characterized by aberrant glycogen storage and mobilization1,2,3,4. The types of GSD range from Type I (Von Gierke disease) to Type XIV, each manifesting differently and requiring a nuanced management approach. Among these, type I GSD exhibits the most severe symptoms. This condition results from a mutation in G6PC1, which plays a crucial role in glucose storage and release. Inherited in an autosomal recessive manner, it affects approximately 1 in 20,000 to 40,000 individuals. The mutation prevents the body from breaking stored glycogen down into glucose, resulting in glycogen accumulation in the liver and kidneys. This accumulation impairs glucose homeostasis, leading to several complications including hypoglycemia, elevated lactic acid levels, and secondary metabolic abnormalities5. Affected individuals may experience growth retardation, osteoporosis, and pulmonary hypertension, with hypokinesis as a potential complication. Moreover, chronic liver and kidney damage can impair the function of these organs6. Patients with GSD are prone to severe hypoglycemia because the enzyme deficiency prevents them from producing glucose. Although proper dietary intake, including cornstarch, has been used to prevent hypoglycemia, such measures alone have limitations in managing glucose levels. The complications that occur in patients with GSD, such as hepatic adenoma, hepatocellular carcinoma, kidney disease, lactic acidosis, and hypertriglyceridemia, ultimately arise from uncontrolled glucose levels. These complications are not inevitable but result from a lack of optimal metabolic control. Therefore, patients with GSD require precise diagnostic and treatment approaches because of the complexity of the disease.

Apart from prospective gene therapy, GSD cannot be cured clinically, and developing effective treatments remains difficult. However, GSD can be managed by controlling glucose levels. Traditionally, management involves dietary interventions, notably a cornstarch-based diet, to maintain appropriate glucose levels in patients with GSD7. Such dietary management ensures a consistent and stable glucose supply for those who cannot store or adequately utilize glycogen. Moreover, this continuous dietary approach helps control blood lactate levels, preventing organ damage and supporting normal growth and lifestyle8,9. Despite these efforts, the inherent nature of this genetic condition means that patients may still experience sudden hypoglycemic episodes, in which glucose levels drop sharply, even with diligent dietary management10. To mitigate the risk of hypoglycemia and ensure proper patient care, key clinical metrics, including patient height and weight, along with blood biomarkers such as glucose, cholesterol, and lactic acid levels, are regularly tracked and assessed. Nevertheless, the limited number of medical specialists relative to the number of patients, along with the costs of hospital and outpatient care, makes continuous staff-dependent monitoring a challenging and labor-intensive task.

Checking and monitoring glucose levels is the most straightforward way to manage patients with GSD11,12. This is often done using the finger-prick glucose test, a method aimed at preventing hypoglycemia and its associated side effects13. Despite its widespread use, this approach struggles to accurately detect rapid fluctuations in glucose levels, which can be significantly influenced by the timing of blood sample collection.

The need for continuous blood glucose tracking has prompted the development of continuous glucose-monitoring (CGM) systems. Although initially designed for individuals with diabetes, their convenience and ability to provide real-time glucose information have broadened their use in glucose management, and these systems have led to therapeutic advancements in the care of patients with GSD14. A CGM system is a wearable device that continuously tracks glucose concentrations at set intervals and displays the data on a dedicated smartphone app or receiver15. This technology offers a detailed view of glucose fluctuations, enabling management strategies beyond traditional dietary adjustments and symptomatic treatments.

With the evolution of CGM systems, there has been a surge in the development of algorithms for analyzing and predicting glucose levels to forecast and prevent problems before glucose levels rise or fall16,17. Owing to the high prevalence of diabetes, this field has advanced considerably, and blood glucose prediction models have been applied widely. Statistical models such as autoregressive (AR), autoregressive exogenous (ARX), and autoregressive moving average (ARMA) models have been proposed18, and machine learning (ML)-based models such as support vector regression (SVR)19 and random forest (RF)20 have been utilized. However, these models often fall short because they cannot account for the myriad factors that influence glucose levels, such as lifestyle and physiological changes. To overcome these limitations, deep learning (DL) algorithms are expected to integrate the variables affecting blood glucose fluctuations and analyze a wide range of patterns in diabetes.

Despite advances in DL algorithms for diabetes management, research specifically targeting patients with GSD remains lacking. This study aimed to forecast glucose levels in patients with GSD using state-of-the-art (SOTA) DL algorithms in the time series forecasting field.

We assessed the predictive accuracy of the models using CGM data from patients with GSD as part of their management strategy. Additionally, we trained the models while accounting for patient-specific characteristics to enhance the prevention of sudden hypoglycemia (Fig. 1). The contributions of this study are as follows: patients with GSD can effortlessly track and manage their glucose levels through an app or online platform powered by such algorithms. This approach promises personalized healthcare tailored to each patient’s glucose data and lifestyle, potentially diminishing the reliance on expensive medical interventions by averting complications.

To the best of our knowledge, this is the first study focusing on forecasting glucose levels in patients with GSD.

Fig. 1
figure 1

Schematic overview of the study. (a) Preprocessing of the data extracted from the CGM device. (b) Training process for personalized DL models. (c) Input/output and inference process for the blood sugar level prediction task. (d) Input/output and inference process for the hypoglycemia classification task. The data collected from the CGM device undergoes a series of preprocessing steps to ensure accuracy and consistency. Each patient’s data is then used to train a personalized DL model. For the prediction task, the input consists of historical blood glucose data, and the output is the predicted blood sugar level for a specific future time horizon. In the classification task, the output is the probability of hypoglycemia occurring within a specified future time frame. CGM continuous glucose monitoring, DL Deep Learning.

Materials and methods

Data collection

Blood glucose data were collected from patients with GSD who were hospitalized or managed at Yonsei University Wonju Severance Christian Hospital, Wonju, Korea, from August 2020 to February 2024 (IRB-CR324074). A total of 46 patients who wore the CGM system for at least 14 days were included in this study. The maximum duration of CGM wear was 524 days (Fig. 2).

The Freestyle Libre (Abbott, Chicago, Illinois, USA) was used. This device continuously measures glucose concentrations in the interstitial fluid beneath the skin, which closely mirror blood glucose concentrations, via a sensor attached to the patient’s skin. The sensor estimates glucose levels every 15 min. The device periodically collects these interstitial glucose data and provides real-time information to patients, who can access the stored readings using a reader or smartphone app. The collected data were automatically uploaded to the LibreView platform for analysis and shared with the medical staff. To the best of our knowledge, the dataset used in this study was collected from the medical institution that manages and treats the largest number of GSD patients in Korea.

Fig. 2
figure 2

Flow diagram of patient selection and dataset configuration. This diagram illustrates the process of patient selection and configuration of the dataset used in the study, including inclusion/exclusion process. CGM continuous glucose monitoring, DL deep learning.

Preprocessing of CGM data

Preprocessing of the CGM data was conducted in three steps: interpolation, normalization, and slicing. Filtering was not applied during preprocessing to avoid potential distortions in the CGM data pattern, which could complicate medical interpretation. First, interpolation was performed to address instances in which patients replaced their CGM devices or missed glucose measurements owing to device errors. Based on the preceding literature21 and on the sampling rate and reference specifications of the glucose meter used in this study, the interpolation period was set to 3 h. The referenced literature used a 1-h interval with a 5-min sampling rate; because our study used a 15-min sampling rate, a 3-h interval was warranted. Consequently, when missing values spanned more than 3 h, those sections were excluded from the slicing process and were not used for model training. For gaps of 3 h or less, quadratic interpolation was applied using the preceding and succeeding 10 data points to fill in missing values. Subsequently, robust scaler normalization was applied to minimize the impact of outliers, based on the median and interquartile range of the CGM data. Finally, slicing was performed to create data-label pairs. The look-back window and forecast size for slicing were set to 48 and 4 samples, respectively, based on previous studies and a grid search, allowing prediction of the next 1 h of glucose levels from the past 12 h of data. The collected data included only real-time blood glucose readings, without any details on life events such as food intake or exercise. Therefore, a long look-back window was chosen to allow the model to capture the morphology and trends of blood glucose fluctuations as effectively as possible. Consequently, during the slicing process, the look-back window was set to 12 h, ensuring that all datasets (training, validation, and testing) included sufficient data for analysis.
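
The three steps can be summarized in the following minimal sketch, assuming a pandas Series of glucose values indexed at 15-min intervals; the paper interpolates from the surrounding 10 points, which the whole-series quadratic interpolation below only approximates, and all helper names are our assumptions rather than the study's implementation.

```python
# Minimal sketch of the preprocessing pipeline: interpolation of short
# gaps, robust scaling, and slicing into (look-back, forecast) pairs.
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler

LOOKBACK, HORIZON = 48, 4   # 12 h of history -> 1 h forecast (15-min samples)
MAX_GAP_SAMPLES = 12        # 12 * 15 min = 3 h; longer gaps remain NaN

def preprocess(glucose: pd.Series):
    # 1) Fill gaps of <= 3 h quadratically; longer gaps stay NaN and are
    #    skipped during slicing.
    filled = glucose.interpolate(method="quadratic",
                                 limit=MAX_GAP_SAMPLES, limit_area="inside")

    # 2) Robust scaling (median/IQR) to damp the influence of outliers.
    #    In practice the scaler should be fit on the training split only.
    scaler = RobustScaler()
    scaled = scaler.fit_transform(filled.to_numpy().reshape(-1, 1)).ravel()

    # 3) Slice into data-label pairs, skipping windows that span a gap.
    X, y = [], []
    for i in range(len(scaled) - LOOKBACK - HORIZON + 1):
        window = scaled[i : i + LOOKBACK + HORIZON]
        if np.isnan(window).any():      # window crosses an excluded (>3 h) gap
            continue
        X.append(window[:LOOKBACK])
        y.append(window[LOOKBACK:])
    return np.asarray(X), np.asarray(y), scaler
```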

Data-label pairs in which glucose levels were ≥ 300 mg/dL or < 40 mg/dL were retrospectively reviewed, considered outliers because of potential inaccuracies, and excluded from the dataset. Retrospective analysis revealed that external pressure artifacts, particularly during sleep or daily activities, could lead to artificially high or low readings, compromising measurement accuracy. The device used in this study had a lower detection limit of 40 mg/dL, but occasional readings below this threshold were recorded during the first day after CGM placement and toward the end of the sensor’s lifespan. These readings were deemed erroneous and inconsistent with clinical expectations. Finally, each CGM dataset was sorted in chronological ascending order and then divided at a ratio of 6:2:2 into training, validation, and test sets.
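
Continuing the sketch above, the outlier exclusion and chronological 6:2:2 split could look as follows; thresholds are checked on the un-scaled (mg/dL) values, and the function names are hypothetical.

```python
# Sketch of outlier exclusion and the chronological 6:2:2 split.
import numpy as np

def exclude_outlier_pairs(X, y, scaler):
    pairs = np.concatenate([X, y], axis=1)
    raw = scaler.inverse_transform(pairs.reshape(-1, 1)).reshape(pairs.shape)
    keep = ((raw < 300) & (raw >= 40)).all(axis=1)   # drop >= 300 or < 40 mg/dL
    return X[keep], y[keep]

def chronological_split(X, y):
    n = len(X)
    i, j = int(n * 0.6), int(n * 0.8)                # train/val/test = 6:2:2
    return (X[:i], y[:i]), (X[i:j], y[i:j]), (X[j:], y[j:])
```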

Deep learning models

For the deep-learning component, the three latest state-of-the-art deep-learning models in time-series forecasting were utilized (Fig. 1). First, we employed the PatchTST model proposed in 202222. This model was inspired by the application of transformer models in natural language processing (NLP) and computer vision. The PatchTST architecture processes time-series data in patches, similar to how images are processed by vision transformers, and effectively captures long-range dependencies through self-attention mechanisms. Second, the LTSF N-Linear model was used23. This model advances time-series forecasting using simple linear layers, with an architecture designed to capture various aspects of time-series data while maintaining low computational complexity and good scalability. Linear layers alone have proven effective in handling high-dimensional time-series data, which can often be challenging for other methods. Finally, the TS Mixer was employed24.

The TS Mixer effectively extracts information from time-series data by leveraging a mixing mechanism that integrates both temporal and feature dimensions. This architecture is particularly suitable for multivariate prediction and has demonstrated effectiveness across several benchmarks. The three models (PatchTST, LTSF N-Linear, and TS Mixer) achieved state-of-the-art performance in time-series forecasting in 2023, 2022, and 2023, respectively, and were selected because they represented the most recent SOTA approaches available during research planning and experimentation. The hyperparameters of all models were set to the default values proposed in their respective papers. The parameters related to input and output, such as the look-back window and forecast size, were set to be identical across all three models.
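
To illustrate the simplicity of the second of these baselines, the following is a minimal PyTorch sketch of the NLinear idea from the LTSF-Linear paper, configured with the window sizes used here; it is a sketch of the published idea, not the authors' exact implementation.

```python
# Minimal sketch of LTSF N-Linear: subtract the last value of each window,
# apply one linear map over the time dimension (shared across channels),
# then add the value back. Window sizes follow this study (48 -> 4).
import torch
import torch.nn as nn

class NLinear(nn.Module):
    def __init__(self, lookback: int = 48, horizon: int = 4):
        super().__init__()
        self.linear = nn.Linear(lookback, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, lookback, channels)
        last = x[:, -1:, :]                          # per-window offset
        y = self.linear((x - last).transpose(1, 2))  # (batch, channels, horizon)
        return y.transpose(1, 2) + last              # (batch, horizon, channels)
```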

Forecasting of future CGM data

In this study, the patient’s interstitial glucose concentrations over the next 1 h were predicted (Fig. 1c). Predictions were based on the past blood glucose values of patients with GSD, using an independent model for each dataset. The input to the model included both the glucose levels and the timestamps recorded by the CGM device. Time was segmented into years, months, days, and minutes and encoded using sine and cosine functions to capture the cyclical nature of time over a 24-h period. Because measurement periods differed among patients, and in some cases the year changed during measurement, all segmented time variables were incorporated as inputs. The DL models receive 7-channel multivariate time-series data and, because they were designed for multivariate prediction, forecast all 7 channels. Optimization used the mean squared error (MSE) loss to minimize the discrepancies between the predicted and actual values; the loss was calculated per channel and averaged across channels. For glucose level prediction, we defined the optimal model as the one with the lowest mean absolute percentage error (MAPE) between the predicted and actual values of the blood glucose channel, and this model was saved.
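
The cyclical encoding can be sketched as follows. The paper does not enumerate the exact seven channels, so the composition below (glucose plus sine/cosine pairs for month, day-of-month, and minute-of-day) is our assumption for illustration only.

```python
# Hypothetical sketch of the cyclical time encoding; the exact 7-channel
# composition is our assumption, as the paper specifies only that time
# components were segmented and sine/cosine-encoded.
import numpy as np
import pandas as pd

def encode_inputs(index: pd.DatetimeIndex, glucose: np.ndarray) -> np.ndarray:
    def cyc(values, period):
        angle = 2 * np.pi * np.asarray(values) / period
        return np.sin(angle), np.cos(angle)

    m_sin, m_cos = cyc(index.month - 1, 12)                       # month of year
    d_sin, d_cos = cyc(index.day - 1, 31)                         # day of month
    t_sin, t_cos = cyc(index.hour * 60 + index.minute, 24 * 60)   # minute of day
    # (n_samples, 7) multivariate input series
    return np.stack([glucose, m_sin, m_cos, d_sin, d_cos, t_sin, t_cos], axis=1)
```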

Classification of future hypoglycemic event

In this study, a binary classification was conducted to determine the feasibility of predicting hypoglycemia in addition to direct glucose levels (Fig. 1d). Hypoglycemia was defined as a blood glucose level of < 80 mg/dL, and readings below this threshold were labeled as 1. The 80 mg/dL threshold was chosen to facilitate the effective use of a cornstarch-based diet, enabling timely interventions to stabilize glucose levels and prevent further declines; this ensures a steady glucose release, which is essential for maintaining metabolic stability in patients with GSD. As in the forecasting task, 12 h of data were used as the model input. In terms of the model structure, the forecast window size was set to 1, and a linear layer was added to map the 7-channel output to a single output node for binary classification, with a sigmoid activation function. Optimization used the binary cross-entropy loss to minimize the discrepancy between actual hypoglycemia occurrences and model predictions. Because the number of actual hypoglycemic events was far lower than the number of normal events, class weights were applied to the loss function during training to address this imbalance. Finally, the model with the highest area under the receiver operating characteristic curve (AUROC) was selected as the optimal model for predicting hypoglycemia.
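
A minimal sketch of this setup is given below, assuming any of the backbones above configured with a forecast window of 1. The positive-class weight shown is one common way to implement class weighting (the paper does not specify its exact scheme), and `n_normal`/`n_hypo` are hypothetical counts.

```python
# Sketch of the hypoglycemia classification head: a backbone forecasting
# one time step of 7 channels, followed by a linear layer to one logit.
import torch
import torch.nn as nn

class HypoClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, channels: int = 7):
        super().__init__()
        self.backbone = backbone            # forecast window size = 1
        self.head = nn.Linear(channels, 1)  # 7 channels -> 1 output node

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.backbone(x)                # (batch, 1, channels)
        return self.head(z.squeeze(1))      # logits; sigmoid folded into the loss

# Hypothetical counts of normal vs. hypoglycemic training windows; the
# negatives-to-positives ratio is one common choice of class weight.
n_normal, n_hypo = 90_000, 6_000
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([n_normal / n_hypo]))
```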

Performance metrics

In the forecasting task, the primary outcome of the DL model is the direct prediction of glucose levels. The MAPE was selected as the main indicator because of its interpretability, robustness to differences in data scale, and suitability for comparing the performance of different models; it expresses the error as a percentage of the actual value. The mean absolute error and root mean square error were used as supplementary statistical indicators. Each model generated predictions at four horizons (15, 30, 45, and 60 min), and the statistical indicators were calculated by comparing the predicted values with the actual values at each time point.
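
For reference, the standard definition of the MAPE over $n$ predicted points, consistent with its use here, is

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|,$$

where $y_i$ and $\hat{y}_i$ denote the actual and predicted glucose values at point $i$.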

In the classification task, the primary outcome was the model’s predictive power for hypoglycemic events. For binary classification, the AUROC was used as the main indicator and as the criterion for selecting the optimal model state. The model’s predictions were compared with the ground truth and classified into four categories: true positive (TP), false negative (FN), true negative (TN), and false positive (FP). Based on these counts, additional statistical indicators, including sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score, were computed. The optimal cutoff minimizing model misclassification was calculated using the Youden index (J) on the ROC curve. In the classification task, each model outputs a result for a single time point; therefore, independent models were developed to predict hypoglycemic events at each future time point. In summary, the models predicting hypoglycemic events after 15, 30, 45, and 60 min were trained independently.
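
The Youden-based cutoff selection can be sketched with scikit-learn's ROC utilities (our illustration, not the authors' code):

```python
# Sketch of Youden-index cutoff selection on the ROC curve:
# J = sensitivity + specificity - 1 = TPR - FPR, maximized over thresholds.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def youden_cutoff(y_true: np.ndarray, y_score: np.ndarray):
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    best = np.argmax(tpr - fpr)            # maximize Youden's J
    return thresholds[best], roc_auc_score(y_true, y_score)
```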

Independent models were trained for each patient to build personalized predictive models. This approach is expected to capture each patient’s unique blood glucose variability and enable more accurate predictions. We calculated the statistical metrics for each task by pooling the prediction results at each time point. All statistical metrics are reported as point estimates with 95% confidence intervals (CIs). The data were analyzed and visualized using Python 3.9.5 (Python Software Foundation).
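
Putting the pieces together, the per-patient scheme can be sketched as follows; `preprocess`, `chronological_split`, and `NLinear` refer to the sketches above, and `fit_and_select` is a hypothetical trainer performing the optimization and validation-based model selection described in the preceding sections.

```python
# Sketch of the personalized training scheme: one independent model per
# patient, selected on that patient's own validation split.
def train_personalized_models(patients: dict) -> dict:
    models = {}
    for pid, series in patients.items():           # {patient_id: CGM series}
        X, y, scaler = preprocess(series)
        train, val, test = chronological_split(X, y)
        model = fit_and_select(NLinear(), train, val)  # e.g., lowest val MAPE kept
        models[pid] = (model, scaler, test)
    return models
```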

Results

Participant characteristics

Table 1 summarizes the demographic and clinical characteristics of the study group. A total of 46 subjects wore CGM devices for blood glucose measurement; only those who wore the device for at least 14 days were included in the study (Fig. 2). The study group comprised 26 male subjects with an average age of 13.27 years and 20 female subjects with an average age of 14.85 years. The average wear duration was 151 days for the male group and 171 days for the female group (Table 1). All participating patients were of Asian descent.

Table 1 Dataset characteristics.

Glucose level forecasting result

The results of each model for forecasting glucose levels are summarized in Fig. 3 and Table 2, and the results of the clinical error grid analysis are summarized in Fig. S1. To enable a clearer comparison of classification performance across models, results derived from equal error rates are presented in Table S1. Among the three models, the LTSF N-Linear demonstrated the highest performance, with a low MAPE across all prediction horizons, including a particularly low MAPE of 5.95% (95% confidence interval (CI), 5.64–6.26%) for the 15-min prediction. For the 60-min prediction, the longest horizon, its MAPE of 11.14% (95% CI, 10.39–11.89%) was still the lowest among the models. In contrast, the TS Mixer exhibited the lowest performance, with high MAPE values of 16.14% (95% CI, 13.72–18.57%), 16.50% (95% CI, 14.03–18.97%), 16.57% (95% CI, 13.90–19.23%), and 16.77% (95% CI, 13.59–19.95%) at the respective time points. The PatchTST model demonstrated a relatively moderate performance compared with the other two models.

Fig. 3
figure 3

Model performance for the forecasting task. (a) Linear correlation for the 15-min prediction horizon. (b) Linear correlation for the 30-min prediction horizon. (c) Linear correlation for the 45-min prediction horizon. (d) Linear correlation for the 60-min prediction horizon. The horizontal axis represents the patient’s actual glucose levels, while the vertical axis represents the model’s predicted values. The LTSF N-Linear model demonstrated a relatively high Pearson R across all prediction intervals. Pearson R Pearson correlation coefficient, CI confidence interval.

Table 2 Performance metrics for forecasting task with 95% CI.

Hypoglycemia classification result

The results of each model for hypoglycemia prediction are summarized in Fig. 4 and Table 3. In the classification task, the TS Mixer demonstrated relatively high performance among the three models. Notably, the prediction of hypoglycemia 15 min ahead achieved an AUROC of 0.866 (95% CI, 0.829–0.904) with an average sensitivity of 72.27% (95% CI, 63.97–80.56%). Although the predictive power decreased as the temporal horizon was extended, the model still performed reasonably well, with an AUROC of 0.672 (95% CI, 0.635–0.708) for predictions made 60 min in advance. Conversely, the LTSF N-Linear model exhibited poor predictive power in the classification task, with AUROC values of 0.600 (95% CI, 0.556–0.644), 0.570 (95% CI, 0.539–0.602), 0.562 (95% CI, 0.533–0.591), and 0.566 (95% CI, 0.532–0.587) across the prediction horizons. Again, the PatchTST model exhibited a relatively moderate performance.

Fig. 4
figure 4

Model performance for the classification task. (a) AUROC for hypoglycemia prediction at the 15-min prediction horizon. (b) AUROC for hypoglycemia prediction at the 30-min prediction horizon. (c) AUROC for hypoglycemia prediction at the 45-min prediction horizon. (d) AUROC for hypoglycemia prediction at the 60-min prediction horizon. The thin lines represent the results for individual patients, whereas the thick lines represent the mean. The shaded area on either side of the thick lines indicates the 95% CI. TS Mixer achieved a relatively high AUROC across all prediction horizons. AUROC area under the receiver operating characteristic curve, CI confidence interval, EER equal error rate.

Table 3 Performance metrics for classification task with 95% CI.

Discussion

Traditionally, patients with GSD have had to be hospitalized or rely on finger-prick tests at home to monitor and manage their glucose levels. However, these methods do not provide a comprehensive view of glucose levels throughout the day. Although CGM was originally designed for patients with diabetes, it is also considered applicable to patients with GSD. Derks’s research on 15 patients with GSD demonstrated that in-depth analysis of CGM data can effectively evaluate glucose management25. Additionally, a 2022 study involving 10 adult patients with GSD showed that CGM may be beneficial26. Other studies have demonstrated the ability of artificial neural networks to predict glucose levels in patients with type 1 diabetes27,28.

Therefore, we hypothesized that metabolic predictions could be made based on the results of previous studies and data from patients with GSD accumulated at the Yonsei University Wonju Severance Christian Hospital. Our objective was to validate and analyze this hypothesis using the results generated by the DL models. This study yielded meaningful outcomes, confirming that by analyzing the CGM data from patients with GSD, it is possible to manage and predict their glucose levels. This finding supports the feasibility of precision medicine tailored to each individual, highlighting the potential for customized care that addresses the unique metabolic needs of each patient.

From a big-data perspective, data are typically divided across individuals before training and testing. However, to predict aspects of human metabolism, we considered individualized learning more appropriate. Therefore, we adopted an approach in which an independent model was trained for each study subject, and the quantitative indicators were subsequently calculated. Given that training an independent model per subject multiplies time and computational costs, we focused our efforts on deriving results with the latest models that have achieved SOTA performance in time-series forecasting.

We conducted both forecasting and classification tasks to assess the models’ ability to predict and manage immediate blood glucose levels (Fig. 1). In the forecasting task, the LTSF N-Linear model demonstrated relatively strong forecasting ability. For blood glucose prediction 15 min ahead, the Pearson correlation coefficient was 0.887 (95% CI, 0.886–0.888), indicating high predictive accuracy (Fig. 3). Even at the 30- and 45-min prediction horizons, the Pearson’s R values remained higher than those of the compared models, at 0.721 (95% CI, 0.719–0.724) and 0.617 (95% CI, 0.614–0.621), respectively. However, at the 60-min prediction horizon, the predictive power decreased markedly, with a Pearson’s R of 0.561 (95% CI, 0.557–0.565). The LTSF N-Linear focuses solely on the linear temporal relationships between data points and performs well when complex modeling of the data structure is not required, owing to the simplicity of its architecture. The participants in this study continuously managed their diet to maintain stable blood glucose levels. Because metabolic changes in blood glucose occur primarily through food intake29,30, the LTSF N-Linear likely benefited structurally from this regularity: the blood glucose data in this study followed relatively linear trends over short prediction horizons, with few complex nonlinear relationships. As a result, the LTSF N-Linear delivered excellent short-term predictions, although its long-term performance leaves room for improvement. The TS Mixer, by contrast, may have been disadvantaged by the study setting. Only variables directly related to blood glucose and time were used, and because the input was already a time series, temporal information was inherently included, making it difficult for the model to capture additional dependencies, particularly those related to metabolism. Furthermore, its relatively complex architecture may have led to overfitting, rendering it more sensitive to noise than to meaningful patterns. The PatchTST, which employs a patch-based approach designed to capture both short- and long-term trends, performed slightly below the LTSF N-Linear, but the difference between the two models was not large, and it produced reasonably accurate predictions, demonstrating its capability to handle the task effectively.

In the classification task, the TS Mixer demonstrated strong performance, notably achieving an AUROC of 0.866 (95% CI, 0.829–0.904) at the 15-min prediction horizon (Fig. 4). Although its predictive power decreased at the 45- and 60-min horizons, it still achieved specificities of 74.09% (95% CI, 66.97–81.22%) and 75.42% (95% CI, 69.75–81.1%), respectively, and maintained NPVs of 91.21% (95% CI, 86.52–95.90%) and 92.72% (95% CI, 90.72–94.71%) even at these longer intervals (Table 3). Conversely, the LTSF N-Linear model showed little classification ability, with AUROC values close to 0.5; for example, it recorded an AUROC of 0.600 (95% CI, 0.556–0.644) at the near-term prediction horizon. The TS Mixer constructs a predictive decision boundary for binary classification based on nonlinear patterns, whereas the LTSF N-Linear, which operates on a simple linear basis, is well suited for trend prediction but struggles in classification tasks where nonlinear patterns are crucial. In this study, even after an additional linear layer was added, its performance remained poor, with AUROC values close to 0.5. This outcome reflects the difficulty linear classifiers face in such binary classification problems31,32. The PatchTST sits between the relatively simple LTSF N-Linear and the more complex TS Mixer, and its quantitative indicators fall in the middle range. While PatchTST can capture dependencies and nonlinear patterns in waveforms, its patch-based processing likely makes it less effective at considering global patterns than the TS Mixer. Additionally, although certain models demonstrated effective hypoglycemia classification overall, a relatively low PPV was observed in some cases. Such a reduction could produce inaccurate predictions, leading to false alarms or unnecessary treatments and interventions, highlighting the need for further refinement.

This study has some limitations. First, the input data for predicting changes in glucose levels were limited; human metabolism is highly complex, and incorporating additional information, such as meal timing, physical activity, and other events, could improve predictive accuracy. For example, the OhioT1DM dataset33, a well-known dataset for blood glucose prediction, systematically monitored biometric information, such as insulin dosage and heart rate, and behavioral factors, such as meal intake, sleep, and physical activity. In contrast, this study relied solely on blood glucose data and time annotations obtained from the CGM device. Although the patients in this study had already achieved optimal metabolic control, the data were retrospective, and additional inputs were difficult to obtain owing to the limitations of the CGM device; consequently, biometric and behavioral information could not be incorporated. The absence of this information34,35 is considered one of the primary reasons for the model’s poor predictive power over longer prediction horizons. In summary, several studies on glucose level prediction have achieved acceptable results using blood glucose data alone; however, given that patients with GSD frequently require dietary control, incorporating additional information is expected to compensate for the model’s limited predictive capability.

Second, this study was conducted using retrospective data. Future studies should incorporate prospective data to verify the applicability of the model to real-world GSD cases. If future studies can predict glucose levels using CGM data from patients who have not yet achieved optimal metabolic control, this could lead to a groundbreaking management method.

Third, a relatively long sampling interval was used in this study: the CGM device measured blood glucose every 15 min in real time, whereas previous studies have generally used shorter sampling intervals21,27,36. Because this study aimed to explore the potential of DL models to help patients with GSD efficiently manage their glucose levels in daily life and proactively prevent possible hypoglycemic symptoms, the sampling rate is considered a factor that directly influences the model’s predictive power. In future research, it will be important to adjust the sampling rate, considering both model performance and real-life applicability, and to evaluate whether the adjusted rate improves efficiency37,38.

Fourth, the general applicability and interpretability of error grid analysis are limited. Various analysis methods, such as the clinical error grid methodology, have been proposed to validate blood glucose predictions in diabetes39,40. In contrast, this study was conducted in a group of patients with GSD, a condition with an extremely low prevalence. Type 2 diabetes is typically associated with elevated blood glucose levels, while hypoglycemia is a major concern in type 1 diabetes; GSD, by contrast, is characterized by low blood glucose levels caused by genetic mutations affecting glucose storage and release. Consequently, the blood glucose range in patients with GSD differs significantly from that in patients with diabetes41, making it difficult to apply analytical frameworks designed for diabetes to GSD. When the clinical error grid methodology was applied to the model used in this study, most readings from GSD patients fell within Region A (Fig. S1). However, the boundaries of Regions B and D, which are critical for identifying treatment failures or instances of hypoglycemia, do not align well with the average blood glucose range of GSD patients. Thus, while some quantitative results demonstrated excellent accuracy, the overall persuasiveness of the methodology is limited by these constraints, which restrict both the applicability of previously proposed glycemic analysis methodologies to GSD and the analysis of model outcomes in this study.

Fifth, further research on interpretability is needed to understand how patient-specific factors influence model performance. To investigate whether factors such as gender affect glucose level fluctuations and, in turn, model outcomes, we analyzed the test results by gender (Tables 4 and 5). However, no significant differences were observed between the two groups in any task. One potential explanation is that the statistical analysis in Table 1 revealed no significant differences between the two groups, a trend likely reflected in the model results. Additionally, owing to the limitations of the retrospective dataset, the collected data lack explanatory power, as they include only glucose levels without additional relevant information. These constraints make it challenging to analyze patient-specific factors and limit the examination of variables beyond gender. Follow-up studies should not only refine the research design but also enhance model interpretability and incorporate additional analyses using explainable AI methods.

Finally, this study requires external validation. Although many DL models produce promising results on specific datasets, it is crucial to verify whether they generalize to data from other institutions42, and this study is no exception. Although an independent model was developed for each individual to achieve personalized optimization, all training data were obtained from a single hospital. In particular, owing to the rarity of GSD, the age range of the participants is biased, and the study population was limited to Asian patients. The data were collected from the medical institution in Korea that diagnoses and treats the largest number of GSD patients; other institutions rarely manage patients with rare diseases such as GSD, making external datasets relatively difficult to collect. This limitation may reduce reliability owing to the small number of contributing sites and raises concerns about generalizability. Additionally, there is a potential bias related to the CGM device used, particularly with respect to its sampling rate. Therefore, external validation is necessary to address these biases and verify the model’s performance across a more diverse set of subjects.

Table 4 Performance of the LTSF N-Linear by gender for the forecasting task with 95% CI.
Table 5 Performance of the TS Mixer by gender for the classification task with 95% CI.

Despite these limitations, this study is significant as the first to demonstrate the ability to forecast glucose levels using CGM and DL in patients with GSD. GSD is a genetic disorder for which no definitive treatment is currently available, and even if gene therapy becomes available in the future, its cost may be prohibitive for many patients. However, the use of CGM, which is relatively affordable and easily accessible, combined with DL-based personalized blood glucose management, can offer an accessible method for managing this condition. If this method is further refined, hypoglycemia could be predicted in advance and alerts provided, enabling better management of cornstarch intake, diet, and exercise based on blood glucose predictions. Although patients must continue to consume cornstarch and follow an appropriate diet, this approach is highly beneficial from a cost perspective. In this study, predictions were conducted using a single variable, the glucose level. While some DL models demonstrated promising results depending on the task, it is crucial to acknowledge that fluctuations in glucose levels can be highly sensitive to patient-specific factors such as fat metabolism, physical activity, and cornstarch intake. Follow-up studies are planned to address these variables. Additionally, we plan to address interpretability by leveraging explainable machine learning techniques, which are expected to provide clinicians with intuitive and actionable insights through the development of visualization dashboards.