Introduction

Myopia has emerged as a critical global public health issue, with particularly high prevalence rates in Asia1,2. In China, an estimated 80% of high school graduates are affected by myopia, of whom 10–20% suffer from high myopia (defined as ≤−6.0 diopters)3. High myopia carries significant risks of vision-threatening complications including retinal detachment and macular degeneration, which can lead to irreversible vision loss and diminished quality of life2,4. These concerns underscore the importance of early prediction and timely intervention for myopia progression.

Current clinical strategies for myopia control include 0.01% low-concentration atropine eye drops (Atropine), orthokeratology lenses (Ortho-K), peripheral defocus spectacles (PDS), and repeated low-level red light (RLRL) therapy. However, these interventions face several limitations, including high costs, prolonged treatment durations, and potential adverse effects. Atropine may cause photophobia, transient near-vision impairment, or allergic reactions2,5, while Ortho-K lenses increase the risk of corneal epithelial injury and infection6,7. PDS involve substantial fitting costs that may limit accessibility8, and RLRL therapy raises concerns about potential retinal phototoxicity with long-term use9,10. Artificial intelligence (AI) models offer a promising solution by enabling precise prediction of myopia progression and individualized treatment efficacy assessments, thereby facilitating optimized early interventions for high-risk pediatric patients. This approach could significantly improve healthcare resource allocation, reduce unnecessary costs and risks, and enhance clinical outcomes.

While previous studies have developed AI models to predict future spherical equivalent (SE) from historical refraction data11,12,13,14,15, and randomized controlled trials (RCTs) have demonstrated the benefits of various myopia control interventions16,17,18,19,20,21,22, two critical research gaps remain. First, there is a need for accurate annual progression predictions spanning the typical 10-year myopia progression period from ages 8 to 18, particularly accurate predictions for long-term axial length (AL) growth changes. Second, the field lacks individualized, quantitative predictions for treatment benefits.

Transformer architectures, with their ability to capture long-range temporal dependencies in sequential data23, provide a powerful framework for modeling long-term myopia progression. To address the two limitations mentioned above, we developed a Transformer-based Myopia Progression Predictive Model (MPPM) with two modules: the Natural Progression Module (NPM), which predicts the untreated course of myopia progression, and the Intervention Progression Module (IPM), which forecasts myopia progression under specific interventions. The NPM was trained on a large-scale longitudinal cohort of children with myopia, with follow-up durations exceeding 10 years. Participants in this cohort had not received any myopia control interventions other than single-vision spectacles, allowing the model to capture the natural course of refractive development. The IPM was trained using data from four real-world myopia intervention cohorts, in which participants received Atropine, Ortho-K, PDS, or RLRL, respectively. The IPM was based on the NPM, but because numerous confounding factors may bias the estimation of treatment benefits24,25, we incorporated a gradient reversal layer and adversarial training mechanism into the IPM, thereby establishing a Transformer-based causal machine learning framework26,27,28. This design enables the IPM to generate accurate individualized treatment effect (ITE) predictions for different myopia control interventions29.

Results

Overview of the study design

An overview of the study design is illustrated in Fig. 1. In this study, we included longitudinal data of children and adolescents with myopia from the Eye Hospital, Wenzhou Medical University (WMU). In the WMU dataset, participants were categorized into five cohorts based on whether they received myopia control interventions and, if so, the type of intervention: a non-intervention cohort, an Atropine cohort, a PDS cohort, an Ortho-K cohort, and an RLRL cohort. We also included longitudinal data of children and adolescents with myopia from Dazhou Central Hospital (DCH) and data from an Investigator Initiated Trial (IIT) of RLRL therapy (ChiCTR2200066365). The specific myopia correction and/or control interventions received by participants in different cohorts are detailed in the Datasets and subjects section.

Fig. 1: Overview of the study design.

This study utilized three datasets: (1) a pediatric myopia dataset from the Eye Hospital, Wenzhou Medical University (WMU), (2) a pediatric myopia dataset from Dazhou Central Hospital (DCH), and (3) data from an Investigator-Initiated Trial (IIT) of repeated low-level red light (RLRL) therapy. In the WMU dataset, participants were categorized into five cohorts based on whether they received myopia control interventions and, if so, the type of intervention: a non-intervention cohort, a 0.01% atropine eye drop (Atropine) cohort, an orthokeratology (Ortho-K) cohort, a peripheral defocus spectacle (PDS) cohort, and an RLRL cohort. Data from the non-intervention cohort were used to train and internally test the Natural Progression Module (NPM), whereas data from the intervention cohorts were used to train and internally test the Intervention Progression Module (IPM). Together, the NPM and IPM constituted the Myopia Progression Prediction Model (MPPM). For the external test, the DCH dataset and the control arm of the RLRL IIT were used to evaluate the NPM, while the intervention arm of the RLRL IIT was used to evaluate the IPM. In the WMU and DCH datasets, spherical equivalent (SE) was recorded at every visit, whereas axial length (AL) was not always measured, resulting in missing AL values. To address this, we applied a machine-learning-based imputation strategy to reconstruct missing AL records.

For all participants, demographic information (sex and age at each visit) and ocular measurements of both eyes, including SE and AL, were collected. As subjective refraction was performed at every visit but AL was measured less frequently, the WMU dataset and DCH dataset contained more SE than AL records. Missing AL values were reconstructed using machine-learning-based imputation methods.

We developed the MPPM, which consists of two modules: the NPM, designed to predict myopia progression in the absence of interventions, and the IPM, designed to forecast progression under different myopia control strategies (Figs. 1 and 2). The NPM, based on a Transformer architecture, used sex, age, and prior SE and AL measurements to predict future SE and AL values over a 10-year horizon. It was trained and internally validated (8:2 split) using the WMU non-intervention cohort and externally validated on the DCH dataset and the RLRL IIT dataset’s control group. The IPM was derived from the NPM by introducing a gradient reversal layer and adversarial training mechanism to mitigate confounding and to estimate the causal effects of interventions on clinical outcomes. This allowed the model to predict individual myopia trajectories under different intervention strategies. The IPM was trained and internally validated (8:2 split) using the WMU intervention cohorts and externally validated with the RLRL IIT dataset’s intervention group. Notably, imputed AL values were used only for model training, whereas model validation was performed exclusively on observed AL values.

Fig. 2: Architecture of the Myopia Progression Prediction Model (MPPM).

The model consists of a Natural Progression Module (NPM), which is based on a Transformer architecture and predicts the untreated course of myopia progression, and an Intervention Progression Module (IPM), which extends the NPM with adversarial deconfounding to enable causal inference and individualized treatment effect (ITE) estimation for myopia control interventions.

Participant characteristics and follow-up details

Table 1 summarizes participant characteristics and follow-up details for both the WMU and DCH datasets. The WMU dataset included 304,353 individuals contributing 1,109,827 subjective refraction records, with 81,142 individuals providing 276,298 AL measurements. The DCH dataset contained 60,533 participants with 141,498 SE records, including 12,846 participants who contributed 30,134 AL measurements.

Table 1 Participant characteristics and follow-up details for the WMU and DCH datasets

Notably, in the WMU dataset, 4848 participants had follow-up durations exceeding 10 years. No statistically significant differences were observed in the annual progression rates of SE and AL among the groups stratified by follow-up duration (≤3 years, 3–5 years, 5–10 years, and >10 years), as determined by one-way analyses of variance (for SE, p = 0.213; for AL, p = 0.339; see Supplementary Tables 1 and 2). These results suggest no substantial heterogeneity between participants with long- and short-term follow-up.

Table 2 summarizes the baseline characteristics and follow-up profiles of the non-intervention cohort and the four myopia intervention cohorts in the WMU dataset.

Table 2 Participant characteristics and follow-up details for the five cohorts in the WMU dataset

AL data imputation

Given axial elongation’s fundamental role in pediatric myopia progression30, incorporating historical AL measurements into prediction models carries significant clinical importance. However, although subjective refraction data were available for all clinical visits, AL measurements were missing for a substantial proportion (~75%) of visits (Table 1). To maximize data utility, we implemented an XGBoost regression model with age, sex, and SE of both eyes as input features for AL imputation. The model was trained and validated using 304,353 concurrent subjective refraction and AL measurements from the WMU cohorts. Model evaluation showed strong concordance between predicted and measured AL values: Pearson correlation coefficients (PCC) were 0.848 ± 0.003 (mean ± standard deviation) for right eyes and 0.849 ± 0.002 for left eyes, coefficients of determination (R²) were 0.719 ± 0.005 and 0.722 ± 0.004, respectively, and mean absolute errors (MAE) were 0.495 ± 0.002 mm and 0.494 ± 0.002 mm, respectively (Fig. 3). These results demonstrated high accuracy and reliability of the machine-learning-based AL imputation.

Fig. 3: Machine learning–based imputation of axial length.

a, b Scatter plots comparing observed AL values with imputed AL values for right eyes (a) and left eyes (b), respectively. Model evaluation showed strong concordance between predicted and actual AL values, with PCC of 0.848 for right eyes and 0.849 for left eyes, R² of 0.719 and 0.722, and MAE of 0.495 mm and 0.494 mm, respectively. These results demonstrate the high accuracy and reliability of the machine-learning-based AL imputation. AL axial length, od right eyes, os left eyes, PCC Pearson correlation coefficient, R² coefficient of determination, MAE mean absolute error.

The model was subsequently applied to impute missing AL values from measured subjective refraction data across the WMU dataset, thus establishing a comprehensive, longitudinally matched dataset with complete SE-AL pairs (WMU paired dataset) for the MPPM construction. It should be noted that both the observed and imputed AL data were used for model training, but model validation was performed only on observed AL values to ensure methodological rigor.
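The bookkeeping described above, in which imputed AL values enter training but only observed values enter validation, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: a one-feature least-squares line stands in for the XGBoost regressor (which in the study also used age, sex, and SE of both eyes), and the record layout and the `al_imputed` flag are hypothetical names introduced here.

```python
def fit_line(xs, ys):
    """Least-squares fit y = a + b*x; a simple stand-in for the XGBoost regressor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def impute_al(visits):
    """Fill missing AL from SE and flag every record as observed or imputed."""
    paired = [v for v in visits if v["al"] is not None]
    a, b = fit_line([v["se"] for v in paired], [v["al"] for v in paired])
    completed = []
    for v in visits:
        rec = dict(v)
        rec["al_imputed"] = v["al"] is None  # hypothetical flag name
        if rec["al_imputed"]:
            rec["al"] = a + b * v["se"]
        completed.append(rec)
    return completed


# Toy visits (SE in diopters, AL in mm); the last visit has no AL measurement.
visits = [
    {"se": -1.0, "al": 24.0},
    {"se": -2.0, "al": 24.4},
    {"se": -3.0, "al": 24.8},
    {"se": -4.0, "al": None},
]
completed = impute_al(visits)
training_rows = completed  # observed and imputed AL
validation_rows = [r for r in completed if not r["al_imputed"]]  # observed AL only
```

Carrying the flag alongside each record keeps the train/validate distinction explicit, so imputed values cannot leak into the reported validation metrics.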

Myopia Progression Predictive Model (MPPM) architecture

The architecture of MPPM is illustrated in Fig. 2. MPPM includes two modules: Natural Progression Module (NPM) and Intervention Progression Module (IPM).

The NPM was trained on the WMU non-intervention cohort. It was based on Transformer architecture and comprised five key components: input layer, feature embedding module, temporal sequence encoder, multi-task prediction head, and output layer. The model processes longitudinal visit records containing categorical (e.g., sex, intervention), continuous (age, SE and AL for both eyes), and temporal (inter-visit intervals) features. Notably, the feature embedding module separately processes categorical and continuous features into dense vector representations, incorporating positional encoding to maintain temporal ordering and multi-head attention to capture feature dependencies. These embedded vectors feed into the temporal sequence encoder, which uses masked multi-head attention with additional positional encoding to model temporal dependencies. The temporal sequence encoder output passes to the multi-task prediction head, which bifurcates into SE and AL prediction branches sharing common underlying features to enhance generalizability. The SE head predicts current time-step SE values for both eyes, while the AL head performs analogous AL predictions, with final outputs generated through the output layer (Fig. 2).

Numerous confounding factors may influence the evaluation of the relationship between myopia control interventions and clinical benefits. Many myopia control studies adopt broad inclusion criteria (e.g., enrolling all children with myopia) without stratifying participants by disease characteristics such as baseline myopia severity. This increases outcome heterogeneity and hampers precise quantification of treatment benefits. For instance, when assessing orthokeratology or low-dose atropine, failure to account for baseline refractive error may dilute the estimated treatment effect24,25. Therefore, building upon the NPM, we incorporated a gradient reversal layer and adversarial training to construct a Transformer-based causal inference framework, referred to as the Intervention Progression Module (IPM). The IPM mitigates the influence of confounding factors and enables accurate estimation of the causal effects (individualized treatment effect, ITE) of myopia control interventions on changes in SE and AL progression. The IPM predicts future SE and AL values as follows: the future value under natural growth predicted by the NPM, minus the growth reduction attributable to the control intervention predicted by the IPM (i.e., the ITE), equals the future value under the control intervention.
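The relationship stated in words above can be written compactly. The notation is introduced here for illustration and is not taken from the original: the hatted symbols denote model predictions at a future time t for either outcome. The second line gives the standard textbook definition of a gradient reversal layer (identity in the forward pass, sign-flipped and scaled gradient in the backward pass); the scaling factor λ is a generic hyperparameter whose value in this study is not stated.

```latex
\[
\hat{y}^{\mathrm{IPM}}_{t} \;=\; \hat{y}^{\mathrm{NPM}}_{t} \;-\; \widehat{\mathrm{ITE}}_{t},
\qquad y \in \{\mathrm{SE},\ \mathrm{AL}\}
\]
\[
R_{\lambda}(x) = x,
\qquad
\frac{\partial R_{\lambda}}{\partial x} = -\lambda I
\]
```

Under this reading, the ITE is a signed quantity: for AL it is a positive number of millimeters of elongation avoided, and for SE it is a negative number of diopters of myopic shift avoided.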

The performance of the Natural Progression Module (NPM)

We evaluated the NPM by assessing SE and AL prediction performance using R² and MAE. The evaluation also measured the proportion of predictions within clinically acceptable thresholds: absolute error <0.75 diopters for SE (P[AE < 0.75D]) and <0.25 mm for AL (P[AE < 0.25 mm])13,14,31. The NPM demonstrated strong predictive accuracy for future SE and AL values over a 10-year period across both the internal and external test datasets. In the internal test set, its prediction of SE showed an R² of 0.94, with an MAE of 0.35D, and P[AE < 0.75D] was 0.91. For AL, its R² was 0.91, with an MAE of 0.16 mm, and P[AE < 0.25 mm] was 0.84. In the external test set, the NPM’s prediction of SE yielded an R² of 0.94, an MAE of 0.40D, and P[AE < 0.75D] = 0.86. The prediction for AL had an R² of 0.94, an MAE of 0.19 mm, and P[AE < 0.25 mm] = 0.72 (Table 3 and Fig. 4).
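For readers who want to reproduce these three metrics on their own predictions, a minimal sketch follows. It assumes the conventional definition of R² (one minus the ratio of residual to total sum of squares), which the text does not spell out; the function name and the toy values are illustrative only.

```python
def regression_metrics(observed, predicted, threshold):
    """Return R², MAE, and the proportion of absolute errors below `threshold`."""
    n = len(observed)
    abs_err = [abs(o - p) for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs_err) / n,
        "p_within": sum(e < threshold for e in abs_err) / n,
    }


# Toy SE values in diopters; use threshold=0.25 (mm) for AL instead.
se_obs = [-1.0, -2.0, -3.0, -4.0]
se_pred = [-1.5, -2.0, -2.0, -4.0]
m = regression_metrics(se_obs, se_pred, threshold=0.75)
```

The threshold is passed as a parameter so that the same function serves both P[AE < 0.75D] for SE and P[AE < 0.25 mm] for AL.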

Fig. 4: Performance of the Natural Progression Module (NPM) on the internal test set and external test set.

a–d internal test set. e–h external test set. a, b Scatter plots showing the comparisons between observed and predicted SE values in the internal test set for right eyes (a) and left eyes (b), respectively. c, d Scatter plots showing the comparisons between observed and predicted AL values in the internal test set for right eyes (c) and left eyes (d), respectively. e, f Scatter plots comparing observed and predicted SE values in the external test set for right eyes (e) and left eyes (f). g, h Scatter plots comparing observed and predicted AL values in the external test set for right eyes (g) and left eyes (h). NPM demonstrated high accuracy in predicting SE and AL for both the right and left eyes in both the internal validation set and the external test set, with all R² values exceeding 0.9. For SE prediction, over 90% of the predictions had an absolute error <0.75D, which is considered a clinically acceptable error. For AL prediction, >71% of the predictions had an absolute error below 0.25 mm, which represents a very small margin of error. SE spherical equivalent, AL axial length, od right eyes, os left eyes, R² coefficient of determination, MAE mean absolute error. P(AE < 0.75D): proportion of predictions with absolute error less than 0.75 diopters. P(AE < 0.25 mm): proportion of predictions with absolute error <0.25 mm.

Table 3 Overall performance of Natural Progression Module (NPM)

To further evaluate the model’s predictive performance, Bland–Altman analyses and calibration curve assessments were conducted on the internal test set. In the Bland–Altman analyses, for SE prediction, the mean difference was 0.01 D with limits of agreement within ±0.80 D; for AL prediction, the mean difference was 0.03 mm with limits of agreement within ±0.50 mm (Supplementary Table 3, Supplementary Fig. 1). For the calibration curve analyses, continuous predictions were first divided into equally sized bins based on the predicted values (i.e., deciles). For each bin, the mean predicted value was plotted against the corresponding mean observed value in the test dataset. A locally weighted scatterplot smoothing (LOESS) curve was fitted to visualize the relationship, and the identity line (y = x) served as the reference representing perfect calibration. The resulting calibration curves demonstrated near-perfect agreement between predicted and observed values for both SE and AL in both eyes (Supplementary Fig. 2). Collectively, these findings indicate that the predicted SE and AL values closely correspond to the actual clinical measurements.
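The two agreement checks described above can be sketched as follows. Two assumptions are made here that the text does not state: the limits of agreement are taken as the mean difference ± 1.96 standard deviations (the usual Bland–Altman convention), and the difference is defined as predicted minus observed. The LOESS smoothing step is omitted; the sketch only produces the per-decile points that would be plotted against the identity line.

```python
from statistics import mean, stdev


def bland_altman(observed, predicted):
    """Mean difference (predicted - observed) and assumed 95% limits of agreement."""
    diffs = [p - o for o, p in zip(observed, predicted)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


def calibration_bins(observed, predicted, n_bins=10):
    """Sort by predicted value, split into equal-size bins, and return
    (mean predicted, mean observed) for each bin."""
    pairs = sorted(zip(predicted, observed))
    size = len(pairs) / n_bins
    bins = []
    for k in range(n_bins):
        chunk = pairs[round(k * size): round((k + 1) * size)]
        if chunk:
            bins.append((mean(p for p, _ in chunk), mean(o for _, o in chunk)))
    return bins


# Toy data: predictions alternate between +0.2 and 0.0 above the observed value.
observed = [float(i) for i in range(20)]
predicted = [o + (0.2 if i % 2 == 0 else 0.0) for i, o in enumerate(observed)]
bias, lower, upper = bland_altman(observed, predicted)
bins = calibration_bins(observed, predicted)
```

Points from `calibration_bins` that lie on the line y = x indicate good calibration in that range of predicted values.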

In addition, we conducted subgroup analyses on the internal test set. Based on baseline myopia severity, participants were categorized into three subgroups: mild myopia (SE ≥ −3.0D), moderate myopia (−6.0D ≤ SE < −3.0D), and high myopia (SE < −6.0D). According to baseline age, participants were divided into two subgroups: children (3 ≤ age ≤ 10) and adolescents (10 < age ≤ 18). Subgroup analyses were also performed by sex (male and female). The results showed that, across all subgroups, the prediction of SE achieved R² ≥ 0.86 (slightly lower in the mild myopia subgroup, R² = 0.78), MAE < 0.4D, and P[AE < 0.75D] > 0.90 (slightly lower in the high myopia subgroup, P = 0.87); across all subgroups, the prediction of AL achieved R² ≥ 0.86 (slightly lower in the mild myopia subgroup, R² = 0.78), MAE ≤ 0.2 mm, and P[AE < 0.25 mm] ≥ 0.73 (Supplementary Tables 4–6, Supplementary Figs. 3–8). These findings confirm the predictive accuracy of the NPM in forecasting future myopia progression across different demographic and clinical subgroups.
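The subgroup boundaries given above translate directly into code; the sketch below makes the inclusive and exclusive ends explicit. The function names and the returned labels are illustrative, and returning `None` for ages outside 3 to 18 is an assumption made here, not a rule stated in the text.

```python
def severity_subgroup(baseline_se):
    """Classify baseline SE (diopters) using the thresholds quoted in the text."""
    if baseline_se >= -3.0:
        return "mild"
    if baseline_se >= -6.0:
        return "moderate"
    return "high"


def age_subgroup(baseline_age):
    """Classify baseline age (years); ages outside 3-18 are left unassigned."""
    if 3 <= baseline_age <= 10:
        return "children"
    if 10 < baseline_age <= 18:
        return "adolescents"
    return None
```

For example, a child with a baseline SE of exactly −6.0D falls in the moderate group under these thresholds, and a 10-year-old is counted among children.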

Furthermore, because the model’s time-series design leverages all available prior visit records for each individual to predict future myopia progression, we evaluated how well the model predicted (n + 1)th visit values from n prior visits. Prediction accuracy improved as the number of prior visits increased (Fig. 5), underscoring the importance of prior visits as input data. We also defined the prediction horizon as the duration (in years) between the nth and (n + 1)th visits; accuracy decreased as the prediction horizon lengthened (Fig. 6). Collectively, these results demonstrated that model accuracy increased with more prior visits but decreased moderately with longer prediction horizons (Fig. 7).
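One way to organize this evaluation is to expand each participant's visit sequence into (history, target) pairs, recording the number of prior visits and the prediction horizon for each pair. The sketch below is an assumption about how such pairs could be built, not the authors' code; the tuple layout `(age_years, se)` and the dictionary keys are hypothetical.

```python
def next_visit_pairs(visits):
    """Expand one participant's visits, sorted by age, into evaluation pairs.

    Each pair uses the first n visits as history and visit n + 1 as the target.
    The horizon is the time in years between the nth and (n + 1)th visits.
    """
    pairs = []
    for n in range(1, len(visits)):
        history, target = visits[:n], visits[n]
        pairs.append({
            "n_prior": n,
            "horizon_years": target[0] - history[-1][0],
            "history": history,
            "target": target,
        })
    return pairs


# Toy participant: (age in years, SE in diopters) at three visits.
visits = [(8.0, -1.0), (9.0, -1.75), (11.0, -3.0)]
pairs = next_visit_pairs(visits)
```

Metrics can then be grouped by `n_prior`, by `horizon_years`, or by both, which corresponds to the three views reported in Figs. 5 to 7.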

Fig. 5: Impact of the number of prior follow-up visits on model prediction accuracy.

a–c Line plots showing how evaluation metrics for SE prediction in the right eyes vary with the number of prior visits. d–f Line plots illustrating the trends of SE prediction metrics for the left eyes as the number of prior visits changes. g–i Line plots depicting the variation of AL prediction metrics for the right eyes with increasing prior visit counts. j–l Line plots demonstrating how AL prediction metrics for the left eyes change with the number of prior visits. The trend suggested that the model’s predictive accuracy improved with an increasing number of prior follow-up visits, as indicated by increasing R², P(AE < 0.75D) and P(AE < 0.25 mm) values approaching 1, and decreasing MAE. od right eyes, os left eyes, SE spherical equivalent, AL axial length, R² coefficient of determination, MAE mean absolute error. P(AE < 0.75D): proportion of predictions with absolute error less than 0.75 diopters. P(AE < 0.25 mm): proportion of predictions with absolute error less than 0.25 mm.

Fig. 6: Impact of prediction horizon (in years) on model prediction accuracy.

a–c Line plots showing how evaluation metrics for SE prediction in the right eyes vary with the prediction horizon. d–f Line plots illustrating the trends of SE prediction metrics for the left eyes as the prediction horizon changes. g–i Line plots depicting the variation of AL prediction metrics for the right eyes with increasing prediction horizon. j–l Line plots demonstrating how AL prediction metrics for the left eyes change with prediction horizon. The trend suggested that the model’s predictive accuracy declined moderately as the prediction horizon increased, as reflected by decreasing R², P(AE < 0.75D) and P(AE < 0.25 mm) values and increasing MAE. od right eyes, os left eyes, SE spherical equivalent, AL axial length, R² coefficient of determination, MAE mean absolute error. P(AE < 0.75D): proportion of predictions with absolute error less than 0.75 diopters. P(AE < 0.25 mm): proportion of predictions with absolute error <0.25 mm.

Fig. 7: Combined effect of the number of prior follow-up visits and prediction horizon (in years) on model prediction accuracy.

a–c Trends in right-eye SE prediction performance metrics with increasing prior visit count and prediction horizon. d–f Trends in left-eye SE prediction performance metrics with increasing prior visit count and prediction horizon. g–i Trends in right-eye AL prediction performance metrics as the number of prior visits and prediction years increase. j–l Trends in left-eye AL prediction performance metrics as the number of prior visits and prediction years increase. The model’s predictive accuracy showed a clear trend of improvement with more prior follow-up visits and a moderate decline with longer prediction horizons. od right eyes, os left eyes, SE spherical equivalent, AL axial length, R² coefficient of determination, MAE mean absolute error. P(AE < 0.75D): proportion of predictions with absolute error < 0.75 diopters. P(AE < 0.25 mm): proportion of predictions with absolute error <0.25 mm.

To enhance model interpretability, we conducted feature-ablation tests in the internal test set to assess the importance of each variable. We systematically removed each key input feature in turn (SE history, AL history, age, and sex) and evaluated the resulting impact on model performance. The results are shown in Table 4. Removing SE or AL history led to the largest decrease in performance, while removing age or sex resulted in only a slight decrease. These ablation results demonstrate that baseline SE and AL are the most critical predictors, while age and sex contribute moderately to prediction accuracy.

Table 4 Feature ablation analysis of NPM performance

The performance of Intervention Progression Module (IPM)

The IPM predicted future SE and AL under four myopia control strategies, and its performance was assessed using R² and MAE between predicted and observed outcomes. The results demonstrated that the IPM provided accurate predictions of myopia progression across all interventions. For Atropine, SE prediction achieved an R² of 0.97 with an MAE of 0.23D, and AL prediction achieved an R² of 0.96 with an MAE of 0.14 mm. For Ortho-K, SE prediction reached an R² of 0.88 with an MAE of 0.44D, and AL prediction reached an R² of 0.82 with an MAE of 0.31 mm. For PDS, SE prediction achieved an R² of 0.96 with an MAE of 0.27D, and AL prediction achieved an R² of 0.94 with an MAE of 0.16 mm. For RLRL, SE prediction achieved an R² of 0.90 with an MAE of 0.36D, and AL prediction achieved an R² of 0.90 with an MAE of 0.18 mm (Table 5, Fig. 8, Supplementary Fig. 9).

Fig. 8: Evaluation of individualized treatment effect (measured by SE) from myopia control interventions using MPPM.

We employed MPPM to assess the individualized treatment effects of four myopia control interventions: 0.01% atropine (Atropine, a, b), peripheral defocus spectacles (PDS, c, d), orthokeratology lenses (Ortho-K, e, f), and repeated low-level red-light therapy (RLRL, g, h). The x-axis indicates time in months, with month 0 marking the initiation of the myopia control intervention. The y-axis represents the change in myopia degree, measured by the difference in SE from baseline (time 0). Each scatter point corresponds to an individual data observation. Orange dots represent pre-intervention measurements, while blue dots correspond to post-intervention follow-up data. Green dots depict the predicted trajectory of myopia progression without intervention, as estimated by NPM. Red dots reflect the predicted changes in myopia after intervention, as estimated by IPM. Solid lines represent fitted trends for each set of data points. These plots illustrate: (1) IPM reliably predicted individual myopia progression after the initiation of myopia control interventions; and (2) compared to the natural progression predicted by NPM, the actual rate of myopia progression was reduced in individuals receiving each of the four interventions. For the RLRL cohort, the orange line is simply an extension of the green line in the opposite direction, as participants in this cohort did not have refractive measurements before timepoint 0. SE spherical equivalent, od right eye, os left eye, Atropine 0.01% atropine eye drops, PDS peripheral defocus spectacles, Ortho-K orthokeratology, RLRL repeated low-level red light therapy.

Table 5 Performance evaluation of the IPM and myopia intervention benefit predicted by the AI model

Individualized treatment effect (ITE) estimated by MPPM

By comparing the SE and AL changes predicted by the IPM under intervention conditions with those predicted by the NPM under non-intervention conditions, we found that on average Atropine reduced SE progression by ~55% and AL progression by ~75%; Ortho-K reduced SE and AL progression by ~45% and ~45%, respectively; PDS reduced SE and AL progression by ~50% and ~70%, respectively; RLRL therapy not only halted the progression of SE and AL, but also led to an approximate 10% reversal in SE (Table 5, Fig. 8, Supplementary Fig. 9). RLRL was the most effective intervention, while the others showed varying degrees of progression slowing. It should be noted that these estimates were based on different follow-up durations: within 1 year for RLRL and up to 30 months for the other three interventions.
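The percentage reductions above follow from comparing predicted change under natural progression with predicted change under intervention. The sketch below shows that arithmetic for a single eye; the function name and the toy values are illustrative, and the exact averaging used to obtain the cohort-level percentages is not described here and is not reproduced.

```python
def progression_reduction(baseline, natural_pred, treated_pred):
    """Return (ITE, percent reduction) for one outcome of one eye.

    ITE = natural change - treated change, both measured from baseline.
    A percent reduction above 100 means the outcome reversed relative to baseline.
    """
    natural_change = natural_pred - baseline
    treated_change = treated_pred - baseline
    if natural_change == 0:
        raise ValueError("no predicted natural progression to compare against")
    ite = natural_change - treated_change
    return ite, 100.0 * ite / natural_change


# Toy AL example (mm): 0.4 mm natural growth versus 0.1 mm under treatment.
ite_al, pct_al = progression_reduction(24.0, 24.4, 24.1)  # about 75% reduction
# Toy SE example (D): -1.00 D natural shift versus -0.45 D under treatment.
ite_se, pct_se = progression_reduction(-2.0, -3.0, -2.45)  # about 55% reduction
```

Because both changes are signed, the same function works for SE (negative myopic shift) and AL (positive elongation) without special handling.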

Validation of the MPPM using an investigator initiated trial

Finally, we further validated the MPPM using data from an investigator-initiated trial (IIT) of RLRL therapy. In this trial, participants in the intervention group received RLRL treatment and wore single-vision spectacles, while those in the control group wore single-vision spectacles only. Follow-up examinations were conducted at 1, 3, 6, and 12 months after baseline. Baseline characteristics of participants are summarized in Table 6. We used each participant’s baseline features to predict SE and AL progression under natural conditions using NPM and compared these predictions with actual follow-up data from the control group. The model achieved an R² of 0.89 and an MAE of 0.35D for SE, and an R² of 0.85 and an MAE of 0.21 mm for AL. We then used IPM to predict changes in SE and AL under RLRL intervention and compared these with observed values in the treatment group. The model yielded an R² of 0.86 and an MAE of 0.37D for SE, and an R² of 0.83 and an MAE of 0.23 mm for AL (Table 7). These results further demonstrate the strong predictive performance of the MPPM.

Table 6 Characteristics of the participants in the IIT of RLRL therapy
Table 7 Validation of MPPM using data from the IIT of RLRL therapy

Discussion

In this study, we developed and validated a Transformer-based time series AI model, MPPM, to predict long-term myopia progression in children and adolescents. The model has two modules: NPM, for predicting the natural progression of myopia; and IPM, for forecasting progression under specific interventions. The NPM demonstrated high accuracy in forecasting SE and AL over a 10-year period. Prediction performance improved with a greater number of prior follow-up visits and declined moderately as the prediction horizon increased. Building upon NPM, by incorporating a gradient reversal layer and adversarial training, we developed the IPM, which estimates the causal effects of myopia control interventions and thus the individualized benefit for each patient.

This study addressed two major gaps in the field of myopia prediction: (1) forecasting future AL in children and adolescents; (2) providing individualized, quantitative estimates of treatment benefits from myopia control interventions.

The use of AI for myopia prediction has recently become a research hotspot. Xu et al. developed an AI model that utilizes cycloplegic refraction and AL measurements from non-myopic children to predict the risk of future myopia occurrence32. Similarly, Qi et al. introduced an AI model that employs fundus photography and electronic medical records to assess the risk of myopia onset in children33. Foo et al. used fundus photography to predict the risk of high myopia occurrence in children within five years34. However, these studies did not predict the degree of myopia (SE and AL) in children on an annual basis, which is the key distinction of the present study. While some studies have attempted to predict future SE in children11,12,13,14,15, none have focused on AL. We attribute this gap largely to the lack of large-scale, longitudinal AL datasets with extended follow-up. To overcome this limitation, we introduced a machine-learning-based imputation strategy to infer missing AL values from long-term patient follow-up records. This enabled the construction of a comprehensive dataset suitable for model training. Using this augmented dataset, we developed a Transformer-based model for predicting both SE and AL. The model was subsequently validated on real (non-imputed) data and demonstrated satisfactory predictive performance.

More importantly, we developed a tool that enables precise, individualized prediction of the clinical benefits of different myopia control interventions in children with myopia. Although RCTs and cohort studies have established the efficacy of various myopia control interventions16,17,18,19,35,36,37,38, such evidence is typically population-based and reflects only average treatment effects. These traditional approaches do not provide individualized benefit estimates. To address this issue, we developed the IPM, a Transformer-based causal inference framework designed to overcome the influence of confounding factors and accurately estimate the causal effects of myopia control interventions on clinical outcomes. This approach enables individualized and quantitative prediction of intervention benefits, offering a more precise tool to support clinical decision-making in myopia control.

Using the MPPM model, we estimated the clinical benefits of four myopia control interventions (Atropine, PDS, Ortho-K, and RLRL) in children and adolescents, quantified as the reduction in SE and AL progression. Our results showed that over a follow-up period of up to 30 months, Atropine, PDS, and Ortho-K each slowed myopia progression to varying degrees, consistent with findings from previous RCTs and cohort studies16,18,19,35,36,37,38. Notably, we observed that, on average, RLRL therapy not only halted myopia progression but also led to a modest reversal in SE, which is also in line with results reported in prior RCTs17,20,21,39,40. However, it is important to note that although RLRL appears to be the most effective among the evaluated interventions, clinicians should remain cautious, as safety concerns regarding potential retinal damage have not yet been fully resolved9,10.

In Supplementary Fig. 9a–d, for children receiving Atropine and PDS therapy, the predicted natural growth of AL after time 0 (green line) was substantially faster than the actual AL growth observed before time 0 (orange line). This occurred because, based on their age (Atropine: 10.6 ± 2.32 years; PDS: 10.94 ± 2.39 years) and pre-treatment AL growth trajectory, the model inferred that these children would be entering a phase of accelerated axial elongation. In contrast, such a pattern was not observed in children treated with Ortho-K (Supplementary Fig. 9e–f), where the predicted natural AL growth after time 0 did not markedly exceed the pre-treatment growth rate. This is likely because these children were older at baseline (14.23 ± 3.54 years), at an age when the rate of axial elongation typically slows41,42. For children in these three treatment cohorts (Atropine, PDS, and Ortho-K), the actual AL growth after time 0 (blue line) was lower than the predicted natural growth (green line), indicating that the interventions effectively reduced axial elongation.

When comparing Supplementary Figs. 9a–d with 8a–d, we observed that for children treated with Atropine and PDS, although the model predicted a faster AL growth rate after time 0, the corresponding rate of SE progression did not accelerate. This finding is consistent with known ocular compensatory mechanisms in children and adolescents: During periods of axial elongation, thinning of the crystalline lens and deepening of the anterior chamber can partially compensate for refractive changes43,44,45. As a result, SE progression tends to lag behind AL growth, with the rate of SE change being slower than the rate of axial elongation. This observation further supports the physiological plausibility of the model’s predictions.

This model is currently at the research validation stage and has not yet been approved as a medical device. Future translation into clinical practice should adhere to established regulatory pathways, which necessitate prospective clinical validation and compliance with Software-as-a-Medical-Device (SaMD) requirements. In this study, we observed that once patient data were entered, the model generated prediction results within 1–2 seconds. Given this high inference speed, we anticipate that MPPM could be deployed using a server–client architecture to provide real-time clinical decision support in outpatient settings. In a typical workflow, ocular biometric parameters (SE and AL) collected during routine examinations would be input into the system. The model’s predictions would then be presented to clinicians to inform their management decisions. This proposed integration aligns with existing myopia management pathways without necessitating additional clinical procedures.

This study has several limitations:

  1. The study population was limited to children and adolescents with SE ranging from +1.0D to −10.0D and AL between 23 mm and 28 mm, covering individuals from pre-myopia to moderate and high myopia. Individuals with ultra-high myopia were excluded, as such cases are often associated with posterior staphyloma and marked irregularities in ocular morphology46,47,48. These features substantially increase the complexity and reduce the reliability of refractive error predictions using the current model.

  2. All participants were Chinese children and adolescents, with no representation from other nationalities. Given that the rate of myopia progression may vary substantially across populations with different ethnic, genetic, and environmental exposures49,50, the generalizability of our model to populations of other ethnic or geographic backgrounds remains to be further validated. Future work will focus on external validation in multi-ethnic cohorts, including datasets from regions with different prevalence profiles and environmental risk factors (e.g., European, Southeast Asian, and African populations). We are currently collaborating with our partners to collect refractive data from non-Chinese children residing in China, with the aim of establishing a multi-ethnic dataset to further validate the generalizability of the model. We also plan to collaborate with international pediatric ophthalmology centers to evaluate model performance across diverse demographic and lifestyle contexts. This will allow us to assess the robustness, transferability, and potential need for recalibration of the model in non-Chinese populations.

Another limitation is the restricted range of input modalities. The model currently uses only basic demographic and clinical variables, including age, sex, and previous refraction and AL measurements. It does not incorporate additional potentially relevant factors such as genetics (e.g., family history or genotyping)51,52, environmental influences (e.g., near work habits, screen time, or academic pressure)53,54, or imaging data (e.g., fundus photographs)33,34. Previous studies have demonstrated that both genetic and environmental factors play important roles in the onset and progression of myopia, and that retinal images may offer added predictive value.

However, we intentionally limited the model inputs for two key reasons:

  1. Reducing the required input data enhances model accessibility and usability. Patients can receive personalized predictions based solely on prior clinical records, without undergoing additional tests such as fundus imaging. This simplicity facilitates large-scale deployment and maximizes the model’s reach and impact.

  2. The longitudinal patterns captured in historical refractive data may already encode some effects of underlying genetic and environmental influences, which may explain the model’s strong performance despite its minimal input requirements.

Given the model’s robust predictive accuracy, we believe that increasing input complexity may not be necessary at this stage.

In summary, we introduced an AI-driven platform for personalized prediction and optimization of pediatric myopia management. Myopia poses a growing global public health challenge, underscoring the critical need for early prediction of its progression and personalized intervention in pediatric patients to mitigate disease burden. However, two research gaps remain unaddressed in this domain: there are no tools for accurately predicting the degree of myopia over the long term (including not only SE but also AL), nor are there tools for individualized prediction of the benefits of myopia control interventions. This study addresses these gaps by developing a Transformer-based time series AI model that enables accurate prediction of both SE and AL in children and adolescents over a 10-year period. Furthermore, it incorporates causal inference techniques to provide individualized predictions of the benefits of myopia control interventions. Our AI platform represents a transformative tool for guiding precision myopia management in pediatric populations, enabling clinicians to optimize intervention strategies based on individual risk profiles and predicted therapeutic responses.

Methods

Datasets and subjects

This study was conducted in accordance with the tenets of the Declaration of Helsinki, and the protocols were approved by the Clinical Research Ethics Committee of the Eye Hospital, Wenzhou Medical University (No. 2023-200-K-162). We retrospectively collected refractive examination data from children and adolescents at WMU and DCH, and additionally included data from an IIT of RLRL therapy.

The inclusion criteria were as follows:

  1. age between 3 and 20 years (age at first visit ≤18 years, and age at the last recorded visit ≤20 years);

  2. SE ranging from +1.0D to −10.0D, and AL between 23 mm and 28 mm;

  3. at least two clinic visits with subjective refraction performed under cycloplegia and best-corrected visual acuity (BCVA) of ≥0.8 (Snellen);

  4. absence of ocular diseases other than refractive errors, as confirmed by comprehensive ophthalmic examinations, including slit-lamp biomicroscopy, post-cycloplegic fundus examination, and strabismus assessment;

  5. In the WMU non-intervention cohort and the DCH cohort, participants wore only single-vision spectacles and received no other myopia control interventions. In the WMU intervention cohorts, participants received single-vision spectacles and the assigned intervention, with no additional treatments. In the IIT study, participants had not undergone any interventions other than single-vision spectacles prior to enrollment. After enrollment, participants in the RLRL group received only single-vision spectacles and RLRL therapy, whereas those in the control group continued with single-vision spectacles alone.

The dataset included longitudinal visit records for each participant. Each record contained the following information: a de-identified unique patient ID, sex, date of birth, visit date and age at visit, and subjective refraction data (spherical power, cylindrical power, and cylinder axis). AL measurements were available for some visits, while others lacked this information. SE was calculated using the formula: SE = spherical power + 0.5 × cylindrical power.
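The SE formula above can be illustrated with a minimal Python sketch. The function name and the example refraction are assumptions made for illustration only.

```python
def spherical_equivalent(sphere: float, cylinder: float) -> float:
    """Return SE in diopters: spherical power plus half the cylindrical power."""
    return sphere + 0.5 * cylinder

# Hypothetical refraction (not study data): -2.50 DS / -1.00 DC
se = spherical_equivalent(-2.50, -1.00)
print(se)  # -3.0
```

The cylinder axis does not enter the calculation, so it is not passed to the function.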

Machine-learning-based imputation of AL data

To address missing AL data in some visit records, we employed a machine learning–based imputation strategy. Specifically, we developed an XGBoost regression model to predict AL, using age, sex, and the SE of both eyes as input features. The sex variable was numerically encoded (female = 0, male = 1) prior to model training.

Model training and validation were conducted on a subset of participants with complete AL records. Independent XGBoost models were trained for the right and left eyes. To robustly evaluate model performance, we employed five-fold cross-validation: 80% of the data were used for training and 20% for testing in each fold. Model performance was assessed using standard regression metrics, including mean squared error (MSE), MAE, R², and PCC between predicted and actual AL values. We reported the mean and standard deviation of these metrics across the five folds to reflect overall model performance.

The objective function was squared error. Hyperparameters were as follows: learning_rate = 0.05; max_depth = 6; n_estimators = 500; subsample = 0.8; colsample_bytree = 0.8; min_child_weight = 1; reg_alpha = 0.0; reg_lambda = 1.0; tree_method = “hist”; early_stopping_rounds = 50 using a fold-held validation split; random seed = 1.
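For readers who wish to reproduce this configuration, the reported settings can be collected in a single dictionary. This is a sketch only: the key names are assumed to map onto the keyword arguments of `xgboost.XGBRegressor`, with `random_state` standing in for the reported random seed and `reg:squarederror` for the squared-error objective.

```python
# Reported hyperparameters gathered into one dict (assumed key mapping).
# Intended use, not executed here: xgboost.XGBRegressor(**XGB_PARAMS)
XGB_PARAMS = {
    "objective": "reg:squarederror",   # squared-error objective
    "learning_rate": 0.05,
    "max_depth": 6,
    "n_estimators": 500,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "min_child_weight": 1,
    "reg_alpha": 0.0,
    "reg_lambda": 1.0,
    "tree_method": "hist",
    "early_stopping_rounds": 50,       # needs a validation set at fit time
    "random_state": 1,                 # reported random seed
}

# Sex encoding stated in the text
SEX_ENCODING = {"female": 0, "male": 1}
```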

After cross-validation, the final AL prediction models for the right and left eyes were retrained on the full subset of complete AL records using the optimal hyperparameters identified during validation. These trained models were then applied to the full dataset to impute missing AL values. Only missing AL fields were imputed; existing, non-missing AL records were left unchanged. The resulting AL-complete dataset was used to train the longitudinal myopia progression prediction model.
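The rule that only missing AL fields are filled can be sketched as follows. The record layout, the field names (`se_od`, `se_os`, `al`, `al_imputed`), and the stand-in predictor are hypothetical; in the study the predictor would be the trained XGBoost model for the corresponding eye.

```python
from typing import Callable

Visit = dict  # assumed keys: "age", "sex", "se_od", "se_os", "al"

def impute_missing_al(
    visits: list[Visit],
    predict_al: Callable[[float, int, float, float], float],
) -> list[Visit]:
    """Fill AL only where it is missing; measured AL values are kept unchanged."""
    completed = []
    for v in visits:
        filled = dict(v)  # copy so the input records are not modified
        if filled.get("al") is None:
            filled["al"] = predict_al(v["age"], v["sex"], v["se_od"], v["se_os"])
            filled["al_imputed"] = True
        else:
            filled["al_imputed"] = False
        completed.append(filled)
    return completed

def toy_predictor(age: float, sex: int, se_od: float, se_os: float) -> float:
    """Stand-in rule with made-up coefficients, NOT the trained model."""
    return 23.5 - 0.35 * se_od + 0.1 * sex

# Made-up records: one with a measured AL, one without
visits = [
    {"age": 9.0, "sex": 0, "se_od": -1.0, "se_os": -1.25, "al": 24.1},
    {"age": 10.0, "sex": 0, "se_od": -1.75, "se_os": -2.0, "al": None},
]
completed = impute_missing_al(visits, toy_predictor)
```

Keeping an explicit `al_imputed` flag is a design choice of this sketch; it makes it straightforward to restrict later validation to real (non-imputed) measurements.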

Model configuration

We used EHRFormer, a time-series transformer-based model. Our EHRFormer consists of an EHREmbedding encoder and a temporal GPT-2 encoder. The EHREmbedding uses a BERT backbone configured with 2 transformer layers and 12 attention heads (hidden size 768), GELU activations, and dropout 0.1. The temporal encoder is GPT-2 initialized from the “gpt2” configuration (12 layers, 12 attention heads, hidden size 768). Visit order is encoded by explicit position_ids equal to the chronological index of each visit. A multi-task regression head maps the hidden state to SE and AL outputs; an auxiliary gradient-reversal head predicts medication type for representation invariance. For the optimizer, we used AdamW with a learning rate of 5 × 10⁻⁵ and a weight decay of 1 × 10⁻³. The learning-rate schedule was CosineAnnealingWarmupRestarts with first_cycle_steps = 50 epochs, warmup = 10% of epochs, and min_lr = 1 × 10⁻⁸. We used a batch size of 100 and trained for a maximum of 50 epochs. No early stopping was applied; instead, the checkpoint with the lowest validation loss was selected. Mixed-precision (bf16) training was used.
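The general shape of a warmup-plus-cosine schedule with these settings can be sketched in plain Python. This is an illustration under stated assumptions (linear warmup from min_lr, one step per epoch, restart every 50 epochs) and is not claimed to reproduce the exact behavior of the CosineAnnealingWarmupRestarts implementation used in the study; the function name `lr_at_epoch` is hypothetical.

```python
import math

def lr_at_epoch(epoch: int, max_lr: float = 5e-5, min_lr: float = 1e-8,
                cycle_epochs: int = 50, warmup_frac: float = 0.10) -> float:
    """Linear warmup followed by cosine decay, restarting each cycle."""
    warmup = round(cycle_epochs * warmup_frac)  # 5 epochs for the stated settings
    step = epoch % cycle_epochs                 # position inside the current cycle
    if warmup > 0 and step < warmup:
        # Linear ramp from min_lr towards max_lr
        return min_lr + (max_lr - min_lr) * step / warmup
    # Cosine decay from max_lr back down towards min_lr
    progress = (step - warmup) / (cycle_epochs - warmup)
    return min_lr + (max_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))

schedule = [lr_at_epoch(e) for e in range(50)]
```

With a 50-epoch cycle and at most 50 training epochs, the restart branch is never reached in practice; it is included only to mirror the "restarts" naming.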

We use patient‑level 5‑fold mapping: train folds {1,2,3,4}, validation fold {0}, test fold {0}. No patient appears in more than one split. We verify empty intersections of patient IDs across train/validation/test before training. We set global random seed as 1 for Python, NumPy, and PyTorch. These settings ensure run‑to‑run stability on the same hardware and software stack.
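A patient-level split with an explicit leakage check might look like the following sketch. The function names, the round-robin fold assignment, and the example IDs are assumptions; only the fold numbering and the seed of 1 come from the text.

```python
import random

def assign_patient_folds(patient_ids, n_folds=5, seed=1):
    """Map each unique patient ID to one fold so that all visits of a patient stay together."""
    unique_ids = sorted(set(patient_ids))  # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    return {pid: i % n_folds for i, pid in enumerate(unique_ids)}

def split_patients(fold_of, train_folds, heldout_folds):
    """Return (train, held-out) patient sets and verify that they do not intersect."""
    train = {p for p, f in fold_of.items() if f in train_folds}
    heldout = {p for p, f in fold_of.items() if f in heldout_folds}
    assert train.isdisjoint(heldout), "patient leakage between splits"
    return train, heldout

# One entry per visit; hypothetical IDs repeat across visits
visit_patient_ids = ["P01", "P01", "P02", "P03", "P03", "P04", "P05", "P06"]
fold_of = assign_patient_folds(visit_patient_ids)
train, heldout = split_patients(fold_of, train_folds={1, 2, 3, 4}, heldout_folds={0})
```

Assigning folds to patients rather than to visits is what keeps the prefix-style samples of one individual from landing on both sides of the split.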

Data processing and model training

To increase the sample size, longitudinal visit records for each participant were segmented into multiple training samples. The model was designed to use all available prior visit records of an individual to predict their future refractive status (including both SE and AL). For example, if an individual had four historical visit records (a, b, c, d), the records were transformed into the following training samples, each predicting the SE and AL at the next time point:

  1. Record [a] used to predict SE and AL at record [b] (interval [a–b]);

  2. Records [a, b] used to predict SE and AL at record [c] (interval [b–c]);

  3. Records [a, b, c] used to predict SE and AL at record [d] (interval [c–d]).
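The segmentation described above can be sketched in a few lines of Python. The function name is hypothetical, and visits are represented by placeholder labels rather than real records.

```python
def make_prefix_samples(visits):
    """Expand one patient's chronologically ordered visits into (history, target) pairs."""
    return [(visits[:i], visits[i]) for i in range(1, len(visits))]

samples = make_prefix_samples(["a", "b", "c", "d"])
for history, target in samples:
    print(history, "->", target)
# ['a'] -> b
# ['a', 'b'] -> c
# ['a', 'b', 'c'] -> d
```

A participant with n visits therefore contributes n − 1 samples, and a participant with a single visit contributes none.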

Model performance evaluation

For a sample with n + 1 visit records, the model used the first n records to predict SE and AL at the (n + 1)-th visit. The difference between predicted and actual values at the (n + 1)-th visit was used to quantify prediction error. Model performance was evaluated using R² and MAE. For SE prediction, we additionally calculated the proportion of absolute errors within 0.75D (P[AE < 0.75D]), and for AL prediction, the proportion of absolute errors within 0.25 mm (P[AE < 0.25 mm]), representing the percentage of predictions falling within clinically acceptable error thresholds13,14,31.
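The three metrics can be computed as in the sketch below, written in plain Python for transparency. The function names and the example values are illustrative assumptions, not study data.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def prop_within(y_true, y_pred, threshold):
    """Proportion of predictions whose absolute error is below the threshold."""
    hits = sum(abs(t - p) < threshold for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Made-up SE values in diopters
se_true = [-1.00, -2.50, -3.25, -4.00]
se_pred = [-1.25, -2.00, -4.25, -4.00]

print(mae(se_true, se_pred))                # 0.4375
print(prop_within(se_true, se_pred, 0.75))  # 0.75
```

For AL, the same `prop_within` function would be called with a threshold of 0.25 (in mm).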

We further examined how model performance varied with the number of prior visits and the time interval between the last historical visit and the predicted visit (i.e., the prediction horizon, measured in years). To do this, we stratified the test data based on the number of prior visits (1, 2, 3, …) and prediction horizons (0–1 year, 1–2 years, …), and recalculated R², MAE, and P[AE < 0.75D] within each stratum. Additionally, we performed a two-dimensional analysis to visualize the combined effect of prior visit count and prediction horizon on model accuracy.
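The two-dimensional stratification can be sketched as a simple grouping step. The tuple layout and the binning rule (a horizon of exactly 1.0 year falls into the 1–2 year bin) are assumptions of this example.

```python
from collections import defaultdict

def stratified_mae(records):
    """Group absolute errors by (number of prior visits, horizon bin) and average them.

    records: iterable of (n_prior_visits, horizon_years, abs_error).
    """
    buckets = defaultdict(list)
    for n_prior, horizon_years, abs_err in records:
        horizon_bin = int(horizon_years)  # 0 -> 0-1 year, 1 -> 1-2 years, ...
        buckets[(n_prior, horizon_bin)].append(abs_err)
    return {key: sum(errs) / len(errs) for key, errs in buckets.items()}

# Made-up test-set errors
records = [(1, 0.5, 0.25), (1, 0.9, 0.75), (2, 1.5, 0.5)]
table = stratified_mae(records)
```

Each key of the resulting dictionary corresponds to one cell of the two-dimensional analysis; R² and the within-threshold proportion can be recalculated per cell in the same way.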

Causal machine learning

Causal objective

We target counterfactual natural progression and the ITE. Let \({H}_{t}\) denote the longitudinal patient history up to time \(t\), \({A}_{t}\in A\) the intervention at \(t\), and \({Y}_{t+\Delta }(a)\) the potential outcome at horizon \(\Delta\) under intervention \(a\) taken at \(t\). The observed outcome is \({Y}_{t+\Delta }\) (i.e., SE and AL).

Define \({\mu }_{a}\left({H}_{t}\right)={\mathbb{E}}\left[{Y}_{t+\Delta }(a)\mid {H}_{t}\right]\) and let \({a}_{0}\) denote “no intervention” (natural progression).

Our estimands are the natural-progression outcome \({\mu }_{{a}_{0}}\left({H}_{t}\right)\) and the ITE \(\tau \left({H}_{t}\right)={\mu }_{a}\left({H}_{t}\right)-{\mu }_{{a}_{0}}\left({H}_{t}\right)\) for a prespecified active intervention \(a\). Identification follows from standard sequential assumptions: consistency (if \({A}_{t}=a\) then \({Y}_{t+\Delta }={Y}_{t+\Delta }(a)\), with no hidden treatment versions or interference), sequential exchangeability (\({Y}_{t+\Delta }(a)\perp {A}_{t}\mid {H}_{t}\) for all \(a\in A\)), and positivity (\(0 < P({A}_{t}=a\mid {H}_{t}) < 1\) on the support).

Transformer + adversarial deconfounding

We encode \({H}_{t}\) with a Transformer to obtain a temporal representation \({z}_{t}=\Phi \left({H}_{t}\right)\) by averaging hidden states. To remove confounding from treatment assignment, we train an adversarial classifier \(g\) to predict \({A}_{t}\) from \({z}_{t}\), coupled with a gradient reversal layer (GRL). During forward propagation GRL is the identity; during backpropagation it multiplies the gradient by \(-\lambda\), driving \(\Phi\) to suppress treatment-predictive information. The adversary is optimized with cross-entropy loss \({L}_{{adv}}\).
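The behavior of the GRL can be written compactly. In the sketch below, \(R_{\lambda}\) denotes the GRL and \(\theta_{\Phi}\) the encoder parameters; both symbols are introduced here for illustration and do not appear in the original notation.

```latex
% Forward pass: identity. Backward pass: gradient scaled by -lambda.
R_{\lambda}(z_t) = z_t,
\qquad
\frac{\partial R_{\lambda}(z_t)}{\partial z_t} = -\lambda I

% Gradient of the adversarial loss that reaches the encoder through the GRL:
\frac{\partial L_{adv}\big(g(R_{\lambda}(z_t)), A_t\big)}{\partial \theta_{\Phi}}
  = -\lambda \,
    \frac{\partial L_{adv}\big(g(z_t), A_t\big)}{\partial z_t}\,
    \frac{\partial z_t}{\partial \theta_{\Phi}}
```

A single backward pass therefore updates \(g\) to predict \({A}_{t}\) better while updating \(\Phi\) in the opposite direction, which makes \({z}_{t}\) less informative about treatment assignment.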

Outcome and causal losses

The outcome head \(f\) predicts \({Y}_{t+\Delta }\). To align with causal identification, we weight the outcome loss with inverse propensity weights (IPTW): \({L}_{{task}}={\mathbb{E}}\left[w\left({A}_{t},{H}_{t}\right)\cdot l\left(f\left({z}_{t},{A}_{t}\right),{Y}_{t+\Delta }\right)\right]\), with \(w\left({A}_{t},{H}_{t}\right)=\frac{1}{\pi \left({A}_{t}\mid {H}_{t}\right)}\), where \(\pi \left({A}_{t}\mid {H}_{t}\right)\) is a learned propensity model. The full objective is: \(\mathop{\min }\limits_{\Phi ,f}\mathop{\max }\limits_{g}{L}_{{total}}={L}_{{task}}+\beta {L}_{{adv}}\), which induces balanced, treatment-invariant representations \({z}_{t}\) (i.e., reduced \(I({z}_{t};{A}_{t})\)), a sufficient condition for unbiased counterfactual prediction under the assumptions.
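The weighted outcome loss can be sketched as follows, assuming that the per-sample loss \(l\) is the squared error (the text does not specify \(l\)) and that propensities have already been estimated. The function name and all numbers are illustrative.

```python
def iptw_weighted_mse(y_true, y_pred, propensity):
    """Mean of squared errors weighted by w = 1 / pi(A_t | H_t)."""
    total = 0.0
    for y, y_hat, p in zip(y_true, y_pred, propensity):
        if not 0.0 < p <= 1.0:
            raise ValueError("propensity must lie in (0, 1]")
        total += (1.0 / p) * (y - y_hat) ** 2
    return total / len(y_true)

# Made-up values: samples whose treatment was less likely receive larger weights
y_true = [-2.0, -3.0]
y_pred = [-1.5, -3.5]
propensity = [0.5, 0.25]  # estimated pi(A_t | H_t) for the treatment actually received
loss = iptw_weighted_mse(y_true, y_pred, propensity)
print(loss)  # 0.75
```

Very small propensities produce very large weights, which is why the positivity assumption matters in practice; weight clipping or stabilization is a common remedy, but whether the study applied one is not stated.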

Counterfactual prediction

At inference, we obtain counterfactuals by clamping the intervention while holding \({z}_{t}\) fixed: \({\hat{Y}}_{t+\Delta }^{a}=f({z}_{t},a),\,{\hat{\mu }}_{{a}_{0}}({H}_{t})={\hat{Y}}_{t+\Delta }^{{a}_{0}},\,\hat{\tau }({H}_{t})={\hat{Y}}_{t+\Delta }^{a}-{\hat{Y}}_{t+\Delta }^{{a}_{0}}\).
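The clamping step can be illustrated with a toy outcome head. Everything in this sketch is hypothetical: the function names, the intervention labels, and the coefficients are made up and do not represent the trained model or any study result.

```python
from typing import Callable, Sequence

def estimate_ite(
    outcome_head: Callable[[Sequence[float], str], float],
    z_t: Sequence[float],
    active: str,
    control: str = "none",
) -> float:
    """ITE estimate: prediction under the active arm minus natural progression."""
    y_active = outcome_head(z_t, active)    # same z_t, intervention clamped to a
    y_natural = outcome_head(z_t, control)  # same z_t, intervention clamped to a_0
    return y_active - y_natural

def toy_head(z: Sequence[float], a: str) -> float:
    """Toy stand-in for f(z_t, a) returning a future AL in mm (made-up numbers)."""
    natural_al = 24.0 + 0.5 * sum(z)
    slowdown = {"none": 0.0, "atropine": 0.125}[a]
    return natural_al - slowdown

z_t = [0.5, 0.5]
ite = estimate_ite(toy_head, z_t, "atropine")
print(ite)  # -0.125
```

A negative value here means less axial elongation under the active intervention than under natural progression for the same encoded history.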