Artificial intelligence based prediction of first recurrence in neovascular age related macular degeneration with validation by 19 experts

Jang, Boa; Lee, Chan Ho; Kim, Seung Jin; Yoon, Chang Ki; Park, Un Chul; Choi, Jinwook; Lee, Eun Kyoung; Kim, Young-Gon

doi:10.1038/s41598-025-34480-8


Article
Open access
Published: 16 January 2026



Scientific Reports volume 16, Article number: 4440 (2026)



Abstract

This study aimed to compare the predictive performance of ophthalmologists with that of a previously developed and validated artificial intelligence (AI) model, and to evaluate how AI assistance influences expert decision-making, in predicting the recurrence of neovascular age-related macular degeneration (nAMD) after anti-vascular endothelial growth factor (VEGF) treatment. Nineteen experts (nine retinal specialist ophthalmologists and ten non-retinal specialist ophthalmologists) predicted whether the first recurrence of nAMD would occur within three months, based on optical coherence tomography (OCT) images and clinical information. Predictions were made in five sessions with increasing information availability. The AI model, developed and validated in our earlier work, predicted recurrence using OCT images acquired at baseline and after the loading phase. We compared the area under the receiver operating characteristic curve (AUROC; DeLong's test) and inter-rater agreement (Fleiss' kappa) between expert groups and the AI algorithm. The study included 149 eyes of 130 patients. The AI model achieved an AUROC of 0.744 (95% confidence interval, 0.665–0.822). Expert performance improved across sessions, with AUROCs ranging from 0.562 ± 0.034 to 0.679 ± 0.049. No significant differences were observed between expert groups based on experience or subspecialty. AI-supported decisions showed slightly improved performance in predicting nAMD recurrence compared with unassisted expert decisions, regardless of clinical experience. These results suggest the potential of AI assistance in predicting recurrence and optimizing treatment strategies for nAMD, which could improve patient counseling and management. This study also contributes an evaluation of how AI assistance affects ophthalmologists' decision-making in nAMD recurrence prediction.


Introduction

Neovascular age-related macular degeneration (nAMD) is a major cause of vision loss worldwide^1,2, in which increased levels of vascular endothelial growth factor (VEGF) lead to neovascularization of the choroidal and/or retinal vasculature³. Leakage from the neovascular lesion causes the accumulation of pathological fluid, such as intraretinal fluid (IRF) and subretinal fluid (SRF), as well as pigment epithelial detachment (PED), in the macular area, damaging the neurosensory retina⁴. Intravitreal injection of anti-VEGF antibodies inhibits VEGF, reducing fluid and stabilizing the disease. Accordingly, anti-VEGF therapy is now the established treatment of choice for nAMD, preserving vision in many patients, and clinicians rely on fluid changes in optical coherence tomography (OCT) images to determine the treatment strategy⁵.

However, because anti-VEGF agents have a short duration of action, persistent treatment is required. Disease activity and the interval between recurrences are highly heterogeneous among patients, which makes treatment decisions challenging. Knowing when the first recurrence will occur after three consecutive anti-VEGF loading injections would help clinicians address this challenge: a prediction of the first recurrence can guide how often patients are followed up after the loading phase and when anti-VEGF therapy is administered.

Artificial intelligence (AI)-based computer-aided diagnosis (CADx) systems have been developed to assess nAMD using OCT. Recent CADx systems employ deep learning (DL) techniques, which have achieved state-of-the-art performance in OCT image analysis^6,7,8,9,10. Treder et al.⁷ used the InceptionV3¹¹ model to differentiate between nAMD and normal OCT images. Hwang et al.⁹ used an AI-based cloud platform that accurately diagnosed nAMD and suggested treatment strategies, demonstrating its potential for telemedicine. However, previous studies have primarily focused on distinguishing nAMD from normal cases rather than predicting the prognosis of nAMD.

In our preliminary study, we developed a DL model that uses OCT images to predict whether the first recurrence will occur within three months after the loading phase, and demonstrated the feasibility of the algorithm¹². Although predicting recurrence from OCT images alone was very challenging, even for an advanced DL model, performance was higher with OCT images obtained after the loading phase than with baseline OCT images.

In this study, we aimed to extend our previous work by directly comparing the predictive performance of a previously validated DL model with that of ophthalmologists in forecasting the first recurrence of nAMD using OCT images. The overall workflow of the reader study is illustrated in Fig. 1. Ophthalmologists with varying subspecialties and experience levels completed five sequential reading sessions with increasing information availability, as shown in Supplementary Fig. 1. Beyond a simple performance comparison, this study also sought to evaluate how AI-generated recurrence scores and heatmaps influence expert decision-making and inter-reader consistency across these sessions, thereby clarifying the potential role of AI assistance in the clinical prediction of nAMD recurrence.

Results

Characteristics of the participants

A total of 149 eyes of 130 patients in the test set were evaluated in the current study (Table 1). Among the 149 eyes, 77 (51.7%) had their first recurrence within 3 months and 72 (48.3%) after 3 months, indicating a relatively balanced class distribution. The mean age of the patients was 72.1 ± 8.1 years. No significant differences were noted between the two groups (first recurrence within vs. after 3 months of the loading phase) in laterality, baseline best-corrected visual acuity (BCVA) in logarithm of the minimum angle of resolution (logMAR) units, axial length, nAMD subtype, or the anti-VEGF agent used for the loading phase. The group with first recurrence within 3 months had a higher proportion of male patients and worse BCVA after the loading phase.

Table 1 Demographics and baseline clinical characteristics of the study participants.


Twenty ophthalmologists were recruited for the study: ten retinal specialist ophthalmologists (RSOs) and ten non-retinal specialist ophthalmologists (N-RSOs) (Supplementary Table 1). One RSO was excluded because of incomplete readings, leaving 19 readers in the final analysis. Among these 19 readers, the mean age was 35.0 ± 5.61 years and the mean clinical experience was 8.47 ± 6.20 years.

Comparison between AI-unassisted and AI-assisted reading performance

As reported in our earlier work¹², the AI-based CADx system evaluated on our in-house validation dataset achieved an area under the receiver operating characteristic curve (AUROC)¹³ of 0.600 (95% CI 0.568–0.743) for baseline OCT images and 0.725 (95% CI 0.658–0.817) for OCT images after the loading phase. Combining the two models in a hard-voting ensemble improved the image-wise AUROC to 0.744 (95% CI 0.655–0.822). The ensemble achieved an accuracy of 0.698, sensitivity of 0.571, specificity of 0.833, positive predictive value (PPV) of 0.786, negative predictive value (NPV) of 0.645, and F1 score of 0.662. These metrics provide a more comprehensive assessment of the model's predictive capability; full details are provided in Supplementary Table 2.
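As a consistency check, all of the reported threshold metrics can be derived from a single confusion matrix. The counts used below (TP = 44, FN = 33, TN = 60, FP = 12) are not reported in the article; they are our reconstruction, assuming the 149-eye test set with 77 early-recurrence (positive) and 72 late-recurrence (negative) eyes described in Table 1. The function name is hypothetical.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # recall on the early-recurrence class
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)                    # precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)   # harmonic mean of PPV and sensitivity
    return {"accuracy": accuracy, "sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "f1": f1}

# Hypothetical counts reconstructed from the reported rates and the 77/72 class split
metrics = binary_metrics(tp=44, fn=33, tn=60, fp=12)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts the sketch reproduces the reported accuracy, sensitivity, specificity, PPV, and NPV, and the F1 score, as the harmonic mean of PPV and sensitivity, evaluates to 88/133 ≈ 0.662.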

Supplementary Fig. 2 depicts the performance metrics of all experts over the five reading sessions. The AUROC reached 0.679 in session 5 and was significantly higher than in session 1 (0.562, p < 0.001), session 3 (0.663, p < 0.05), and session 4 (0.649, p < 0.001). Thus, assistance from the AI-based CADx system significantly improved the AUROC relative to sessions 1, 3, and 4. Notably, session 2, which relied exclusively on OCT images after the loading phase, yielded performance comparable to that of session 5 (p = 0.307), as shown in the first column of Table 2.

Table 2 Performance of nine RSOs and ten N-RSOs in five reading sessions.


Comparative performance across different reader levels and inter-rater agreement between experts

Reader performance by expertise level and inter-rater agreement across the five reading sessions are shown in the right-hand columns of Table 2 and in Supplementary Table 3, respectively. Table 2 also outlines the differences in performance between the RSO and N-RSO groups across the five reading sessions. Across sessions 1, 4, and 5, the AI-based CADx system significantly improved the AUROC scores for both groups (p < 0.05). For example, the overall AUROC increased from 0.562 (95% CI 0.545–0.578) in session 1 to 0.649 (95% CI 0.623–0.675) in session 4 and 0.679 (95% CI 0.655–0.702) in session 5. Importantly, no statistically significant differences were observed between RSOs and N-RSOs in any session (p > 0.05), suggesting that the AI-based CADx system may benefit experienced retinal specialists and non-specialists to a similar degree in predicting first recurrence.

Supplementary Table 3 details the inter-rater agreement among experts across the five reading sessions, quantified using Fleiss' kappa. Kappa values were interpreted on the conventional scale ranging from poor agreement (values below 0) through slight, fair, moderate, and substantial agreement to almost perfect agreement (values close to 1). Agreement among experts with the AI-based CADx system improved significantly relative to all other sessions (p < 0.05) except session 2 (p = 0.161). In session 5, Fleiss' kappa indicated a moderate level of agreement, underscoring the contribution of AI to prediction consistency among experts. These results suggest that the AI-based CADx system helps different experts make more consistent predictions, particularly in the sessions in which the differences in kappa reached statistical significance.

Prediction agreement between experts

A subgroup analysis was performed to determine which OCT features were associated with better agreement among experts. Agreement on a case was considered “good” when more than 70% of experts made the same prediction regarding the timing of first recurrence. The degree of agreement is a crucial metric for assessing the reliability of predictions and for guiding the implementation of AI-assisted decision-making. Supplementary Table 4 summarizes the cases with good agreement, stratified by whether the first recurrence occurred within 3 months. Experts predicted early recurrence with good agreement when subretinal hemorrhage was present at baseline or intraretinal hyperreflective foci were present at baseline or after the loading phase (p < 0.001). Figure 2 shows representative cases in which early recurrence was predicted with good inter-expert agreement; in these cases, subretinal hemorrhage was present at baseline and intraretinal hyperreflective foci were observed both at baseline and after the loading phase.
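The “good agreement” criterion can be applied per case as sketched below. This assumes each reader contributes one binary prediction per case (1 = recurrence within 3 months); the function name is hypothetical. With 19 readers, “more than 70%” means at least 14 matching predictions, since 0.7 × 19 = 13.3.

```python
from collections import Counter

def agreement_summary(predictions: list[int], threshold: float = 0.70) -> tuple[int, float, bool]:
    """Return (majority label, fraction of readers choosing it, whether agreement is 'good')."""
    label, count = Counter(predictions).most_common(1)[0]
    fraction = count / len(predictions)
    return label, fraction, fraction > threshold    # strictly more than 70% of readers

# Hypothetical cases read by 19 experts
print(agreement_summary([1] * 14 + [0] * 5))   # 14/19 agree on early recurrence
print(agreement_summary([1] * 13 + [0] * 6))   # 13/19 falls just short of the criterion
```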

Discussion

In this study, we compared AI-assisted and unassisted predictions of nAMD recurrence to clarify the role of AI in improving the prediction of future recurrence. AI-assisted reading with the CADx system achieved higher performance than most unassisted sessions, underscoring the potential of the AI algorithm as a tool for enhancing clinical decision-making, irrespective of the specialist's experience level. Notably, readings based only on OCT images after the loading phase (session 2) performed comparably to AI-assisted readings (session 5). This suggests that OCT after the loading phase is an important factor in predicting the first recurrence of nAMD, and that post-treatment morphological changes, together with clinical information, are critical for predicting nAMD recurrence.

Statistically significant improvements in AUROC with AI assistance were observed relative to session 1 (p < 0.001), session 3 (p < 0.05), and session 4 (p < 0.001), whereas sensitivity and specificity did not differ significantly in most sessions. Overall, AUROC was more stable than sensitivity or specificity, which varied substantially and rarely reached statistical significance, except for specificity in a few sessions. This indicates that although the AI-based CADx system may improve the overall discrimination of predictions, the rates at which true positives (sensitivity) and true negatives (specificity) are identified may vary considerably across sessions.

Although the AUROC of 0.744 reflects moderate discriminative performance, this level of accuracy may still offer meaningful clinical value. In particular, a model capable of identifying patients with a higher likelihood of early recurrence can support triage and risk-stratification workflows after the loading phase. By flagging higher-risk individuals who may require closer monitoring or shorter follow-up intervals, the model could help prevent delayed detection of recurrent exudation, whereas lower-risk patients may be followed more flexibly. Therefore, even a moderately performing model may function as a practical adjunct to clinical decision-making rather than a standalone diagnostic tool.

However, when the diagnostic performance of RSOs and N-RSOs was compared, both groups showed persistent difficulty in interpreting OCT images to predict nAMD recurrence, even with the AI-based CADx system. This absence of a performance difference highlights the inherent difficulty of predicting early nAMD recurrence from OCT images, even for experienced specialists. This challenge is also reflected in the broader literature. Previous studies primarily used machine learning to predict the demand for or frequency of anti-VEGF treatment in nAMD^14,15,16,17 rather than to forecast recurrence timing directly, and studies focused specifically on predicting the timing of nAMD recurrence remain limited^12,18. A recent study by Jung et al.¹⁸ proposed a DL model to predict whether patients with nAMD would experience their first recurrence within three months after three loading injections followed by one month of dry-up. They reported that the DL model achieved 53.0% accuracy in predicting nAMD recurrence using a single pre-injection image and 60.2% accuracy using consecutive OCT images, outperforming ophthalmologists, who achieved 52.17% and 53.3% accuracy, respectively. They further found that both human specialists and the DL model had limited ability to predict outcomes from a single pretreatment OCT image, with nearly random results. In our previous work, the DL model achieved an AUROC of 0.725 using only a single OCT image after the loading phase¹², whereas in the current study AI-assisted ophthalmologists achieved an AUROC of 0.679. Although our model appeared to perform better than that of Jung et al.¹⁸ (the two studies report different metrics), the collective evidence consistently shows that predicting early nAMD recurrence from OCT images alone remains a fundamentally challenging task for both AI models and human experts.

Additionally, this study found that the presence of subretinal hemorrhage or hyperreflective foci at baseline played an important role in whether ophthalmologists predicted early recurrence. Subretinal hemorrhage in nAMD releases iron and hemosiderin into the surrounding tissue, causing oxidative stress and ongoing damage to the macula^19,20, which may make recurrence more likely. The presence of hyperreflective foci early in the disease has been reported to be associated with disease progression²¹. In nAMD, hyperreflective foci have been reported to be present from the earliest stages of the disease, before anti-VEGF injections, and a greater number of foci has been associated with higher levels of inflammatory markers²². In our previous study, heatmap analysis of the recurrence classification mainly highlighted areas of pathological fluid, subsided choroidal neovascularization lesions, and hyperreflective foci on OCT scans¹². In this study, subretinal hemorrhage and hyperreflective foci were important to ophthalmologists in predicting early recurrence, supporting the role of these OCT biomarkers in predicting recurrence timing.

By analyzing inter-rater reliability across the five sessions, this study found that AI assistance significantly improved consistency and reduced discrepancies among experts. This suggests that AI may contribute to a more standardized interpretation of complex OCT images, which is important for reliable diagnostic results. Reviewing predictive agreement among experts also provides insight into the OCT features associated with agreement among readers; together with the AI-based CADx system, such features could contribute to standardized diagnostic criteria while accounting for the diversity of individual interpretations.

Despite these positive findings, this study had several limitations. First, the AI algorithm achieved only moderate performance (AUROC 0.744), which fell short of the desired level. Second, the study was conducted at a single tertiary referral center, and no external validation with an independent dataset was performed, which may limit the generalizability of our findings to broader clinical settings. Additionally, the relatively small dataset and the heterogeneous anti-VEGF treatment regimens, as noted in our previous work, may introduce variability in ground-truth labeling and affect the robustness of both the AI performance and the human–AI comparisons. Third, the one-day washout between reading sessions may have introduced recall bias, which should be considered carefully in the design of future studies. Finally, because experts had access to both OCT images and clinical information whereas the AI model relied solely on OCT images, the comparison was not fully symmetric. Future models incorporating relevant clinical variables may allow a more balanced and comprehensive evaluation.

To address these limitations, future studies should collect additional data and tune the model to improve the performance of AI-based CADx systems. In particular, external validation using multicenter datasets will be essential to assess the model's generalizability across diverse clinical environments. Developing a model that predicts the actual time to recurrence is a crucial next step, as it would give clinicians more precise information for treatment planning. In addition, AI-based applications tailored to ophthalmology clinics could improve the efficiency, accuracy, and consistency of the diagnostic process in clinical settings. These directions aim to leverage AI while addressing its current limitations, ultimately increasing its usefulness in ophthalmology.

Methods

Deep learning algorithms

In this study, an AI-based CADx system using OCT was employed¹². The model had previously been trained and validated using 1,295 OCT images from 1,172 patients, with a balanced distribution between recurrence and non-recurrence cases. The dataset was randomly split at the patient level into 70% training, 20% validation, and 10% test sets, and five-fold cross-validation was performed to address the limited dataset size and reduce overfitting. The AI algorithm involves a two-step process: first, a fluid segmentation model identifies fluid regions; second, a binary classifier predicts whether nAMD will recur within three months after the loading phase. These two AI algorithms are integrated into a single AI-based CADx system. The system generates a recurrence score from 0 to 100% that reflects the likelihood of recurrence within the specified period, together with a heatmap highlighting the image regions that contributed most to the prediction.
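The hand-off between the two steps, turning a segmentation mask into a classifier input, might look like the sketch below. It assumes a binary fluid mask already produced by the segmentation model and crops a single window of the 400 × 400 size reported for the original model. Centring the crop on the mask centroid, clamping it to the image border, and falling back to the image centre when no fluid is segmented are all our assumptions, and the function name is hypothetical; the original pipeline may extract several patches per image.

```python
import numpy as np

def fluid_centered_patch(image: np.ndarray, fluid_mask: np.ndarray, size: int = 400) -> np.ndarray:
    """Crop a size x size patch centred on the segmented fluid (centroid centring is assumed)."""
    if image.shape[0] < size or image.shape[1] < size:
        raise ValueError("image is smaller than the patch size")
    rows, cols = np.nonzero(fluid_mask)
    if rows.size == 0:
        # No fluid segmented: fall back to the image centre (assumed behaviour)
        cy, cx = image.shape[0] // 2, image.shape[1] // 2
    else:
        cy, cx = int(rows.mean()), int(cols.mean())
    half = size // 2
    # Clamp the window so it lies fully inside the image
    top = min(max(cy - half, 0), image.shape[0] - size)
    left = min(max(cx - half, 0), image.shape[1] - size)
    return image[top:top + size, left:left + size]

# Hypothetical 600 x 800 B-scan with a fluid region near the right edge
mask = np.zeros((600, 800), dtype=bool)
mask[100:120, 700:760] = True
patch = fluid_centered_patch(mask.astype(float), mask)
print(patch.shape, patch.sum() == mask.sum())
```

Clamping rather than zero-padding keeps every patch filled with real image content, at the cost of the fluid not always sitting exactly at the patch centre.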

For reproducibility, we additionally summarize the key components of the previously developed AI model used in this study. Fluid regions were first localized using a U-Net–based segmentation module trained on expert-annotated OCT images, and the resulting fluid masks were used to extract fluid-centered patches (400 × 400 pixels) as classifier inputs. The recurrence classification network used a ResNet50 backbone pre-trained on ImageNet and was trained for 100 epochs using softmax cross-entropy loss, the Adam optimizer (learning rate 0.0001), and a batch size of 8. Final recurrence predictions were generated by combining outputs from classifiers trained on baseline and after the loading phase OCT images through a hard-voting ensemble, which served as the model presented to readers during the AI-assisted session.
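The final scoring and ensembling step can be sketched as follows. This assumes each classifier emits two logits with index 1 meaning recurrence within 3 months, that the 0 to 100% recurrence score is the softmax probability of that class scaled by 100, and that each model's vote is the score thresholded at 50%. With only two voters a tie is possible, and the original tie-breaking rule is not stated; breaking ties by the mean score, as below, is a hypothetical choice, as are the function names.

```python
import numpy as np

def recurrence_score(logits: np.ndarray) -> float:
    """Two-class softmax probability of early recurrence, scaled to 0-100%."""
    z = logits - logits.max()                 # shift by the max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return float(100.0 * probs[1])            # index 1 = "recurrence within 3 months" (assumed)

def hard_vote(labels: list[int], scores: list[float]) -> int:
    """Majority vote over binary labels; a tie is broken by the mean score (assumed rule)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives != negatives:
        return int(positives > negatives)
    return int(np.mean(scores) >= 50.0)

# Hypothetical logits from the baseline and after-loading-phase classifiers for one eye
scores = [recurrence_score(np.array([0.2, -0.4])), recurrence_score(np.array([-0.5, 1.0]))]
labels = [int(s >= 50.0) for s in scores]     # per-model decision at 50% (assumed threshold)
print([round(s, 1) for s in scores], labels, hard_vote(labels, scores))
```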

To ensure the integrity of the AI model development, all dataset splits for training, validation, and testing were conducted at the patient level, such that both eyes from the same individual were assigned to the same subset. This prevented any leakage of patient-specific information across data partitions.
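A patient-level split of this kind can be sketched by grouping eyes by patient before shuffling. The 70/20/10 fractions come from the text; the (eye_id, patient_id) record format, the seed, and the function name are hypothetical. Because whole patients are assigned, the image-level proportions are only approximately 70/20/10.

```python
import random
from collections import defaultdict

def patient_level_split(eye_records, fractions=(0.7, 0.2, 0.1), seed=0):
    """Split (eye_id, patient_id) records so that all eyes of a patient land in one subset."""
    eyes_by_patient = defaultdict(list)
    for eye_id, patient_id in eye_records:
        eyes_by_patient[patient_id].append(eye_id)
    patients = sorted(eyes_by_patient)              # deterministic order before shuffling
    random.Random(seed).shuffle(patients)
    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))
    groups = {
        "train": patients[:n_train],
        "validation": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],         # remainder (about 10%) goes to test
    }
    return {name: [eye for p in pats for eye in eyes_by_patient[p]] for name, pats in groups.items()}

# Hypothetical records: patients P0-P4 contribute both eyes, P5-P9 one eye each
records = [(f"P{i}-OD", f"P{i}") for i in range(10)] + [(f"P{i}-OS", f"P{i}") for i in range(5)]
splits = patient_level_split(records)
print({name: len(eyes) for name, eyes in splits.items()})
```

The same grouping idea applies to the five-fold cross-validation: folds would be formed from patients, not from individual eyes or images.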

Patients and datasets

This retrospective study included patients with treatment-naïve nAMD who visited Seoul National University Hospital (SNUH) between February 2008 and July 2021. All patients were treated with three consecutive loading intravitreal injections of ranibizumab (Lucentis; Novartis, Basel, Switzerland), aflibercept (Eylea; Bayer Pharma, Germany), or bevacizumab (Avastin; F. Hoffmann-La Roche Ltd., Basel, Switzerland). Baseline OCT scans were obtained when the patient was first diagnosed with nAMD, whereas OCT scans after the loading phase were taken one month after the three consecutive loading injections. In this study, recurrence was defined as the first appearance of a new retinal hemorrhage or intra/subretinal fluid accumulation after the initial resolution of exudative changes following the three loading injections. Persistent PED was not considered recurrence, whereas an increase in PED size was. This study was approved by the Institutional Review Board of Seoul National University Hospital (IRB approval number: 2107-223-1239) and adhered to the tenets of the Declaration of Helsinki. The Institutional Review Board of Seoul National University Hospital waived the need for written informed consent from the participants because of the retrospective design of the study.
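Applied to follow-up records, the recurrence definition amounts to a simple labeling rule, sketched below. The record fields, the use of PED height as the size measure, the strict “any increase” comparison against the post-loading reference, and the inclusive 3-month boundary are illustrative assumptions; in practice these findings are graded by clinicians on OCT. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FollowUpVisit:
    """Hypothetical per-visit findings after the loading phase."""
    months_after_loading: float
    new_hemorrhage: bool
    new_irf_or_srf: bool
    ped_height_um: float   # PED size proxy (assumed measurement)

def is_recurrence(visit: FollowUpVisit, reference_ped_um: float) -> bool:
    """New hemorrhage, new IRF/SRF, or PED enlargement counts; a persistent PED alone does not."""
    ped_increased = visit.ped_height_um > reference_ped_um
    return visit.new_hemorrhage or visit.new_irf_or_srf or ped_increased

def early_recurrence_label(visits, reference_ped_um: float, window_months: float = 3.0):
    """1 if the first recurrence occurs within the window, 0 if later, None if none observed."""
    for visit in sorted(visits, key=lambda v: v.months_after_loading):
        if is_recurrence(visit, reference_ped_um):
            return int(visit.months_after_loading <= window_months)
    return None

# Hypothetical eye: stable PED at 2 months, new subretinal fluid at 4 months
visits = [FollowUpVisit(4.0, False, True, 150.0), FollowUpVisit(2.0, False, False, 150.0)]
print(early_recurrence_label(visits, reference_ped_um=150.0))
```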

Experiment setup

Twenty ophthalmologists were recruited to assess their performance. The workflow of the experiment is shown in Fig. 1. The experiment consisted of five reading sessions, each followed by a one-day washout period. Reading times were automatically recorded using a web-based in-house tool. The information provided in each session was as follows: session 1, the patient's baseline OCT only; session 2, the patient's OCT after the loading phase only; session 3, both the baseline and after-loading-phase OCT; session 4, clinical information (age, sex, nAMD type, and anti-VEGF agent used) together with both OCT scans; and session 5, the DL algorithm results, clinical information, and both OCT scans. In session 5, the AI algorithm output a recurrence score from 0 to 100%, indicating the likelihood of the first recurrence of nAMD within 3 months after the loading phase. Session 5 also presented a gradient-weighted class activation mapping (Grad-CAM)²³ heatmap, which uses the gradients of the target class to produce a localization map highlighting the image regions most important for the prediction.
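For readers unfamiliar with Grad-CAM, the core computation can be sketched in a few lines. This follows the standard formulation of Selvaraju et al. and is not the authors' implementation: it assumes the feature maps of one convolutional layer and the gradients of the target-class score with respect to them have already been extracted from the network, and it omits upsampling the map to OCT resolution and colour-mapping it for the overlay. The function name is hypothetical.

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM map from one conv layer.

    activations, gradients: arrays of shape (K, H, W) holding the layer's feature maps
    and the gradient of the target-class score with respect to those maps.
    """
    weights = gradients.mean(axis=(1, 2))              # global-average-pooled gradients, one per channel
    cam = np.tensordot(weights, activations, axes=1)   # channel-weighted sum -> (H, W)
    cam = np.maximum(cam, 0.0)                         # ReLU keeps only positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam             # scale to [0, 1] for display

# Toy example: channel 0 supports the target class, channel 1 counts against it
A = np.zeros((2, 3, 3)); A[0, 1, 1] = 2.0; A[1, 0, 0] = 1.0
G = np.ones((2, 3, 3)); G[1] = -1.0
print(grad_cam(A, G))
```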

In each session, the experts analyzed the OCT images sequentially in a predetermined order. Each reader's reading environment remained constant throughout the sessions, and an in-house tool designed for this study provided a standardized evaluation process. In the AI-assisted session, the heatmap was overlaid on the original OCT images and could be switched on and off with a toggle. Experts considered both the AI output and the original OCT findings and recorded their individual judgments as a binary prediction. The information provided in each reading session and the experimental screen are shown in Supplementary Fig. 1.

Evaluation of the predicted model and statistical analysis

Predictive performance was measured using the AUROC. To compare AUROC scores across reading sessions, we used DeLong's test²⁴, a standard nonparametric method for assessing differences between correlated ROC curves that is widely used in multi-reader designs²⁵. In this study, DeLong's test was applied to ROC curves generated by pooling the predictions of all readers within each session, rather than at the individual-reader level. Because this was an exploratory multi-session reader study, no formal correction for multiple comparisons was applied. Session 5 served as the AI-assisted reference condition; therefore, no pairwise comparison was performed for this session.
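A compact version of DeLong's test for two correlated ROC curves is sketched below, using the structural-component formulation of DeLong et al.; it is an illustration, not the authors' code, and the function names are hypothetical. To mimic the pooled design described above, `y_true` would hold each case's label repeated once per reader, and each score vector would hold all readers' predictions for one session in the same (reader, case) order; note that pooling treats reader-case pairs as independent observations. Ties count as 0.5, so for binary predictions such as the readers' judgments the AUROC reduces to (sensitivity + specificity)/2.

```python
import math
import numpy as np

def _structural_components(pos: np.ndarray, neg: np.ndarray):
    """Return V10 (one value per positive case) and V01 (one per negative case); ties count 0.5."""
    psi = (pos[:, None] > neg[None, :]).astype(float) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(axis=1), psi.mean(axis=0)

def delong_test(y_true, scores_a, scores_b):
    """Two-sided DeLong test comparing two correlated AUROCs computed on the same cases."""
    y = np.asarray(y_true, dtype=bool)
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    v10_a, v01_a = _structural_components(a[y], a[~y])
    v10_b, v01_b = _structural_components(b[y], b[~y])
    auc_a, auc_b = v10_a.mean(), v10_b.mean()        # Mann-Whitney estimate of each AUROC
    s10 = np.cov(np.vstack([v10_a, v10_b]))           # 2x2 covariance across positive cases
    s01 = np.cov(np.vstack([v01_a, v01_b]))           # 2x2 covariance across negative cases
    m, n = len(v10_a), len(v01_a)
    var = ((s10[0, 0] + s10[1, 1] - 2 * s10[0, 1]) / m
           + (s01[0, 0] + s01[1, 1] - 2 * s01[0, 1]) / n)
    if var <= 0:
        raise ValueError("the AUROC difference has zero estimated variance on these cases")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))               # two-sided p-value from the standard normal
    return float(auc_a), float(auc_b), float(z), p

# Hypothetical data: binary reads vs. continuous scores on the same eight cases
y = [1, 1, 1, 1, 0, 0, 0, 0]
binary_reads = [1, 1, 1, 0, 0, 0, 0, 1]
risk_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
print(delong_test(y, binary_reads, risk_scores))
```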

Inter-rater agreement was quantified using Fleiss' kappa²⁶, and differences in kappa between sessions were compared. Fleiss' kappa is an extension of Cohen's kappa to three or more raters, which makes it suitable for this reader study with multiple participants. Kruskal–Wallis and chi-squared tests were used to compare dataset distributions and in the agreement analyses. Statistical significance was set at p < 0.05.
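Fleiss' kappa can be computed from a case-by-category count matrix as sketched below; for this study the matrix would have 149 rows (eyes) and two columns (first recurrence within vs. after 3 months), with each row summing to 19 readers. The qualitative labels follow the widely used Landis and Koch cut-offs, which we assume correspond to the convention described in the Results; the function names and the small example session are hypothetical.

```python
import numpy as np

def fleiss_kappa(counts) -> float:
    """Fleiss' kappa from an N x k matrix: counts[i, j] = raters assigning case i to category j."""
    counts = np.asarray(counts, dtype=float)
    n_cases = counts.shape[0]
    n_raters = counts[0].sum()
    if not np.allclose(counts.sum(axis=1), n_raters):
        raise ValueError("every case must be rated by the same number of raters")
    p_j = counts.sum(axis=0) / (n_cases * n_raters)                              # category proportions
    p_i = ((counts ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))   # per-case agreement
    p_bar, p_e = p_i.mean(), (p_j ** 2).sum()                                    # observed vs. chance
    return float((p_bar - p_e) / (1 - p_e))

def landis_koch(kappa: float) -> str:
    """Qualitative label using the Landis and Koch cut-offs (assumed convention)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"

# Hypothetical session: 3 eyes x 2 categories (within / after 3 months), 19 readers each
session = [[15, 4], [3, 16], [10, 9]]
k = fleiss_kappa(session)
print(round(k, 3), landis_koch(k))
```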

Data availability

All data generated or analyzed during this study are included in this published article.

References

Friedman, D. S. et al. Prevalence of age-related macular degeneration in the United States. Arch. Ophthalmol. 122, 564–572 (2004).
Article PubMed Google Scholar
Quartilho, A. et al. Leading causes of certifiable visual loss in England and Wales during the year ending 31 March 2013. Eye 30, 602–607. https://doi.org/10.1038/eye.2015.288 (2016).
Article CAS PubMed PubMed Central Google Scholar
Spilsbury, K., Garrett, K. L., Shen, W.-Y., Constable, I. J. & Rakoczy, P. E. Overexpression of vascular endothelial growth factor (VEGF) in the retinal pigment epithelium leads to the development of choroidal neovascularization. Am. J. Pathol. 157, 135–144. https://doi.org/10.1016/S0002-9440(10)64525-7 (2000).
Article CAS PubMed PubMed Central Google Scholar
Schmidt-Erfurth, U., Vogl, W.-D., Jampol, L. M. & Bogunović, H. Application of Automated Quantification of Fluid Volumes to Anti–VEGF Therapy of Neovascular Age-Related Macular Degeneration. Ophthalmology 127, 1211–1219. https://doi.org/10.1016/j.ophtha.2020.03.010 (2020).
Article PubMed Google Scholar
Cheng, C.-K. et al. Optimal approaches and criteria to treat-and-extend regimen implementation for Neovascular age-related macular degeneration: experts consensus in Taiwan. BMC Ophthalmol. 22, 25. https://doi.org/10.1186/s12886-021-02231-8 (2022).
Article PubMed PubMed Central Google Scholar
Lee, C. S., Baughman, D. M. & Lee, A. Y. Deep learning is effective for classifying normal versus age-related macular degeneration OCT images. Ophthalmol. Retina 1, 322–327 (2017).
Article PubMed PubMed Central Google Scholar
Treder, M., Lauermann, J. L. & Eter, N. Automated detection of exudative age-related macular degeneration in spectral domain optical coherence tomography using deep learning. Graefes Arch. Clin. Exp. Ophthalmol. 256, 259–265 (2018).
Article CAS PubMed Google Scholar
He, T., Zhou, Q. & Zou, Y. Automatic detection of age-related macular degeneration based on deep learning and local outlier factor algorithm. Diagnostics 12, 532 (2022).
Article CAS PubMed PubMed Central Google Scholar
Hwang, D.-K. et al. Artificial intelligence-based decision-making for age-related macular degeneration. Theranostics 9, 232 (2019).
Article PubMed PubMed Central Google Scholar
Chen, Y.-M., Huang, W.-T., Ho, W.-H. & Tsai, J.-T. Classification of age-related macular degeneration using convolutional-neural-network-based transfer learning. BMC Bioinf. 22, 1–16 (2021).
Article Google Scholar
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. & Wojna, Z. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2818–2826.
Jang, B. et al. Preliminary analysis of predicting the first recurrence in patients with neovascular age-related macular degeneration using deep learning. BMC Ophthalmol. 23, 499. https://doi.org/10.1186/s12886-023-03229-0 (2023).
Article PubMed PubMed Central Google Scholar
Mandrekar, J. N. Receiver operating characteristic curve in diagnostic test assessment. J. Thorac. Oncol. 5, 1315–1316 (2010).
Article PubMed Google Scholar
Gallardo, M. et al. Machine learning can predict anti-VEGF treatment demand in a treat-and-extend regimen for patients with neovascular AMD, DME, and RVO associated macular Edema. Ophthalmol. Retina 5, 604–624. https://doi.org/10.1016/j.oret.2021.05.002 (2021).
Article PubMed Google Scholar
Pfau, M. et al. Probabilistic forecasting of anti-VEGF treatment frequency in neovascular age-related macular degeneration. Transl. Vis. Sci. Technol. 10, 30–30. https://doi.org/10.1167/tvst.10.7.30 (2021).
Article PubMed PubMed Central Google Scholar
Chandra, R. S. & Ying, G.-S. Evaluation of multiple machine learning models for predicting number of anti-VEGF injections in the comparison of AMD treatment trials (CATT). Transl. Vis. Sci. Technol. 12, 18–18. https://doi.org/10.1167/tvst.12.1.18 (2023).
Article PubMed PubMed Central Google Scholar
Romo-Bucheli, D., Erfurth, U. S. & Bogunovic, H. End-to-end deep learning model for predicting treatment requirements in neovascular AMD from longitudinal retinal OCT imaging. IEEE J. Biomed. Health Inform. 24, 3456–3465. https://doi.org/10.1109/jbhi.2020.3000136 (2020).
Article PubMed Google Scholar
Jung, J. et al. Prediction of neovascular age-related macular degeneration recurrence using optical coherence tomography images with a deep neural network. Sci. Rep. 14, 5854. https://doi.org/10.1038/s41598-024-56309-6 (2024).
Oncel, D., Oncel, D., Mishra, K., Oncel, M. & Arevalo, J. F. Current management of subretinal hemorrhage in neovascular age-related macular degeneration. Ophthalmologica 246, 295–305 (2023).
Casini, G. et al. Traumatic submacular hemorrhage: available treatment options and synthesis of the literature. Int. J. Retina Vitr. 5, 48. https://doi.org/10.1186/s40942-019-0200-0 (2019).
Waldstein, S. M. et al. Characterization of drusen and hyperreflective foci as biomarkers for disease progression in age-related macular degeneration using artificial intelligence in optical coherence tomography. JAMA Ophthalmol. 138, 740–747. https://doi.org/10.1001/jamaophthalmol.2020.1376 (2020).
Wu, J. et al. Imaging hyperreflective foci as an inflammatory biomarker after anti-VEGF treatment in neovascular age-related macular degeneration patients with optical coherence tomography angiography. Biomed. Res. Int. 2021, 6648191. https://doi.org/10.1155/2021/6648191 (2021).
Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision 618–626 (2017).
DeLong, E. R., DeLong, D. M. & Clarke-Pearson, D. L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845 (1988).
Obuchowski, N. A., Gallas, B. D. & Hillis, S. L. Multi-reader ROC studies with split-plot designs: a comparison of statistical methods. Acad. Radiol. 19, 1508–1517 (2012).
Fleiss, J. L., Levin, B. & Paik, M. C. Statistical Methods for Rates and Proportions (John Wiley & Sons, 2013).


Acknowledgements

This work was supported by National Research Foundation of Korea (NRF) grants funded by the Korean government (MSIT) (2021R1F1A1045417 and RS-2023-00219548).

Author information

Boa Jang, Chan Ho Lee, Eun Kyoung Lee and Young-Gon Kim contributed equally to this work.
Boa Jang and Chan Ho Lee share first authorship.

Authors and Affiliations

Interdisciplinary Program in Bioengineering, College of Engineering, Seoul National University, Seoul, Republic of Korea
Boa Jang
Department of Transdisciplinary Medicine, Seoul National University Hospital, #101, Daehak-Ro, Jongno-Gu, Seoul, 03080, Republic of Korea
Boa Jang, Seung Jin Kim & Young-Gon Kim
Department of Ophthalmology, Seoul National University College of Medicine, Seoul National University Hospital, #101, Daehak-Ro, Jongno-Gu, Seoul, 03080, Republic of Korea
Chan Ho Lee, Chang Ki Yoon, Un Chul Park & Eun Kyoung Lee
Department of Biomedical Engineering, College of Medicine, Seoul National University, Seoul, Republic of Korea
Jinwook Choi
Institute of Medical and Biological Engineering, Medical Research Center, Seoul National University, Seoul, Republic of Korea
Jinwook Choi
Department of Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
Young-Gon Kim
Healthcare AI Research Institute, Seoul National University Hospital, Seoul, Republic of Korea
Young-Gon Kim

Authors

Boa Jang
Chan Ho Lee
Seung Jin Kim
Chang Ki Yoon
Un Chul Park
Jinwook Choi
Eun Kyoung Lee
Young-Gon Kim

Contributions

All authors made substantive intellectual contributions to this manuscript. Y.G.K. and E.K.L. designed and conducted the study. C.H.L. and E.K.L. collected and managed the data. B.J. and C.H.L. analyzed and interpreted the data. S.J.K. developed the artificial intelligence-based computer-aided diagnosis system. C.K.Y., U.C.P., and J.C. critically reviewed the manuscript and provided important intellectual input. B.J. and C.H.L. drafted the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Eun Kyoung Lee or Young-Gon Kim.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Jang, B., Lee, C.H., Kim, S.J. et al. Artificial intelligence based prediction of first recurrence in neovascular age related macular degeneration with validation by 19 experts. Sci Rep 16, 4440 (2026). https://doi.org/10.1038/s41598-025-34480-8


Received: 28 February 2025
Accepted: 29 December 2025
Published: 16 January 2026
Version of record: 02 February 2026
DOI: https://doi.org/10.1038/s41598-025-34480-8