Abstract
Retinal age has emerged as a promising biomarker of aging, offering a non-invasive and accessible assessment tool. We developed a deep learning model to estimate retinal age with enhanced accuracy, leveraging retinal images from diverse populations. Our approach integrates self-supervised learning to capture chronological information from both snapshot and sequential images, alongside a progressive label distribution learning module to model biological aging variability. Trained and validated on healthy cohorts (34,433 participants from the UK Biobank and three Chinese cohorts), the model achieved a mean absolute error of 2.79 years, surpassing previous methods. When applied to broader populations, analysis of the retinal age gap—the difference between retina-predicted and chronological age—revealed associations with increased risks of all-cause mortality and multiple age-related diseases. These findings highlight the potential of retinal age as a reliable biomarker for predicting survival and aging outcomes, supporting targeted risk management and precision health interventions.
Introduction
The world’s aging population is placing increasing strain on healthcare systems, amplifying the burden of chronic diseases and raising concerns about quality of care. While current approaches typically treat age-related conditions in isolation, the geroscience hypothesis posits that fundamental aging processes drive systemic cellular deterioration, accelerating disease onset across organ systems1. Targeting biological aging directly could delay or prevent multiple chronic diseases simultaneously, offering a more effective strategy than traditional disease-specific interventions2. However, individuals of the same chronological age exhibit substantial variability in biological aging and health status, highlighting the limitations of chronological age as a predictor of age-related risks. This underscores the urgent need for accurate biological age assessments to capture individual vulnerabilities, improve risk prediction, and enable more precise, personalized interventions3,4.
Initial efforts to estimate biological age have relied on molecular and physiological biomarkers, including epigenetic clocks5, transcriptomic6 and proteomic signatures7,8, and composite clinical indices3. While these methods offer valuable insights into aging mechanisms, their invasiveness, complexity, and reliance on specialized expertise limit their feasibility for large-scale clinical use. Deep learning methods have facilitated biological age estimation using non-invasive medical imaging, including chest X-ray9, brain MRI10,11,12, and 3D facial imaging13. However, these techniques are hindered by high costs, complex infrastructure requirements, and variable clinical reliability, impeding their integration into routine health assessments. These limitations underscore the need for reliable, non-invasive, cost-effective, and widely accessible biomarkers of biological aging that can be seamlessly incorporated into clinical workflows.
Retinal age, derived from fundus imaging, has emerged as a promising biomarker in aging research14,15,16,17,18,19,20. Unlike neuroimaging modalities that require specialized expertise and equipment, retinal imaging is minimally invasive, largely automated, and allows rapid image acquisition, making it particularly suitable for widespread clinical implementation. The retina’s unique composition of vascular and neural tissues, which demonstrates age-related changes paralleling those in critical organs such as the heart and brain, highlights its potential as a window into systemic aging processes21,22,23. These characteristics position retinal age as an exceptionally valuable clinical biomarker, offering distinct advantages in accessibility, non-invasiveness, and potential for integration into routine healthcare assessments.
Several studies have developed retinal age prediction models using deep learning14,24; however, fundamental challenges persist in capturing the complexity of biological aging. A key issue stems from the dynamic and highly variable nature of the aging process. Existing models predominantly rely on single-time snapshot retinal images and predict retinal age as a single-point value using regression methods. While straightforward, this approach oversimplifies the complexity of aging14,24, failing to account for temporal fluctuations and individualized trajectories, which are critical for understanding biological aging25. This constraint reduces both predictive accuracy and clinical relevance14,24. Another major limitation is the lack of generalizability across diverse populations. Aging manifests differently across individuals due to genetic, environmental, and lifestyle factors. Existing studies have relied on single-source datasets, leading to models that may not adequately represent demographic variability14,24. This focus risks introducing biases into predictions and undermines the applicability of these models to broader, heterogeneous populations26,27. Finally, current research has primarily focused on the retinal age gap—the difference between predicted retinal age and chronological age—in relation to specific health conditions such as cardiovascular disease and kidney disease14,16,19,28,29,30,31. While these studies offer valuable insights into the predictive utility of retinal age gaps for individual diseases, they do not fully explore their potential connections to a broader range of age-related conditions. Given the systemic nature of aging, further investigation into these wider associations could establish retinal age as a robust and holistic biomarker for biological aging and health outcomes.
In this study, we aim to address these challenges by presenting a new framework for retinal age estimation that introduces several advancements. First, we employ a temporal-order self-supervised learning (TO-SSL) framework that leverages longitudinal retinal images to capture the dynamic nature of aging. By incorporating masked image modeling and temporal order tuning, our approach models aging progression over time, addressing the limitations of snapshot-based analyses. Second, we propose a two-stage label distribution learning (LDL) approach that models retinal age as a probabilistic distribution rather than a single-point estimate, reflecting the inherent variability of biological aging. To enhance generalizability across diverse populations, we introduce cross-domain ordinal constraints that ensure consistent feature alignment across datasets, mitigating demographic biases. Additionally, we validate our model across multiple populations by integrating data from the UK Biobank (UKB) and three Chinese (CHN) cohorts, demonstrating its robustness and broad applicability. Finally, we conduct a comprehensive analysis of retinal age gaps, linking them to mortality and a wide spectrum of age-related diseases (see Fig. 1), providing new insights into the systemic health implications of this biomarker.
a Data collection and preparation: The training dataset comprises retinal images from the UK Biobank (UKB) and a Chinese (CHN) cohort, selected to represent a generally healthy population. These images undergo automated quality control through an image grading model. b Framework for estimating retinal age: In Stage I, snapshot images are used to train a masked auto-encoder. This is followed by a partial fine-tuning of the encoder with longitudinal images, enhancing its ability to model the temporal evolution for retinal aging. Stage II involves a sophisticated two-stage label distribution learning process, where a regression head is fused with the pre-trained encoder, enabling accurate prediction of retinal age. c Evaluation of the retinal aging biomarker: The model’s performance is tested across a varied demographic, utilizing the difference between calculated retinal age (RA) and chronological age (CA) as an innovative biomarker to predict susceptibility to age-related diseases.
Results
Study design and overview
In this study, we evaluated retinal age as a biomarker of biological aging through the development and validation of a predictive model and subsequent disease association analyses. The study was structured into three interconnected phases: data processing, model development, and clinical validation. In the data processing phase, retinal fundus images were curated and preprocessed to ensure quality and reliability. A dataset of 133,894 images from 32,282 healthy individuals aged 15–92 was collected from three CHN cohorts32 and UKB cohort A for model development. An additional dataset of 45,436 participants from UKB cohort B, representing the general population, was reserved for disease association analyses. All images underwent automated quality control using a grading model trained on the EyePACS-Q dataset33, classifying images as good, usable, or rejected based on illumination, artifacts, and diagnostic clarity34. Only gradable images (good or usable) were retained to ensure data integrity.
The model development phase centered on creating the retinal age prediction model using data from healthy individuals, under the assumption that biological age aligns with chronological age in the absence of chronic diseases1,12. The model utilized snapshot retinal images to extract spatial features, while longitudinal images captured temporal aging patterns. We implemented a label distribution learning approach to predict retinal age as a probabilistic distribution, with model performance primarily evaluated using mean absolute error (MAE). A lower MAE reflects the model’s enhanced ability to approximate biological age, particularly in healthy individuals, where it closely tracks chronological age. Through ablation studies, we assessed individual model components’ contributions. Additionally, attention map visualizations and systematic image manipulations identified key age-predictive retinal features, while analysis of microvascular features quantified from segmented retinal images further elucidated their relationship with aging.
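The label-distribution idea described above can be made concrete with a minimal sketch: a scalar age is encoded as a discrete probability distribution over age bins, a predicted distribution is decoded back to a point estimate by taking its expectation, and performance is scored with MAE. The bin range, the Gaussian spread, and the expectation decoder below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

AGE_BINS = np.arange(15, 93)  # assumed bin range matching the 15-92 cohort ages

def age_label_distribution(age, ages=AGE_BINS, sigma=2.0):
    """Encode a scalar age as a discrete Gaussian distribution over age bins."""
    p = np.exp(-0.5 * ((ages - age) / sigma) ** 2)
    return p / p.sum()

def decode_age(prob, ages=AGE_BINS):
    """Decode a predicted distribution to a point estimate via its expectation."""
    return float(np.dot(prob, ages))

def mae(pred, true):
    """Mean absolute error between predicted and chronological ages."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(true))))
```

Under this encoding, a sharper predicted distribution corresponds to higher confidence in the decoded age, which is the behavior the refined prediction stage is designed to encourage.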
In the clinical validation phase, we applied the model to the general population to evaluate the retinal age gap as a potential biomarker of aging deviations. This gap, defined as the difference between predicted retinal age and chronological age, was analyzed using data from 45,436 participants in UKB cohort B. We investigated associations between the retinal age gap and various age-related conditions, spanning cardiometabolic, neurological, and ocular diseases. Using Cox proportional hazards models, we quantified associations with mortality and disease incidence, deriving hazard ratios and confidence intervals to characterize the relationship between retinal age gap and age-related health outcomes.
Evaluation of retinal age estimation in relatively healthy participants
In Fig. 2, we present the primary outcomes of retinal age estimation. Our model achieved an MAE of 2.79 years in both the CHN and UKB cohorts, and exhibited a strong correlation between retinal age and chronological age, with Pearson correlation coefficients of 0.954 and 0.885 for the CHN and UKB cohorts, respectively. In Fig. 2c and d, we assessed the agreement between the predicted retinal age (RA) and chronological age (CA) using Bland-Altman plots. These analyses revealed that the average differences between RA and CA were −0.04 (−1.96 SD −7.1 to +1.96 SD 7.19) for the CHN cohort and −0.12 (−1.96 SD −7.20 to +1.96 SD 7.00) for the UKB cohort. We analyzed the retinal age deviations across age groups, categorized as follows: under 20, 20–30, 30–40, 40–50, 50–60, 60–70, 70–80, and over 80 years. Predictions within the error margin (−2.79 ≤ age deviation ≤ 2.79) are considered to fall within the expected technological accuracy. Figure 2e and f show that most age groups exhibited an age deviation close to zero, and a substantial proportion of cases (60%) fell within this error range. Across the age groups from 20–30 to 70–80, the proportion of prediction differences within the MAE varied between 53.6% and 71.9%. Lower percentages were observed in the under-20 group (12.5%, n = 8) and in individuals aged ≥80 years (27.4%, n = 350). The overall age deviation distributions are shown in Fig. 2g and h. The predicted age label distributions tend to cluster around the corresponding chronological ages, as illustrated in Fig. 2i. Additionally, Fig. 2j provides a detailed examination of four distinct cases of age estimation, showcasing the respective age label distributions for each case.
a, b Scatterplots for the correlation of the retinal age and chronological age for the CHN cohort (a) and the UKB cohort (b). c, d Bland-Altman plots for the agreement between retinal age and chronological age for the CHN cohort (c) and the UKB cohort (d). e, f the distribution of retinal age gap by age groups. g, h The distributions of the retinal age gap for the CHN cohort (g) and the UKB cohort (h). i The average probability distribution of the age prediction on the age labels. j Showcases of the predicted retinal age distribution for individual samples.
We then compared our model with existing popular regression methods, including direct regression, classification-based methods35,36, and ranking-based methods37. The Mean-variance method improves classification models by adding concentration regularization; the ranking-based method explicitly introduces ordinal information through a combination of binary classifiers; and the POE method models uncertainty using probabilistic embeddings. In addition, we included two naïve baselines: a simple mean predictor, which predicts the mean age of the training set for all test samples, and a sex-conditional mean predictor, which predicts the mean age of each sex group. Table 1 presents the detailed comparison. The mean predictor and the sex-conditional mean predictor yielded MAEs of 9.09 years (on all data) and 6.74 years (on the UKB cohort), respectively, demonstrating the limitations of simple demographic-based approaches. Among existing methods, classification-based models outperform direct regression across all data cohorts, consistent with previous findings35,38. The POE-CLS method achieves the best performance among the baselines, but its accuracy remains inferior to our proposed model. Our method, denoted RLDL (Retinal Label Distribution Learning), achieves an MAE of 2.84 years, an RMSE of 3.70 years, and a Pearson's correlation coefficient (R) of 0.950 when trained solely on snapshot images. These results highlight the superiority of our approach in capturing age-related features and improving prediction accuracy over both naïve and established baseline methods.
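The two naïve baselines above are simple enough to state directly; a sketch (with hypothetical sex coding) clarifies exactly what each predicts:

```python
import numpy as np

def mean_predictor(train_ages, n_test):
    """Predict the training-set mean age for every test sample."""
    return np.full(n_test, np.mean(train_ages))

def sex_conditional_mean_predictor(train_ages, train_sex, test_sex):
    """Predict, for each test sample, the mean training age of its sex group."""
    means = {s: np.mean([a for a, t in zip(train_ages, train_sex) if t == s])
             for s in set(train_sex)}
    return np.array([means[s] for s in test_sex])
```

Any useful model must beat these floors; the gap between their MAEs (9.09 and 6.74 years) and the model's 2.84 years quantifies how much information the retinal images themselves contribute.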
Ablation study of the proposed method
Here, we present the ablation study results of our model to demonstrate the impact of various components on its overall performance. The impact of each component is summarized in Table 2, with detailed analyses provided below.
Initially, the base model using only the coarse age prediction stage achieved an MAE of 3.08 years and a Pearson correlation coefficient (R) of 0.946. Introducing the refined age prediction stage reduced the MAE by approximately 0.2 years, improving the precision of the model. The refined stage further centralized the predicted age distributions around the true age, as demonstrated in Fig. 2j. Supplementary Fig. 1 illustrates additional cases, showing that the refined predictions align more closely with the true age compared to the coarse predictions. When analyzed by age groups (Fig. 3e), the refined predictions consistently outperformed the coarse predictions in groups aged 30–40, 40–50, 50–60, 60–70, 70–80, and ≥80. The refined stage yielded lower MAEs, particularly for older individuals, where the refined model provided significant gains in accuracy. In younger age groups (≤30), coarse predictions performed better, likely due to greater variability in biological aging in this demographic, where coarser age bins may provide more stable predictions.
a UMAP plot of the image features for the different pre-trained models. b Evaluation of the pre-trained features using linear probing and k-NN. c UMAP plot of the image features for the baseline model, ordinal regularized model, and ordinal-aligned model. d Adaptation result when using different training and test cohort data. e Distribution of retinal age deviation for coarse and refined predictions across age groups.
The integration of ordinal feature alignment further improved the model’s performance. While the improvement in age prediction accuracy was modest, the feature space organization showed significant enhancements, as visualized in Fig. 3c. The aligned feature space demonstrated better clustering of age-related features, highlighting the benefit of incorporating ordinal constraints. To evaluate the model’s generalizability across cohorts, we conducted experiments by training the model on one cohort (e.g., the UKB cohort) and testing it on another (e.g., the CHN cohort). As shown in Fig. 3d, this cross-cohort evaluation resulted in decreased prediction accuracy due to inherent differences in dataset characteristics. However, combining data from multiple cohorts during training improved the model’s robustness and prediction accuracy, emphasizing the value of diverse training datasets.
We explored the effect of different pre-training sources by visualizing image features with UMAP (Fig. 3a). Features pre-trained on temporal fundus images aligned with the age labels markedly better than features pre-trained on ImageNet or solely on snapshot retinal images, and temporal pre-training further reduced the MAE from 2.84 to 2.79 years, demonstrating the effectiveness of incorporating longitudinal data. Linear probing (LP) and k-nearest neighbors (k-NN) evaluations supported these findings (Fig. 3b): temporal image pre-training consistently outperformed both ImageNet and snapshot retinal image pre-training in terms of MAE, across both the UKB and CHN cohorts.
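A k-NN probe of frozen features can be sketched in pure numpy; the distance metric and neighborhood size here are assumptions for illustration, not the paper's evaluation settings. The idea is that if embeddings encode age well, averaging the ages of a sample's nearest neighbors in feature space should already be accurate, without training any head:

```python
import numpy as np

def knn_age_probe(train_feats, train_ages, test_feats, k=5):
    """k-NN regression on frozen embeddings: predict each test sample's age
    as the mean age of its k nearest training features (Euclidean distance)."""
    train_feats = np.asarray(train_feats, float)
    test_feats = np.asarray(test_feats, float)
    # pairwise distances: (n_test, n_train)
    d = np.linalg.norm(test_feats[:, None, :] - train_feats[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]
    return np.asarray(train_ages, float)[idx].mean(axis=1)
```

Linear probing is analogous but fits a single linear layer on the frozen features; both keep the encoder fixed, so any accuracy difference reflects the quality of the pre-trained representation itself.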
Model visualization and explanation
To understand which areas of the retinal image influence the model’s age prediction, we employed both attention map analysis and systematic image manipulation experiments. For attention visualization, we adopted the approach developed by Chefer et al.39, which computes relevancy in Vision Transformers. This technique assigns initial relevance scores to local features using the Deep Taylor Decomposition principle and systematically distributes these scores back through the network layers. The attention maps, shown in Fig. 4, reveal that our model focuses particularly on regions surrounding major vessels and their adjacent tissue areas. Additional attention map examples can be found in the supplementary materials.
To quantitatively evaluate the importance of different retinal image components for age prediction, we conducted controlled experiments using modified retinal images. Specifically, we created three types of images: vessel-only images that retain only the vascular structures, vessel-removed images where visible vessels were eliminated while preserving other features, and color-removed images that converted images to grayscale while retaining structural information. For vessel removal, we implemented three approaches: vessel inpainting (filling vessel regions with surrounding content), vessel masking (setting vessel regions to zero), and vessel randomizing (replacing vessel structures with those from unrelated images). Results for the primary method—vessel inpainting—are shown in the main text, while additional results for vessel masking and vessel randomizing are provided in the supplementary file.
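Two of the three vessel-removal variants reduce to simple array operations given a binary vessel segmentation mask; a numpy sketch is below (inpainting is omitted, as it requires a dedicated fill algorithm such as OpenCV's `cv2.inpaint`, and the mask here is a hypothetical input):

```python
import numpy as np

def mask_vessels(image, vessel_mask):
    """Vessel masking: set vessel pixels to zero, leaving other tissue intact."""
    out = image.copy()
    out[vessel_mask] = 0
    return out

def randomize_vessels(image, vessel_mask, donor_image):
    """Vessel randomizing: paste vessel pixels from an unrelated donor image,
    producing biologically inconsistent vascular structures."""
    out = image.copy()
    out[vessel_mask] = donor_image[vessel_mask]
    return out
```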
The results, summarized in Fig. 4, show that vessel removal (inpainting) had the most significant impact on model performance, increasing the MAE from 2.84 to 4.33 years. This substantial degradation highlights the critical role of vascular structures and their surrounding regions in retinal age prediction. By contrast, color removal had a relatively minor effect, with the MAE increasing to 3.04 years, indicating that structural features contribute more to age prediction than color information. When using vessel-only images, the MAE increased to 3.32 years, suggesting that while vascular patterns are essential, they alone cannot fully account for all age-related features present in retinal images.
To further explore the impact of vessel removal methods, we evaluated vessel masking and vessel randomizing (Supplementary Fig. 5). Vessel masking, where vessels were replaced with zero values, resulted in an MAE of 3.22 years. Despite the removal of vessel content, the residual vessel structures remained visible due to contrast with the surrounding tissue, partially preserving vessel-related spatial information. In contrast, vessel randomizing—where unrelated vessel patterns were introduced—led to the most severe performance drop, with MAE increasing to 5.98 years. This result indicates that biologically inconsistent vascular structures significantly disrupt the model’s ability to extract meaningful age-related features.
To validate the biological relevance of vascular parameters, we applied Automorph34, a validated tool for automated retinal vascular morphology analysis. Significant correlations were observed between age and vascular metrics in healthy participants. Specifically, Fractal Dimension and Vessel Perfusion Density exhibited significant negative correlations with age (P < 0.001), with R2 values of 0.310 and 0.300, respectively, as shown in Fig. 4c. These findings confirm that age-related changes in retinal vasculature are measurable and provide critical biological features for age estimation.
Associations between retinal age gap and age-related diseases
Table 3 describes the baseline characteristics of study participants in the clinical value analysis. Among the 45,436 participants, 44.65% were male, with a mean (SD) age of 55.67 (8.21) years, an interquartile range (IQR) of 49–63 years, and an age range of 40–70 years. Participants in the mortality group tended to be older, were more likely to be male, had higher levels of deprivation, and were more often current or former smokers compared to those in the non-mortality group. Additionally, they were less likely to have a college-level education, less likely to meet physical activity recommendations, and had higher BMI and comorbidity index scores.
During a median follow-up of 11.4 years (IQR: 11.3–11.6), 1143 deaths were recorded. Follow-up duration for the thirty diseases differed slightly, ranging from 11.3 to 11.4 years, owing to differing participant eligibility for each disease. Over follow-up, the number of newly diagnosed cases ranged from 39 for multiple sclerosis (MS) to 5106 for dyspepsia. Figure 5c shows the associations between retinal age gaps and risk of mortality. Each 1-year increase in retinal age gap was associated with a 3% increase in the risk of mortality (hazard ratio [HR] = 1.03, 95% confidence interval [CI]: 1.01–1.05, P = 0.003) in the fully adjusted model. Correspondingly, every 5-year and 10-year increase in retinal age gap was associated with a 16% and 35% increase in the risk of mortality, respectively (HR = 1.16, 95% CI: 1.05–1.28; HR = 1.35, 95% CI: 1.10–1.65). We also categorized retinal age gaps into quartiles: participants in the third and fourth quartiles exhibited a 29% and 66% increased risk of mortality, respectively, compared to those in the first quartile (HR = 1.29, 95% CI: 1.06–1.56, P = 0.012; HR = 1.66, 95% CI: 1.33–2.08, P < 0.001). Adjusted survival curves showed that higher quartiles of retinal age gap exhibited lower survival rates than the lowest quartile (Fig. 5a).
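The per-5-year and per-10-year hazard ratios follow directly from the per-year Cox coefficient, since HR for a k-unit increase is exp(k·β) with a Wald 95% CI of exp(k·β ± 1.96·k·SE). A short sketch (the β and SE values below are back-solved for illustration, not the fitted model's actual estimates):

```python
import numpy as np

def hazard_ratio(beta, se, scale=1.0):
    """HR and 95% CI for a `scale`-unit covariate increase, given a Cox
    regression coefficient `beta` (per unit) and its standard error `se`."""
    b, s = beta * scale, se * scale
    return np.exp(b), np.exp(b - 1.96 * s), np.exp(b + 1.96 * s)
```

For example, a per-year coefficient of roughly ln(1.03) ≈ 0.0296 scales to exp(5 × 0.0296) ≈ 1.16 per 5 years, matching how the 5-year and 10-year estimates relate to the 1-year estimate.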
a Adjusted survival curves for risk of mortality across quartiles of retinal age gap. HR = hazard ratio; CI = confidence interval. The survival curves are adjusted for age, sex, Townsend deprivation index, education, body mass index (BMI), smoking status, physical activity level, and Charlson Comorbidity Index (CCI) score. b Model performance comparison for predicting 10-year mortality risk. The horizontal bar chart displays area under the curve (AUC) values with 95% confidence intervals for different prediction models. c Association between retinal age gap and risk of all-cause mortality. Model 1 adjusted for age, sex, and Townsend deprivation index. Model 2 adjusted for the covariates in Model 1 plus education, smoking status, physical activity, BMI, and CCI score.
To contextualize the predictive value of the retinal age gap, we compared multiple models for predicting ten-year mortality risk using held-out validation data. As shown in Fig. 5b, the baseline model using only age achieved an AUC of 0.6947 (95% CI: 0.6593–0.7279), while sex alone showed limited discriminative ability (AUC = 0.5457, 95% CI: 0.5123–0.5801). Retinal age demonstrated improved predictive performance with an AUC of 0.7298 (95% CI: 0.6977–0.7620), outperforming the model combining age and sex (AUC = 0.7041, 95% CI: 0.6711–0.7363). The highest discrimination was achieved by the combined model incorporating age, sex, and retinal age gap (AUC = 0.7380, 95% CI: 0.7066–0.7679). Sensitivity analyses showed that the associations between retinal age gap and mortality remained significant after further adjustment for age squared and log(age) (Supplementary Fig. 6). In addition, retinal age acceleration residuals were also significantly associated with incident mortality (Supplementary Fig. 7).
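The AUC values compared above have a simple rank-based interpretation: the probability that a randomly chosen deceased participant received a higher risk score than a randomly chosen survivor. A numpy sketch of this Mann-Whitney formulation (ties counted as 0.5), suitable for small held-out sets:

```python
import numpy as np

def auc(scores, labels):
    """AUC as the probability that a positive case (label 1) outranks a
    negative case (label 0); equivalent to Mann-Whitney U / (n_pos * n_neg)."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```

For large cohorts, a rank-sum implementation (or `sklearn.metrics.roc_auc_score`) avoids the quadratic pairwise comparison.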
Among thirty age-related diseases, six out of eight cardiometabolic diseases, one out of seven neurological diseases, eight out of twelve other systemic diseases, and all three age-related eye diseases showed significant results after controlling for FDR in the fully adjusted model 2 as shown in Fig. 6 (corrected P values < 0.05). Higher retinal age gaps were associated with a higher risk of hypertension (HR = 1.01, 95% CI: 1.00–1.02), atrial fibrillation (HR = 1.02, 95% CI: 1.00–1.04), heart failure (HR = 1.03, 95% CI: 1.01–1.06), high cholesterol (HR = 1.02, 95% CI: 1.01–1.04), stroke (HR = 1.04, 95% CI: 1.01–1.06), and coronary heart diseases (HR = 1.03, 95% CI: 1.02–1.04). Additionally, every one-year increase in retinal age gaps was associated with a 6% increase in epilepsy risk (HR = 1.06, 95% CI: 1.01–1.10).
Forest plot showing hazard ratios (HRs) and 95% confidence intervals (CIs) per 1-year increase in retinal age gap across multiple disease categories. Analyses were conducted using two models: Model 1 adjusted for age, sex, and socioeconomic status (Townsend index), and Model 2 additionally adjusted for education, smoking status, physical activity, body mass index (BMI), and Charlson Comorbidity Index (CCI).
The eight other systemic diseases showing significant positive associations with retinal age gaps were chronic kidney disease, bronchiectasis, COPD, psoriasis, chronic liver disease, fracture, dyspepsia, and treated constipation. Higher retinal age gaps were also associated with higher risks of three age-related eye diseases: cataract (HR = 1.11, 95% CI: 1.09–1.12), glaucoma (HR = 1.09, 95% CI: 1.06–1.11), and AMD (HR = 1.08, 95% CI: 1.05–1.11). Further sensitivity analyses between retinal age gap and age-related outcomes showed similar results (Supplementary Fig. 8 and Supplementary Fig. 9).
To evaluate whether our improved retinal age modeling translates to enhanced clinical utility, we conducted comparative analyses against the baseline Direct Regression model, which had a higher MAE. As illustrated in Fig. 7a and b, our model consistently yielded higher odds ratios across all disease categories: +4.0% for cardiometabolic, +6.3% for neurological, +4.2% for other systemic, and +18.4% for age-related eye diseases. On average, the mean odds ratio improved from 1.10 to 1.17. To further evaluate clinical relevance, we stratified patients into quintiles based on the retinal age gap and computed risk ratios between the highest (Q5) and lowest (Q1) quintiles. As shown in Fig. 7c, d, 73.3% (22 out of 30) of the assessed diseases demonstrated improved risk stratification, with a sample-size-weighted average increase of 2.4% in quintile-based risk ratios. These findings confirm that improved model accuracy leads to measurable benefits in downstream clinical applications.
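The quintile-based risk ratio used above can be computed directly from the retinal age gaps and event indicators; a numpy sketch (the percentile-based quintile cuts are an illustrative assumption):

```python
import numpy as np

def quintile_risk_ratio(age_gap, event):
    """Risk ratio between the top (Q5) and bottom (Q1) retinal-age-gap
    quintiles, where risk is the event rate within each quintile."""
    age_gap = np.asarray(age_gap, float)
    event = np.asarray(event, float)
    q1_cut, q5_cut = np.percentile(age_gap, [20, 80])
    risk_q1 = event[age_gap <= q1_cut].mean()
    risk_q5 = event[age_gap >= q5_cut].mean()
    return risk_q5 / risk_q1
```

A Q5/Q1 ratio above 1 indicates that participants with the largest age gaps experience the outcome more often than those with the smallest gaps, which is the stratification quantity compared between models in Fig. 7c, d.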
a Odds ratio comparison between baseline model (Direct regression) and our model, with lines connecting same diseases to highlight paired differences. b Mean odds ratios comparison by disease categories, with percentage improvements. c Risk stratification improvement for individual diseases, showing the percentage increase in quintile-based risk ratio (Q5/Q1) compared to baseline. d Direct comparison of risk ratios between models, with points above diagonal line indicating improvements in disease risk stratification.
Discussion
In this study, we developed a predictive model for retinal age that demonstrated improved accuracy compared to previous methods. This approach also improved the model’s generalizability across diverse populations. We achieved this by incorporating temporal information from longitudinal retinal images and progressively learning age label distributions using data from multiple healthy cohorts. The retinal age gap, defined as the difference between the retina-predicted age and chronological age, was used as a metric to assess deviations from typical aging patterns. Our analysis revealed that higher retinal age gaps were associated with an increased risk of all-cause mortality. Furthermore, retinal age gaps were linked to a range of age-related eye diseases and systemic conditions. These findings support the potential of retinal age as a robust and reliable biomarker of biological aging, with applications in risk stratification for survival and age-related diseases.
Our methodological framework addresses several fundamental limitations in existing approaches to estimating retinal age. The temporal-order pre-training strategy captures dynamic aging patterns by combining masked image modeling with temporal sequence learning, moving beyond the static patterns detected in traditional cross-sectional analyses. To handle population heterogeneity, we developed a cross-domain feature alignment mechanism that ensures consistent embedding across diverse demographic groups while preserving age-related variations. Instead of relying on traditional regression methods, our approach employs label distribution learning, shifting from single-point age predictions to a more nuanced estimation of the age distribution, better reflecting the complexity of biological aging. These innovations culminate in an MAE of 2.79 years—a substantial improvement over previous methods that typically exceed 3.5 years14,24,40. This accuracy gain demonstrates the value of integrating longitudinal data with advanced learning strategies.
A fundamental challenge in biological age modeling is the lack of a universal ground truth2. We adopted the commonly used assumption that biological age equals chronological age in healthy populations29,41, as individuals without chronic diseases tend to have normal aging rates. This approach provides a practical reference point by enabling standardized model training and facilitating comparison across studies in biological age estimation. By reducing modeling error and achieving a lower MAE in this healthy reference population, we enhanced our ability to detect deviations that may signal accelerated or decelerated aging. However, this approach has important limitations. Even within apparently healthy populations, individuals may harbor undiagnosed conditions that affect their biological aging rates. Natural variation in aging trajectories can arise from multiple sources, including genetic predisposition, environmental exposures, and lifestyle factors1. Our label distribution learning framework partially addresses these limitations by modeling age as a distribution rather than a point estimate, better reflecting the inherent variability in biological aging. Future studies should explore alternative reference frameworks, such as using longitudinal health outcomes or molecular aging markers, to provide complementary insights into the aging process42.
While achieving lower MAE may be viewed as a modeling benchmark, our findings highlight its direct clinical relevance. First, reducing age prediction error improves the signal-to-noise ratio of the retinal age gap, making it more sensitive to deviations that signify pathological or accelerated aging. A less precise model may introduce noise that obscures these deviations, thereby diminishing the clinical informativeness of the biomarker. Second, our downstream analyses (Fig. 7) demonstrate that improved MAE is accompanied by enhanced disease association strength and risk stratification performance. For example, we observed stronger associations with cardiometabolic, neurological, and ocular diseases, and more robust stratification across risk quintiles—improving discrimination in 22 of 30 conditions with a weighted average gain of 2.4%. These findings empirically validate the clinical value of reducing prediction error in biological age estimation, potentially supporting more effective risk stratification and preventive care.
The clinical significance of our findings extends beyond the improved accuracy in age estimation. The retinal age gap independently predicts all-cause mortality risk after adjusting for traditional risk factors, suggesting it captures unique aspects of biological aging and provides valuable insights into overall health status. Unlike prior studies that focused on single conditions14,28, our comprehensive analysis demonstrates that retinal age gaps were associated with a broad spectrum of age-related diseases, spanning both ocular and systemic pathologies, even after adjusting for chronological age and other risk factors. For ocular conditions, retinal age gaps were strongly associated with three major age-related eye diseases: cataracts, glaucoma, and AMD. This aligns with established evidence of age-related changes in retinal structure—including lens hardening, decreased optic nerve axons, reduced ganglion cells, and basement membrane thickening—which are hallmark features of these conditions43. Beyond ocular pathology, the retina effectively mirrors systemic aging processes. Neuronal cell loss and degeneration, particularly thinning of the retinal nerve fiber layer, are strongly correlated with neurodegenerative diseases22. Similarly, retinal vascular alterations—including venular dilation, arteriolar narrowing, and reduced vessel density and capillary complexity—reflect systemic vascular aging21. These associations establish the retina as a unique window into both local and systemic aging processes.
Although we initially hypothesized a significant role for neurovascular features, the attention maps suggest a nuanced emphasis on perivascular regions. Evidence from OCT-A imaging supports the biological relevance of these areas, highlighting the presence of microvascular structures critical for aging assessment44. To investigate further, we quantified microvascular features, including fractal dimension and vessel density, from segmented vascular images. These features showed a negative association with age, aligning with established findings that microvascular decline is a hallmark of aging. While standard fundus photography inherently limits the resolution of fine vascular details due to its imaging constraints and necessary downsampling for computational efficiency, recent studies34,45 demonstrate that deep learning models can effectively extract meaningful vascular features from fundus images, even at reduced resolutions. To ensure reliability, we implemented standardized quality control34 to preserve visible vessel structure integrity and mitigate artifact-related effects. The model’s performance and interpretation analysis suggest it captures aging-relevant microvascular features by detecting complex patterns in perivascular regions, which correlate with aging processes, rather than directly resolving smaller vessels.
Our findings have several clinical implications. At the individual level, retinal age could be applied in community-based screenings to identify individuals at risk of accelerated aging. Individuals exhibiting larger age gaps should be considered at higher risk of accelerated aging and are more likely to die prematurely or develop a wide range of age-related diseases. Retinal age thus offers a reliable screening approach for accelerated aging and facilitates risk stratification of older individuals in primary care and community settings, enabling targeted populations to be prioritized for anti-aging interventions. Additionally, it can support the management of age-related chronic conditions by improving patient compliance in real-world scenarios. Given that most people are highly concerned about aging, this biological age indicator could serve as a communication tool for promoting better health behaviors and adherence to physician-prescribed interventions46. Moreover, as a single measure reflecting overall aging and mortality and morbidity risks, the retinal age gap may provide a cross-sectional end-point for exploring anti-aging risk factors in clinical trials.
There are certain limitations in this study. A primary constraint is the inherent resolution limitations of fundus photography, which hinder the precise visualization and analysis of fine microvascular structures that may be critical to understanding aging processes. Although our model achieves competitive accuracy relative to other aging biomarkers6,47,48, there remains scope for reducing prediction errors, particularly in younger and older age groups. Furthermore, while we employed diverse datasets for model development, our analysis of disease associations relies predominantly on UK Biobank data, which may limit the generalizability of findings to broader populations. The reliance on inpatient and mortality records for identifying incident cases likely underestimates true disease incidence, potentially biasing associations toward the null. Lastly, the observational nature of our study design precludes establishing causality between retinal age gaps and health outcomes, underscoring the need for further longitudinal and interventional studies to validate these relationships.
These findings lay the groundwork for several promising future research directions. While fundus imaging offers practical advantages in terms of accessibility and ease of use, investigation of higher-resolution imaging modalities, particularly OCT-A, could provide deeper insights into the vascular features driving age prediction. Advances in portable fundus imaging may enhance the feasibility of home-based monitoring. Prospective studies are needed to identify modifiable factors influencing retinal age and to validate the clinical utility of retinal age assessment in risk stratification and intervention monitoring. Despite current limitations, the combination of improved prediction accuracy, robust disease associations, and practical implementation potential positions retinal age derived from fundus photography as a valuable biomarker for both clinical practice and aging research.
Methods
Ethical approval and data acquisition
The UK Biobank is a publicly available prospective cohort of over 500,000 UK residents recruited from 2006 (https://biobank.ndph.ox.ac.uk/showcase/browse.cgi); 45° fundus images were introduced for the study subjects in 2009. Chinese cohort A comprised healthy participants from the Guangzhou Diabetic Eye Study (GDES) cohort, and Chinese cohort B comprised historical data on healthy participants recruited from Guangdong Provincial People's Hospital49. The remaining database is an ongoing prospective cohort study that enrolled healthy participants, who were invited to annual follow-up assessments including fundus imaging. The study was conducted in accordance with the Declaration of Helsinki, with informed consent obtained from all participants. Ethics approval for this study was obtained from the North West Multi-centre Research Ethics Committee (reference: 11/NW/0382) and St Vincent's Hospital Melbourne (reference: 2023/SVH00014).
Dataset
To develop the retinal age prediction model and validate its clinical applications, we utilized retinal fundus photographs from three CHN cohorts and two UKB cohorts, encompassing both healthy and general populations (Fig. 8). The age prediction model was developed on a combined dataset that, after quality control, included 133,894 images from 32,282 healthy individuals, with approximately two-thirds from the CHN cohorts and one-third from the UKB cohort. The dataset comprised both disc-centered (N = 15,935) and macula-centered (N = 117,959) images, with UK Biobank images consistently captured as macula-centered (field one) with a 45° field of view (see Supplementary Fig. 2).
Data from three Chinese cohorts and one UK Biobank cohort, all representing healthy individuals, were employed to train the model for estimating retinal age. The general population data from the additional segment of the UK Biobank cohort, encompassing a mix of healthy and unhealthy individuals, were utilized for the clinical disease association analysis.
To ensure statistical independence, we implemented a subject-level split for model development: 23,910 individuals (96,691 images) for training, 2391 individuals (10,732 images) for validation, and 5981 individuals (26,472 images) for testing. Additionally, longitudinal data from a subset of individuals was specifically utilized for temporal-order self-supervised learning pre-training, including 1735 individuals (13,880 images) for training and 416 individuals (3,328 images) for testing. Images from both eyes were included when available to maximize data utilization while maintaining subject-level data splits. For subsequent age-related disease analysis in the general population, we incorporated an additional UK Biobank cohort (UKB cohort B) comprising 45,436 participants.
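The subject-level split described above can be sketched as follows. `subject_level_split` is a hypothetical helper (names and split fractions are illustrative); the key property is that all images from one individual land in a single partition:

```python
import numpy as np

def subject_level_split(subject_ids, test_frac=0.2, val_frac=0.08, seed=0):
    """Assign every image of a subject to exactly one partition by shuffling
    the unique subject IDs and slicing them into test/val/train groups.
    Returns image-index arrays (train, val, test)."""
    subject_ids = np.asarray(subject_ids)
    rng = np.random.default_rng(seed)
    subjects = rng.permutation(np.unique(subject_ids))
    n_test = int(round(len(subjects) * test_frac))
    n_val = int(round(len(subjects) * val_frac))
    test_s = set(subjects[:n_test])
    val_s = set(subjects[n_test:n_test + n_val])
    # Label each image by the partition of its subject, then collect indices.
    part = np.array([2 if s in test_s else 1 if s in val_s else 0
                     for s in subject_ids])
    return np.where(part == 0)[0], np.where(part == 1)[0], np.where(part == 2)[0]
```

Splitting by subject rather than by image is what guarantees statistical independence between partitions when multiple images (and both eyes) per participant are included.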
Image quality control
A standardized quality control process was implemented for all retinal images to ensure anatomical integrity and diagnosability. This process utilized an image quality grading model based on EfficientNet-B450, trained on the EyePACS-Q dataset33, as described in a previous study34. Images were categorized into three quality levels—good, usable, and reject—based on illumination, artifacts, and the diagnosability of general eye diseases, as assessed by two expert graders. Following the settings of ref. 34, ungradable images were treated as the positive class in quality grading, with the probability of the ungradable category equivalent to that of the reject quality; conversely, the gradable category probability was defined as the sum of the good and usable quality probabilities. Using thresholds of 0.75 on the mean and 0.1 on the standard deviation, we rectified the classification of images into gradable and ungradable categories. Approximately 25.6% of the images across the combined dataset were classified as poor quality and excluded from further analysis. This proportion varied between cohorts, with 8.3% of the UKB cohort images and 28.4% of the CHN cohort images being ungradable (Supplementary Fig. 3). The final dataset used for model development consisted of 133,895 gradable images (84.4% of the original dataset).
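The rectification rule is only summarized in the text. One plausible reading, sketched below under the assumption that the mean and standard deviation are taken over repeated stochastic forward passes of the grading model, keeps an image when its gradable probability (good + usable) is consistently high and stable:

```python
import numpy as np

def rectify_gradability(prob_runs, mean_thr=0.75, std_thr=0.1):
    """prob_runs: (n_runs, 3) softmax outputs over (good, usable, reject)
    from repeated forward passes (an assumption about the protocol).
    The gradable probability is p(good) + p(usable); an image is kept only
    when that probability is high on average (mean >= mean_thr) and
    stable across runs (std <= std_thr)."""
    p = np.asarray(prob_runs, dtype=float)
    gradable = p[:, 0] + p[:, 1]  # reject probability is the complement
    return bool(gradable.mean() >= mean_thr and gradable.std() <= std_thr)
```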
Unlike previous studies that exclusively relied on good-quality images, we also included usable images to enhance generalizability and align with real-world clinical scenarios. While excluding usable images could mitigate challenges posed by suboptimal conditions, such as dim illumination or mild artifacts, it would also reduce the size and diversity of the dataset, potentially limiting the model’s applicability. By incorporating usable images, we aimed to create a dataset reflective of clinical reality, where a significant proportion of images fall into the usable category but may not meet stringent criteria for good quality. This approach ensures that the model is robust and applicable in clinical environments, where obtaining perfectly good-quality images is not always feasible.
DL framework for retinal age estimation
Considering the complexity of the aging process and the ambiguity in the age label space, a natural approach is to give an age estimate with a confidence interval, such as a Gaussian distribution with a particular mean age and standard deviation35,51. Therefore, we formulate retinal age estimation as a two-stage label distribution learning problem. As shown in Fig. 1, in the first stage, the model uses global features from the image encoder \({E}_{s}\left(\cdot \right)\) to predict a coarse age distribution over roughly discretized age labels. Each coarse age prediction is associated with a query vector corresponding to an age group. The model then performs class attention52 between the age group query and local spatial features to generate fine-level features, which are combined with the coarse age prediction to give refined age predictions. Notably, when a CNN is used as the image encoder, the global features and local spatial features are the outputs of the last global average pooling layer and the last convolutional layer, respectively. For ViTs, they are the class token and patch tokens, respectively; we treat the age group query as a prompt token and perform class attention with both the age group query and the local patch tokens.
Formally, given a dataset with N images \({\mathcal{X}}={\{{x}_{i}\}}_{i = 1}^{N}\), the corresponding age labels \({\mathcal{Y}}={\{{y}_{i}\}}_{i = 1}^{N}\) range in [a, b]. The image encoder first transforms an input image xi into a spatial feature \({{\bf{F}}}_{i}\in {{\mathbb{R}}}^{H\times W\times D}\), and a global feature \({f}_{i}\in {{\mathbb{R}}}^{1\times D}\). We discretize the age classes as \({\hat{y}}_{i}=\,\text{R}\,\left({y}_{i}/{\delta }_{d}\right)* {\delta }_{d}\), where \(\,\text{R}\,\left(\cdot \right)\) denotes the round operator and δd is the age interval for tuning the discretization degree. Therefore, the total discretized age class number is \({C}_{{\delta }_{d}}=\,\text{R}\,\left(\frac{\left\vert b-a\right\vert }{{\delta }_{d}}\right)\). For the coarse-level age estimation, we set a large δd which determines the age group queries as \({{\bf{Q}}}_{coarse}\in {{\mathbb{R}}}^{{C}_{{\delta }_{d}}\times D}\). Then, we use an FC layer with softmax applied on the fi to calculate the coarse age distribution \({p}_{i}\in {{\mathbb{R}}}^{1\times {C}_{{\delta }_{d}}}\).
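The discretization step can be sketched directly from the formulas above. `discretize_ages` is an illustrative helper; note the `+ 1` makes the class grid inclusive of both endpoints, whereas \(\text{R}(|b-a|/\delta_d)\) counts intervals:

```python
import numpy as np

def discretize_ages(ages, a, b, delta):
    """Map continuous ages in [a, b] to a grid of width delta, mirroring
    y_hat = round(y / delta) * delta from the text. Returns the discretized
    labels, their class indices, and the number of classes."""
    ages = np.asarray(ages, dtype=float)
    y_hat = np.round(ages / delta) * delta          # discretized age labels
    n_classes = int(round(abs(b - a) / delta)) + 1  # inclusive endpoint grid
    idx = np.round((y_hat - a) / delta).astype(int) # class index per sample
    return y_hat, idx, n_classes
```

A large `delta` yields the coarse age groups used for the queries in Qcoarse; the refining stage reuses the same mapping with `delta = 1`.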
Unlike a previous study24 that used a fixed label distribution as the ground truth, we learn the distribution directly from the training data with discretized age labels:
where mi is the expected value of the predicted age distribution pi. The first term is the cross-entropy loss, which helps the model converge in the early training stage; the last two terms encourage the learned distribution to be centered and concentrated at the true age label.
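The exact loss is given in Eq. (1); as a hedged illustration only, the description—cross-entropy plus terms that center and concentrate the distribution—is consistent with a mean-variance style loss, sketched below with assumed weights `lam_mean` and `lam_var`:

```python
import numpy as np

def coarse_age_loss(logits, target_idx, ages, bin_centers,
                    lam_mean=0.2, lam_var=0.05):
    """Illustrative three-term loss: cross-entropy on the discretized label,
    a penalty pulling the distribution's mean m_i onto the true age, and a
    penalty shrinking its variance (weights are assumptions)."""
    logits = np.asarray(logits, dtype=float)
    ages = np.asarray(ages, dtype=float)
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)  # predicted distribution
    ce = -np.log(p[np.arange(len(p)), target_idx] + 1e-12).mean()
    m = (p * bin_centers).sum(axis=1)                     # expected age m_i
    mean_pen = np.mean((m - ages) ** 2)                   # center on true age
    var = (p * (bin_centers - m[:, None]) ** 2).sum(axis=1).mean()
    return ce + lam_mean * mean_pen + lam_var * var
```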
In the refining stage, the mean value mi of the coarse age distribution is used to select the age group query from Qcoarse, which participates in the computation of the fine-level feature:
where \(\,\text{A}\,\left(\cdot \right)\) denotes the attention block with θa as its parameters. The key and value vectors in the attention function come from Fi. Finally, we concatenate \({f}^{{\prime} }\) with the mapped coarse age distribution as the final feature embedding to predict the fine-level age distribution with a small δd of 1:
where \(\,\text{f}\,\left(\cdot \right)\) denotes an FC layer with parameters θf, and \(\,\text{mlp}\,\left(\cdot \right)\) represents a multilayer perceptron with one hidden layer and parameters θm. The training loss is the same as in Eq. (1).
Although existing studies38,51 show that formulating regression as a classification task to learn the label distribution yields better performance, the ordinal information of age relations is lost in the feature space. This issue becomes more problematic when training data originate from disparate sources, as domain variance further disrupts the coherence of the learned features. In Fig. 1(b), we visualize the intermediate features learned on our fundus image dataset, colored by data source and by age label. The original model produces scattered and inconsistent features for ordinal age labels, while the features exhibit a clear gap between data sources.
To address the above issues, we introduce ordinal constraints in the label distribution learning and perform feature alignment to eliminate the domain variance. The key idea of imposing ordinal constraints in the embedding space is to construct a set of triplets and enforce the feature distances to be consistent with the relative age gaps. Specifically, for each batch of input data {x1, …, xB}, we first compute the pairwise feature distances, which yields a distance matrix \(D\in {{\mathbb{R}}}^{B\times B}\). Then, we construct feature triplets and calculate the distance gap by subtracting the shifted distance matrix \({D}^{{\prime} }\) from the original D. In this way, each sample can serve as an anchor to be compared with the other samples. We formulate the ordinal constraint as the following margin loss:
where \(D\left[\cdot \right]\) denotes the Euclidean distance and m is a dynamic margin that depends on the relative age gaps \(\left\vert {\hat{y}}_{i}-{\hat{y}}_{j}\right\vert\) and \(\left\vert {\hat{y}}_{i}-{\hat{y}}_{{j}^{{\prime} }}\right\vert\). To align image features from different data sources, we directly select samples from the same class and push them closer in the embedding space by minimizing the intra-class distance on both coarse-level and fine-level features:
where the \({{\mathbb{I}}}_{c}\left(i,j\right)\) is an indicator function.
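A minimal sketch of the ordinal triplet constraint, assuming a margin that scales linearly with the difference in relative age gaps (the scale `base_margin` is an assumption, and the brute-force triple loop is for clarity only; the batched matrix form described in the text is the efficient implementation):

```python
import numpy as np

def ordinal_margin_loss(feats, ages, base_margin=0.1):
    """For triplets (anchor i, j, j'): the sample with the smaller age gap
    to the anchor should also be closer in feature space, enforced with a
    margin proportional to the gap difference."""
    feats = np.asarray(feats, dtype=float)
    ages = np.asarray(ages, dtype=float)
    # Pairwise Euclidean distance matrix D in R^{B x B}.
    D = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    B = len(ages)
    loss, count = 0.0, 0
    for i in range(B):
        for j in range(B):
            for k in range(B):
                gap_j, gap_k = abs(ages[i] - ages[j]), abs(ages[i] - ages[k])
                if gap_j < gap_k:                        # j is age-closer to i
                    m = base_margin * (gap_k - gap_j)    # dynamic margin
                    loss += max(0.0, D[i, j] - D[i, k] + m)
                    count += 1
    return loss / max(count, 1)
```

When features are perfectly ordered along the age axis the loss vanishes; scrambled features incur a penalty, which is what drives the embedding toward age-consistent ordering.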
Compared to single snapshot images, longitudinal fundus image sequences, which inherently capture changes in fundus structure over time, provide a more comprehensive depiction of aging processes. Modeling this temporal evolution can potentially improve retinal age prediction. However, temporal fundus image sequences are often scarce in practice because individuals are frequently lost to follow-up, and training a temporal model on such a small dataset usually degrades performance. To address this, we propose a joint pre-training strategy using both the limited temporal sequences and the abundant snapshot fundus images, aiming to leverage the chronological information in temporal sequences to enrich our model's feature representation.
As illustrated in Fig. 1, the proposed pre-training mechanism unfolds in two phases. Initially, an image encoder is trained with masked image modeling on snapshot images, focusing on spatial detail extraction. Subsequently, we introduce a ranking head atop the encoder, trained to discern the chronological order of temporal image pairs. Notably, during this second pre-training phase, we freeze the encoder’s lower layers to maintain their focus on spatial details, while the upper layers are trained for higher-order semantic understanding and pattern recognition.
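The second pre-training phase can be illustrated with a toy ranking objective; the linear scoring head `w` and the logistic pairwise loss below are assumptions standing in for the paper's ranking head:

```python
import numpy as np

def temporal_order_loss(f_early, f_late, w):
    """Toy temporal-order objective: a (hypothetical) linear head scores each
    embedding, and a pairwise logistic loss encourages the later image of
    each pair to receive the higher score."""
    s_early = np.asarray(f_early, dtype=float) @ w  # score of earlier image
    s_late = np.asarray(f_late, dtype=float) @ w    # score of later image
    margin = s_late - s_early                       # should be positive
    return float(np.mean(np.log1p(np.exp(-margin))))
```

The loss falls below log 2 when the head correctly orders the pairs and rises above it when the order is reversed, so minimizing it teaches the upper encoder layers to track age-related progression.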
Model training
In the absence of a universally accepted standard for biological age determination2, chronological age was adopted as the reference standard in healthy populations. The training dataset consisted of 133,895 retinal images from healthy subjects without documented systemic diseases across four cohorts, including 2937 longitudinal sequences with an average of five images per sequence. The training process comprises two stages: pre-training and fine-tuning.
The pre-training stage focused on learning general retinal features through sequential steps. First, masked image modeling was employed to learn spatial features from snapshot images by randomly masking patches. This process utilized a Vision Transformer Base (ViT-B) architecture with a patch size of 16 × 16, following the settings of ref. 53, and was trained for 100 epochs. Subsequently, temporal order learning was applied to sequential images to capture age-related progression patterns. During this step, the first half of the image encoder was frozen, while the remaining part, along with newly added fully connected layers, was trained for 30 epochs to optimize temporal order capturing.
In the fine-tuning stage, the pre-trained model was refined for retinal age prediction by adding a regression head to the pre-trained image encoder and training the entire network. Input images were resized to 384 × 384 pixels, and data augmentation techniques, including random resized cropping, color transformation, and image flipping, were applied to enhance generalizability. We utilized the Adam optimizer with a batch size of 80 and a maximum of 50 epochs, with early stopping criteria to prevent overfitting. The initial learning rate was set to 3 × 10−4, with a layer-wise learning rate decay factor of 0.65 and cosine learning rate scheduling to ensure stable convergence. These training parameters were consistently applied to all comparison models and the temporal order learning step.
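The layer-wise learning-rate decay above assigns the base rate to the top block and geometrically smaller rates to earlier blocks; a minimal sketch (the helper name is illustrative):

```python
def layerwise_lrs(n_layers, base_lr=3e-4, decay=0.65):
    """Per-layer learning rates: the top block trains at base_lr, and each
    earlier block at base_lr * decay**k, k layers below the top."""
    return [base_lr * decay ** (n_layers - 1 - i) for i in range(n_layers)]
```

Lower layers, which carry general features from pre-training, thus change slowly, while the task-specific upper layers adapt fastest.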
Evaluation metrics
The performance of the DL prediction model was assessed in terms of its accuracy and its validity in predicting age-related health outcomes. The MAE and the Pearson correlation between predicted retinal age and chronological age were used to assess accuracy; a smaller MAE indicates a more accurate aging biomarker. The root mean square error (RMSE) was also included. The retinal age gap, defined as the deviation from the normal aging process, was calculated as the retina-predicted age minus the chronological age. Hazard ratios (HRs) with corresponding 95% confidence intervals (CIs) from Cox regression analyses were used to assess the associations of retinal age gaps with mortality and multiple morbidities.
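These metrics can be computed in a few lines; `age_metrics` is an illustrative helper returning MAE, RMSE, Pearson correlation, and the retinal age gap:

```python
import numpy as np

def age_metrics(pred, chrono):
    """MAE, RMSE, Pearson r between predicted and chronological age,
    and the retinal age gap (predicted minus chronological)."""
    pred = np.asarray(pred, dtype=float)
    chrono = np.asarray(chrono, dtype=float)
    gap = pred - chrono                      # retinal age gap per subject
    mae = np.abs(gap).mean()
    rmse = np.sqrt((gap ** 2).mean())
    r = np.corrcoef(pred, chrono)[0, 1]
    return mae, rmse, r, gap
```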
Clinical value verification of disease association and mortality prediction
We evaluated the clinical relevance of the retinal age model by analyzing the association between the retinal age gap and both all-cause mortality and a wide spectrum of age-related diseases. The retinal age gap was defined as the difference between model-predicted retinal age and chronological age, with positive values reflecting accelerated biological aging. Mortality outcomes were ascertained through linkage with national death registries.
Disease status was determined using both prevalent and incident diagnoses. Baseline conditions were identified through inpatient hospital records, and participant-reported diagnoses confirmed by healthcare providers. Incident cases were captured via longitudinal hospital admission records and death registries. Each participant was followed from baseline until the earliest of disease onset, death, or censoring due to loss to follow-up. The analysis included 30 conditions spanning four major disease categories. Cardiometabolic diseases encompassed hypertension, atrial fibrillation, heart failure, peripheral vascular disease, hypercholesterolemia, stroke, diabetes, and coronary heart disease. Neurological disorders included anxiety, depression, epilepsy, multiple sclerosis, Parkinson’s disease, schizophrenia, and dementia. Additional systemic conditions—such as chronic kidney disease, chronic obstructive pulmonary disease, and various gastrointestinal disorders—were also evaluated, alongside age-related ocular diseases including cataract, glaucoma, and age-related macular degeneration. Full diagnostic classifications and coding criteria are provided in Supplementary Table 1 and Supplementary Table 2.
We assessed the predictive performance of the retinal age model for 10-year all-cause mortality using data from the UK Biobank. The dataset was randomly split into a training set (70%) and a held-out test set (30%) using stratified sampling to preserve the mortality class distribution. Logistic regression models were developed using individual risk factors (chronological age, sex, retinal age) and their combinations. In particular, retinal age was defined as the sum of chronological age and the model-predicted age gap. Continuous predictors were standardized prior to model fitting, and sex was encoded as a binary variable. To account for class imbalance, we applied inverse frequency-based class weighting during training. Model discrimination was evaluated on the test set using the area under the receiver operating characteristic curve (AUC). For each model, 95% confidence intervals for AUC were estimated via bootstrapping with 1000 iterations to enable robust comparison across predictor sets.
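The AUC evaluation with bootstrap confidence intervals can be sketched as follows, using a rank-based AUC rather than any particular library implementation (function names are illustrative):

```python
import numpy as np

def auc_score(y, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    outranks a randomly chosen negative (ties count half)."""
    y = np.asarray(y)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y == 1], scores[y == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(y, scores, n_boot=1000, seed=0):
    """Percentile 95% CI for the AUC via bootstrap resampling of subjects."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    scores = np.asarray(scores, dtype=float)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        if y[idx].min() == y[idx].max():  # resample must contain both classes
            continue
        aucs.append(auc_score(y[idx], scores[idx]))
    return np.percentile(aucs, [2.5, 97.5])
```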
Clinical verification of improved retinal age estimation
To evaluate whether the improved accuracy of our retinal age model (MAE = 2.79 years) enhances clinical utility, we conducted comparative analyses against a baseline model (MAE = 3.35 years) using downstream disease prediction and risk stratification metrics. We assessed associations between retinal age gap and 10-year incidence of 30 age-related diseases using logistic regression. For each disease, we restricted analyses to eligible participants and standardized the age gap. Models were adjusted for chronological age, and odds ratios (ORs) with 95% confidence intervals were estimated for both the improved and baseline models. Percent improvements in OR strength (log OR) were calculated to quantify gains in predictive association across disease categories. To assess risk stratification capability, we divided participants into quintiles based on predicted age gaps. For each disease, we computed incidence within each quintile and derived risk ratios between the top (Q5) and bottom (Q1) quintiles. Similar analyses were performed using extreme groups (top and bottom 10%). Comparisons were made between the improved and baseline models across all diseases with sufficient case counts (≥20). We further examined statistical significance by calculating the proportion of diseases with p < 0.05 for both models. Relative improvements in odds ratios (ORs) and risk ratios were summarized across disease categories, with sample-size weighting used to compute overall improvements in stratification.
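The quintile-based stratification can be sketched as follows; `quintile_risk_ratio` is a hypothetical helper computing the Q5-versus-Q1 incidence ratio:

```python
import numpy as np

def quintile_risk_ratio(age_gap, incident):
    """Split participants into quintiles of the predicted age gap and
    return the ratio of disease incidence in the top quintile (Q5)
    to that in the bottom quintile (Q1)."""
    age_gap = np.asarray(age_gap, dtype=float)
    incident = np.asarray(incident, dtype=float)
    edges = np.quantile(age_gap, [0.2, 0.4, 0.6, 0.8])
    q = np.digitize(age_gap, edges)   # 0 = Q1 ... 4 = Q5
    risk_q1 = incident[q == 0].mean()
    risk_q5 = incident[q == 4].mean()
    return risk_q5 / risk_q1
```

The same function applied to the top and bottom deciles reproduces the extreme-group comparison described above.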
Statistical analysis
In Cox regression models, we adjusted for covariates, with Model 1 adjusted for baseline age, gender, and Townsend deprivation index, and Model 2 additionally adjusted for education status, lifestyle factors including smoking status, physical activity, body mass index (BMI), and Charlson Comorbidity Index (CCI) score. Physical activity was assessed using self-reported questionnaires from the validated short International Physical Activity Questionnaire (IPAQ) and dichotomized as meeting or not meeting the UK Physical Activity Guidelines of 150 minutes of walking or moderate activity per week, or 75 minutes of vigorous activity54. The CCI score, a well-established tool, was incorporated into the model to account for major comorbidities55. The CCI is calculated by summing weighted scores assigned to specific comorbid conditions according to their relative risk of mortality. An overview of the CCI calculation and the corresponding field IDs is provided in the supplementary materials. All variables were assessed for the proportional hazards assumption before being fitted into Cox proportional hazards regression models. To ensure the robustness of our findings across multiple comparisons, we applied the Benjamini-Hochberg procedure to control the false discovery rate (FDR) at the 5% level, minimizing the risk of false positives and enhancing the reliability of our results56. Sensitivity analyses were conducted to validate the results. First, additional analyses were performed by adjusting for age squared and log(age) in addition to the covariates included in Model 1 (Model 3) and those included in Model 2 (Model 4). We also investigated whether retinal age acceleration residuals (the residuals from regressing retinal age on chronological age) were associated with mortality and other age-related outcomes. A two-sided p-value of <0.05 was considered statistically significant.
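The Benjamini-Hochberg step-up procedure used to control the FDR can be sketched as:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: returns a boolean mask of the
    hypotheses rejected while controlling the FDR at level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m      # alpha * i / m per rank i
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.where(below)[0])            # largest rank passing the test
        reject[order[:k + 1]] = True              # reject all smaller p-values
    return reject
```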
Analyses were performed using R (version 3.3.0, R Foundation for Statistical Computing, www.R-project.org, Vienna, Austria) and Stata (version 13, StataCorp, College Station, Texas, USA).
Data availability
The UK Biobank data are available for approved projects (application process detailed at https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access) through the UK Biobank Access Management System (https://www.ukbiobank.ac.uk). We have deposited the derived data fields and model predictions following UK Biobank policy, which will be available through the UK Biobank Access Management System.
Code availability
To develop the retinal age model we used the PyTorch deep learning framework, available at https://pytorch.org. The code developed for this study is available for academic and non-commercial use. Researchers can sign the license agreement, complete the code access form provided in our GitHub repository (https://github.com/Zakiyi/RLDL), and contact Z. Y. [zhen.yu@monash.edu] to access the code. For requests from verified academic researchers, access will be evaluated by the code access committee and granted within one month.
References
Elliott, M. L. et al. Disparities in the pace of biological aging among midlife adults of the same chronological age have implications for future frailty risk and policy. Nat. Aging 1, 295–308 (2021).
Chen, R. et al. Biomarkers of ageing: current state-of-art, challenges, and opportunities. MedComm–Future Med. 2, e50 (2023).
Jylhävä, J., Pedersen, N. L. & Hägg, S. Biological age predictors. EBioMedicine 21, 29–36 (2017).
Jackson, S. H., Weale, M. R. & Weale, R. A. Biological age—what is it and can it be measured? Arch. Gerontol. Geriatr. 36, 103–115 (2003).
Horvath, S. & Raj, K. DNA methylation-based biomarkers and the epigenetic clock theory of ageing. Nat. Rev. Genet. 19, 371–384 (2018).
Peters, M. J. et al. The transcriptional landscape of age in human peripheral blood. Nat. Commun. 6, 1–14 (2015).
Lehallier, B. et al. Undulating changes in human plasma proteome profiles across the lifespan. Nat. Med. 25, 1843–1850 (2019).
Argentieri, M. A. et al. Proteomic aging clock predicts mortality and risk of common age-related diseases in diverse populations. Nat. Med. 30, 2450–2460 (2024).
Ieki, H. et al. Deep learning-based age estimation from chest x-rays indicates cardiovascular prognosis. Commun. Med. 2, 159 (2022).
Gaser, C. et al. Brainage in mild cognitive impaired patients: predicting the conversion to alzheimer’s disease. PloS One 8, e67346 (2013).
Yin, C. et al. Anatomically interpretable deep learning of brain age captures domain-specific cognitive impairment. Proc. Natl. Acad. Sci. USA 120, e2214634120 (2023).
Lee, J. et al. Deep learning-based brain age prediction in normal aging and dementia. Nat. Aging 2, 412–424 (2022).
Xia, X. et al. Three-dimensional facial-image analysis to predict heterogeneity of the human ageing rate and the impact of lifestyle. Nat. Metab. 2, 946–957 (2020).
Zhu, Z. et al. Retinal age gap as a predictive biomarker of stroke risk. BMC Med. 20, 466 (2022).
Hu, W. et al. Retinal age gap as a predictive biomarker of future risk of parkinson’s disease. Age Ageing 51, afac062 (2022).
Zhu, Z. et al. Association of retinal age gap with arterial stiffness and incident cardiovascular disease. Stroke 53, 3320–3328 (2022).
Poplin, R. et al. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat. Biomed. Eng. 2, 158–164 (2018).
Zhu, Z. et al. Retinal age gap as a predictive biomarker for mortality risk. Br. J. Ophthalmol. 107, 547–554 (2023).
Nusinovici, S. et al. Application of a deep-learning marker for morbidity and mortality prediction derived from retinal photographs: a cohort development and validation study. Lancet Healthy Longev. 5 (2024).
Zhu, Z. et al. The association of retinal age gap with metabolic syndrome and inflammation. J. Diabetes 15, 237–245 (2023).
Flammer, J. et al. The eye and the heart. Eur. heart J. 34, 1270–1278 (2013).
London, A., Benhar, I. & Schwartz, M. The retina as a window to the brain—from eye research to CNS disorders. Nat. Rev. Neurol. 9, 44–53 (2013).
Zhu, Z. et al. Oculomics: Current concepts and evidence. Prog. Retin. Eye Res. 101350 (2025).
Liu, C. et al. Biological age estimated from retinal imaging: a novel biomarker of aging. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part I 22, 138–146 (Springer, 2019).
López-Otín, C., Blasco, M. A., Partridge, L., Serrano, M. & Kroemer, G. Hallmarks of aging: an expanding universe. Cell 186, 243–278 (2023).
Ferraro, K. F., Kemp, B. R. & Williams, M. M. Diverse aging and health inequality by race and ethnicity. Innov. Aging 1, igx002 (2017).
Melzer, D., Pilling, L. C. & Ferrucci, L. The genetics of human ageing. Nat. Rev. Genet. 21, 88–101 (2020).
Zhang, S. et al. Association of retinal age gap and risk of kidney failure: a UK biobank study. Am. J. Kidney Dis. 81, 537–544 (2023).
Grimbly, M. J. et al. Estimating biological age from retinal imaging: a scoping review. BMJ Open Ophthalmol. 9, e001794 (2024).
Peng, Q. et al. Predictive potential of retina-based biological age in assessing chronic obstructive pulmonary disease risk. Clin. Exp. Ophthalmol. (2025).
Nusinovici, S. et al. Retinal photograph-based deep learning predicts biological age, and stratifies morbidity and mortality risk. Age Ageing 51, afac065 (2022).
Yu, Z. et al. Retinal age estimation with temporal fundus images enhanced progressive label distribution learning. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 629–638 (Springer, 2023).
Fu, H. et al. Evaluation of retinal image quality assessment networks in different color-spaces. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part I 22, 48–56 (Springer, 2019).
Zhou, Y. et al. Automorph: automated retinal vascular morphology quantification via a deep learning pipeline. Transl. Vis. Sci. Technol. 11, 12–12 (2022).
Pan, H., Han, H., Shan, S. & Chen, X. Mean-variance loss for deep age estimation from a face. In Proc. of the IEEE conference on computer vision and pattern recognition, 5285–5294 (2018).
Li, W., Huang, X., Lu, J., Feng, J. & Zhou, J. Learning probabilistic ordinal embeddings for uncertainty-aware regression. In Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 13896–13905 (2021).
Cao, W., Mirjalili, V. & Raschka, S. Rank consistent ordinal regression for neural networks with application to age estimation. Pattern Recognit. Lett. 140, 325–331 (2020).
Zhang, S., Yang, L., Bi Mi, M., Zheng, X. & Yao, A. Improving deep regression with ordinal entropy. In The Eleventh International Conference on Learning Representations, https://openreview.net/forum?id=raU07GpP0P (2023).
Chefer, H., Gur, S. & Wolf, L. Transformer interpretability beyond attention visualization. In Proc. of the IEEE/CVF conference on computer vision and pattern recognition, 782–791 (2021).
Ahadi, S. et al. Longitudinal fundus imaging and its genome-wide association analysis provide evidence for a human retinal aging clock. Elife 12, e82364 (2023).
Chen, R. et al. Deep neural network-estimated age using optical coherence tomography predicts mortality. GeroScience 46, 1703–1711 (2023).
Moqri, M. et al. Validation of biomarkers of aging. Nat. Med. 30, 360–372 (2024).
Salvi, S., Akhtar, S. & Currie, Z. Ageing changes in the eye: this article is part of a series on ageing edited by Professor Chris Bulpitt. Postgrad. Med. J. 82, 581–587 (2006).
Spaide, R. F., Fujimoto, J. G., Waheed, N. K., Sadda, S. R. & Staurenghi, G. Optical coherence tomography angiography. Prog. Retin. Eye Res. 64, 1–55 (2018).
Zekavat, S. M. et al. Deep learning of the retina enables phenome-and genome-wide analyses of the microvasculature. Circulation 145, 134–150 (2022).
Chen, R. et al. Glycemic status and its association with retinal age gap: insights from the UK Biobank study. Diabetes Res. Clin. Pract. 202, 110817 (2023).
Weidner, C. I. et al. Aging of blood can be tracked by DNA methylation changes at just three CpG sites. Genome Biol. 15, 1–12 (2014).
Cole, J. H. et al. Brain age predicts mortality. Mol. Psychiatry 23, 1385–1392 (2018).
Zhang, S. et al. Design and baseline data of the diabetes registration study: Guangzhou diabetic eye study. Curr. Eye Res. 48, 591–599 (2023).
Tan, M. & Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning, 6105–6114 (PMLR, 2019).
Li, Q. et al. Unimodal-concentrated loss: Fully adaptive label distribution learning for ordinal regression. In Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 20513–20522 (2022).
Touvron, H., Cord, M., Sablayrolles, A., Synnaeve, G. & Jégou, H. Going deeper with image transformers. In Proc. of the IEEE/CVF International Conference on Computer Vision, 32–42 (2021).
He, K. et al. Masked autoencoders are scalable vision learners. In Proc. of the IEEE/CVF conference on computer vision and pattern recognition, 16000–16009 (2022).
Marinšek, M., Bedenik, K. & Tekavc, J. Cross-cultural adaptation of the international physical activity questionnaire (IPAQ) for use in education. Slovenian Med. J. 91, 355–362 (2022).
Charlson, M. E., Pompei, P., Ales, K. L. & MacKenzie, C. R. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J. Chronic Dis. 40, 373–383 (1987).
Benjamini, Y., Drai, D., Elmer, G., Kafkafi, N. & Golani, I. Controlling the false discovery rate in behavior genetics research. Behav. Brain Res. 125, 279–284 (2001).
Acknowledgements
This project received grant funding from the NHMRC Investigator Grant (APP2010072, APP2041559). The contents of the published material are solely the responsibility of the Administering Institution, a participating institution or individual authors and do not reflect the views of the NHMRC. Z. Ge is supported by Airdoc Research Australia Centre Funding and NVIDIA AI Technology Centre. This work was supported by the Global STEM Professorship Scheme (P0046113), the Fundamental Research Funds of the State Key Laboratory of Ophthalmology, Project of Investigation on Health Status of Employees in Financial Industry in Guangzhou, China (Z012014075). The Centre for Eye Research Australia receives Operational Infrastructure Support from the Victorian State Government. R.C. is supported by the Melbourne Research Scholarship established by the University of Melbourne. The funding source had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Author information
Authors and Affiliations
Contributions
All authors contributed extensively to this work. Z.G., Z.Z., and M.H. co-supervised the research project, conceptualized the research scope, managed project administration, and oversaw funding acquisition and algorithm validation. Z.Z., Z.G., Z.Y., and R.C. participated in the algorithm design. Z.Y. conducted the image quality control, algorithm development, and model interpretation. R.C. took the lead in cleaning and preprocessing the data, performing the clinical verification analysis, and leading the clinical results interpretations. Z.Y. and R.C. co-drafted the manuscript. P.G. and W.W. supported model evaluation. I.R. and H.A. assisted in methodology development. X.Z. and X.S. contributed to validation. L.Z., X.Y., H.Y., and W.H. provided additional feedback on the data analysis. H.L. and P.v.W. contributed to the manuscript review. All other authors reviewed and approved the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yu, Z., Chen, R., Gui, P. et al. A cross population study of retinal aging biomarkers with longitudinal pre-training and label distribution learning. npj Digit. Med. 8, 344 (2025). https://doi.org/10.1038/s41746-025-01751-7
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41746-025-01751-7