Main

Stroke is one of the leading causes of death and long-term disability worldwide1. Conventional stroke risk assessments rely on clinical risk factors, mostly drawn from self-reported data such as smoking and history of ischaemic stroke2,3,4,5, and fall short in accurately identifying those at risk. Prior studies demonstrated that the accuracy of conventional stroke risk prediction models was only modest (concordance index (C-index) 0.58–0.73), especially in multiethnic populations2,6.

Brain imaging such as magnetic resonance imaging (MRI) scans may detect the presence or absence of subclinical cerebrovascular disease7, and incorporating these brain imaging features into risk assessment may help identify individuals at high risk of stroke more precisely7,8. For example, silent brain infarctions (SBIs), which affect nearly 20% of the general population9,10,11,12, indicate underlying ischaemic cerebrovascular disease and are associated with an increased risk of future stroke12,13. Thus, detection of SBI even in asymptomatic patients could potentially allow physicians to refine stroke risk classification and manage patients more appropriately. Scientific statements from the American Heart Association and American Stroke Association suggest that patients with SBI should follow primary prevention guidelines to prevent symptomatic stroke5,14,15 (Fig. 1a). However, identification of SBIs relies primarily on brain imaging such as MRI and computed tomography (CT), which is impractical and not cost-effective for general stroke screening. Thus, the American Heart Association and American Stroke Association do not recommend screening the asymptomatic general population with MRI to detect SBI14. This underscores a key clinical gap—how to detect SBI in a simple and cost-effective manner in the general population without the need for brain imaging scans5,14,16.

Fig. 1: Study design of the DeepRETStroke system.
figure 1

a, The established primary care workflow: either high-risk patients or those with SBI were recommended to follow guideline-based primary prevention strategies. b, A schematic overview of the DeepRETStroke system. The DeepRETStroke system encodes a domain-specific foundation model representing eye–brain connections, which can be applied to several downstream tasks such as SBI detection and future stroke prediction. EHR, electronic health record. c, Multicentre datasets used to develop and validate the DeepRETStroke system. d, The design of the real-world prospective observational study to evaluate patient outcomes. pys, person-years. Created with BioRender.com.

Recent advances in medical imaging and deep learning (DL) have highlighted the retina as a unique window to the brain17. The retinal vasculature shares embryological, anatomical and physiological similarities with the cerebral vasculature, offering a non-invasive surrogate to detect and predict early cerebrovascular changes18. Retinal photography, a non-invasive retinal imaging approach, is now widely used across various clinical and community settings for screening eye diseases such as diabetic retinopathy19, and has also been used with DL techniques to detect various systemic and neurological conditions20,21,22.

We introduce DeepRETStroke, a DL system that encodes a domain-specific foundation model for representing eye–brain connections. It utilizes retinal photographs for the downstream clinical tasks of detecting SBI and predicting future stroke events, demonstrating the capability of this retinal image-based system to enhance stroke risk assessment in the community. In multiethnic and multicountry datasets, we conducted our study in three stages. First, we used DeepRETStroke to detect SBI from retinal images. We then used SBI features detected from the retina to improve incident stroke risk prediction and fine-tuned models to predict recurrent stroke. Finally, we conducted a real-world proof-of-concept study to demonstrate the effectiveness of DeepRETStroke in guiding stroke prevention strategies compared with clinical risk prediction models.

Results

Study sample and modelling strategy

The DeepRETStroke system was pretrained using 895,640 retinal photographs from the Shanghai Integrated Diabetes Prevention and Care System (Shanghai Integration Model) and the China National Diabetic Complications Study. The main aim of this study was twofold: (1) to output participant-level SBI detection results (that is, SBI or no SBI) and (2) to output participant-level incident stroke risk results (that is, 5-year stroke risk), by recognizing SBI features from fundus images through knowledge transfer during joint training. We also fine-tuned the model to enable the prediction of recurrent stroke. We therefore trained, validated and tested the DeepRETStroke system for detecting SBI and predicting incident or recurrent stroke using retrospectively collected retinal photographs and clinical traits from diverse datasets from China, Singapore, Malaysia, the USA, the UK and Denmark. Figure 1b gives an overview of the construction and validation of DeepRETStroke (that is, the fundus model). To evaluate its performance, we then developed the metadata model and the combined model for comparison with the DeepRETStroke system. For SBI detection, the metadata model was a logistic-regression classifier with a series of cardiovascular risk factors that were available in these datasets at baseline (more details provided in Methods). For incident/recurrent stroke prediction, the metadata model was a Cox proportional hazards model with the same risk factors as those used in the SBI detection task. The combined model was based on both the fundus model and the metadata model. Figure 1c shows the entire validation design of DeepRETStroke in multicountry datasets. Finally, we conducted a prospective real-world study to test the effectiveness of the DeepRETStroke system in predicting recurrent stroke (Fig. 1d). For all participants, we used two macular-centred retinal photographs (that is, one for each eye). More details on image quality control and retinal photograph enhancement can be found in Methods.

SBI detection

For the detection of SBI, we developed and internally validated the DeepRETStroke system based on the participants with MRI scans from the Shanghai Diabetes Prevention Program (SDPP) cross-sectional study and externally validated it on five multiethnic datasets. The clinical characteristics of participants are summarized in Table 1. As shown in Fig. 2 and Extended Data Table 1, the fundus model performed well in the internal dataset, with an area under the curve (AUC) of 0.797 (95% confidence interval (CI) 0.500 to 0.995), higher than that of the metadata model (0.633, 95% CI 0.533 to 0.748). The sensitivity of the fundus model was 0.800 (0.333 to 1.000) and the specificity was 0.781 (0.730 to 0.830). In independent external datasets, the fundus model achieved AUCs ranging from 0.751 to 0.792, demonstrating the accurate detection performance of DeepRETStroke.

Table 1 Characteristics of the developmental, internal and external validation datasets for the detection of SBI
Fig. 2: Performance of the DeepRETStroke system for the detection of SBI.
figure 2

a, The AUROC of the DeepRETStroke system for the detection of SBI. b, The sensitivity of the DeepRETStroke system for the detection of SBI. c, The specificity of the DeepRETStroke system for the detection of SBI. The error bars represent bootstrapped (n = 1,000) 95% CIs. PRECISE, Polyvascular Evaluation for Cognitive Impairment and Vascular Events; MeLODY, Multiethnic Lifestyle, Obesity, and Diabetes Registry in Malaysia; UKB, UK Biobank; I-OPTA, Identification of patient-reported barriers to treatment with anti-VEGF for neovascular AMD.

Source data

Incident stroke prediction

For the prediction of incident stroke, the DeepRETStroke system was developed and internally validated based on the participants without MRI scans from the SDPP cohort (that is, the internal dataset), using the soft labels obtained in the recognition of SBI features, allowing the incorporation of SBI features to augment stroke risk prediction. DeepRETStroke was then validated on 11 external multiethnic datasets. The clinical characteristics of the cohort participants are summarized in Table 2. Figure 3 and Extended Data Table 2 show the performance of DeepRETStroke in predicting incident stroke over a 5-year period. On the internal validation dataset, the fundus model achieved an AUC of 0.901 (95% CI 0.846 to 0.940) and a C-index of 0.910 (95% CI 0.853 to 0.950). On the external validation datasets, the AUCs ranged from 0.728 to 0.895. Of note, the AUCs of the fundus model exceeded those of the metadata model, indicating that the DeepRETStroke system can make an accurate prediction of incident stroke from fundus images alone.

Table 2 Characteristics of the developmental, internal and external validation datasets for the prediction of incident stroke
Fig. 3: Performance of the DeepRETStroke system for the prediction of incident stroke.
figure 3

a, The time-dependent AUROC for the fifth year of the DeepRETStroke system for the prediction of incident stroke. b, The C-index of the DeepRETStroke system for the prediction of incident stroke. The error bars represent bootstrapped (n = 1,000) 95% CIs. CUHK-STDR, The Chinese University of Hong Kong-Sight-Threatening Diabetic Retinopathy; SEED, the Singapore Epidemiology of Eye Diseases study; MeLODY, the Multiethnic Lifestyle, Obesity, and Diabetes Registry in Malaysia cohort; UKB, UK Biobank; NICOLA, The Northern Ireland Cohort for the Longitudinal study of Ageing; I-OPTA, Identification of patient-reported barriers to treatment with anti-VEGF for neovascular AMD.

Source data

In addition, we used time-dependent analysis at 1–5 years to assess the prognostic accuracy of the three models for stroke prediction. The results of all validation cohorts are shown in Extended Data Fig. 1. Most AUCs of the fundus model were higher than those of the metadata model (that is, clinical traits). In the external validation, the relative performance of the metadata model and the fundus model was inconsistent across multiple external datasets in the first two years, but in the last three years, most AUCs of the fundus model were better than those of the metadata model, reflecting the high long-term concordance of the DeepRETStroke system.

Furthermore, we conducted subgroup analyses for incident stroke prediction according to the baseline health status of the cohort participants (diabetes, hypertension and carotid atherosclerosis). As shown in Extended Data Table 3, the ability of the model to predict incident stroke was consistent among participants with and without diabetes, hypertension and carotid atherosclerosis in the internal cohort and most external validation cohorts. These results demonstrated robust performances of our DeepRETStroke system in predicting incident stroke.

Recurrent stroke prediction

To broaden the application scope and enhance the versatility of the system, the model was further fine-tuned to enable the prediction of recurrent stroke. Extended Data Table 4 presents the clinical characteristics of the cohort participants used in developing and validating the system. Extended Data Table 5 shows the model performance for predicting recurrent stroke over 5 years. On the internal dataset, the fundus model achieved an AUC of 0.769 (95% CI 0.375 to 1.000) and the combined model achieved 0.833 (95% CI 0.500 to 1.000). Likewise, the AUC of the fundus model on the external dataset was higher than that of the metadata model (0.727 versus 0.705).

Real-world prospective exploratory study

To further evaluate the outcome of integration with clinical workflows, we additionally conducted a real-world study within a prospective cohort, in which 218 patients with prior stroke or SBI received either integrated management (IM; an integrated hospital-community management programme) or not, according to participant preference and clinical considerations (Fig. 1d). There were 56 participants in the IM group and 162 participants in the non-IM group (Supplementary Table 1). Participants in the IM group were provided with regular clinical and metabolic measurements, were advised by specialists in comprehensive hospitals, and received lifestyle guidance and peer support at community health service centres. Participants in this programme were followed up for stroke events over a 1-year period. Both the fundus model and the metadata model for recurrent stroke divided the IM group and the non-IM group into low-risk and high-risk groups according to cohort median risk23. We calculated the adjusted relative reduction (aRR) of incidence between the DeepRETStroke fundus model and the metadata model (Table 3). After adjustment for covariates including demographics, anthropometric indices and biochemical measurements, in the IM group, the difference in incident stroke events between the fundus model and the metadata model was not statistically significant in either the low-risk group (aRR −38.86%, 95% CI −91.5% to 297.93%) or the high-risk group (aRR 48.09%, 95% CI −62.42% to 598.57%). However, patients in the fundus high-risk group of the non-IM group had more incident stroke events than the metadata high-risk group (202.17 versus 53.93 per 1,000 person-years, aRR 543.61%, 95% CI 53.68% to 2,572.37%), while the fundus low-risk group had fewer incident stroke events than the metadata low-risk group (27.14 versus 136.94 per 1,000 person-years, aRR −97.14%, 95% CI −99.49% to −90.81%). Under comprehensive interventions (that is, intensive intervention for the high-risk group and non-intensive intervention for the low-risk group, stratified by median risk), the fundus model was associated with 82.44% (95% CI 1.58% to 324.47%) fewer recurrent stroke events than the metadata model based on clinical traits (Table 3).

Table 3 Associations between risk identification model and recurrent stroke events

Explainability analysis

We then examined the interpretability of the DeepRETStroke system, using GradientShap24 and the occlusion method25 to visualize the basis of the output predictions. The interpretability summary plots of fundus images for SBI detection and incident stroke prediction are shown in Extended Data Figs. 2 and 3, respectively, highlighting the key anatomical structures associated with SBI lesions and stroke, such as the retinal vasculature.

Discussion

Improved precision in stroke risk prediction with simpler, practical and cost-effective methods will lead to reduced morbidity, disability and mortality related to stroke in the general population. To address this important public health and clinical question, we developed a retinal image-based DL system—DeepRETStroke—to detect SBI, a subclinical cerebrovascular disease phenotype, and to use this information to augment stroke risk prediction. Using a large, diverse multicountry, multiethnic dataset from China, Singapore, Malaysia, the USA, the UK and Denmark, we trained, validated and externally tested DeepRETStroke. Our main findings are, first, that DeepRETStroke was able to precisely detect the presence of SBI with an AUC of 0.797. Second, by incorporating SBI detection, DeepRETStroke could predict up to 5-year risk of stroke, with an AUC of 0.901 for incident stroke and an AUC of 0.769 for recurrent stroke. We showed largely consistent results in external validation cohorts, in which DeepRETStroke was also able to effectively detect SBI and predict stroke with improved discrimination compared with clinical traits (that is, metadata). Finally, compared with the current assessment based on clinical traits, we showed in a prospective proof-of-concept study that information from DeepRETStroke was able to stratify stroke risk, which was associated with 82.44% (95% CI 1.58% to 324.47%) fewer recurrent stroke events when combined with appropriate comprehensive interventions.

While some previous studies have investigated the potential of using retinal imaging and artificial intelligence (AI) techniques to assess conventional risk factors and to predict vascular risk including stroke26,27,28,29,30, our study was unique in at least four important aspects. First, we focused on detecting preclinical and subclinical cerebrovascular disease in the form of SBI, which may affect up to 20% of the population. Detection of SBI and the information from its detection allow DeepRETStroke to more directly and precisely predict future clinical stroke events. This contrasts with other studies in which traditional DL algorithms relied on relatively simplistic binary outcome labels (presence versus absence of stroke), which may lead to underutilization of relevant subclinical cerebrovascular phenotypes and imaging characteristics. As a result, the effectiveness of some of these algorithms in predicting stroke outcomes has been, at best, moderate31. By leveraging information from brain imaging features such as SBI, DeepRETStroke showed superior performance in stroke risk prediction. Second, our study incorporated large-scale, internationally representative multiethnic cohorts and had notably longer follow-up of up to 5 years to predict incident stroke. Third, DeepRETStroke could also predict recurrent stroke in patients with prior stroke, addressing a clinical gap relevant to stroke neurologists. Finally, we validated findings in an independent prospective cohort, showing that using DeepRETStroke algorithms may help to identify high-risk populations for implementing preventive strategies that may lead to lower risk of future stroke.

Growing evidence suggests compelling biological connections between the brain and the eye that warrant consideration in our study. First, the retina, as an extension of the central nervous system, offers a unique opportunity for non-invasive assessment of cerebrovascular health32,33,34. Second, retinal imaging provides valuable insights into microvascular pathology35, reflecting systemic vascular conditions such as hypertension36, diabetes37 and atherosclerosis36,38. Third, SBIs and strokes share common underlying vascular aetiologies, including small vessel disease and cerebral microinfarcts39. Therefore, given the anatomical and physiological similarities between retinal and cerebral vasculature, alterations observed in retinal microvasculature, such as arteriolar narrowing, venular dilation and microaneurysms, may serve as biomarkers for the early detection of SBI and stroke40,41,42,43,44.

SBI, in particular, which has been reported to affect nearly 20% of individuals in the community9,10,11,12, frequently remains unidentified until an incidental brain MRI discovery10. Because of its prevalence, it is common to misclassify individuals with undetected SBIs as ‘healthy’ and ‘low risk’, which could underestimate their stroke risk45,46,47. However, given the high cost and limited availability of MRI, screening the large-scale asymptomatic general population with MRI to detect SBI is prohibitively expensive and impractical48. Therefore, our DeepRETStroke system, which uses a non-invasive retinal imaging modality35, may serve as an assistive and initial screening tool for SBI detection to augment stroke risk prediction in the community without the need for brain imaging.

Strengths of our study include the following. Our study had one of the largest pretrained datasets of retinal photographs and used multiethnic, international cohorts for external validation. Moreover, our algorithm, by learning from small-scale imaging data and using knowledge transfer, enhances stroke risk prediction through the recognition of SBI features. However, there are also several limitations that merit further consideration. First, although we have tested its generalizability in multiple datasets, the dataset used for model development was a solely Chinese cohort, owing to difficulties in sharing primary retinal and MRI imaging data between countries. Second, some inherent biases, including selection bias, unbalanced training data, bias in human labelling, racial and ethnic bias, and unknown confounders, cannot be eliminated or evaluated in our data. Third, as the labelling of our training dataset was based on clinician-derived diagnosis, potential variability in labelling definitions and protocols across cohorts exists, which may have adversely affected the development and validation of the DL algorithm. Nevertheless, the semi-supervised learning strategy adopted in DeepRETStroke development can enhance the training set with more easily diagnosable samples, which to some extent mitigates this concern. Fourth, our prospective cohort may be limited by its sample size. Further prospective studies are needed to validate the outcome of integrating the DeepRETStroke system into clinical practice.

In conclusion, we trained, validated and externally tested a retinal image-based DeepRETStroke system to detect SBI, a common subclinical cerebrovascular disease phenotype, and used this information to augment a prediction algorithm of incident stroke as well as recurrent stroke. We showed the potential of DeepRETStroke in a proof-of-concept prospective cohort study to enhance stroke risk stratification that could be used to guide stroke prevention strategy, without the need for brain imaging.

Methods

Ethical approval

The study received approval from the Ethics Committee of Shanghai Sixth People’s Hospital (approval no. 2023-KY-023 [K]). For the development and validation of the DeepRETStroke system, deidentified retrospective data were used, without the active involvement of participants. For the real-world prospective study, all participants provided informed consent before their involvement. All included studies adhered to the tenets of the Declaration of Helsinki and had respective local ethical committee approval.

Study sample

The primary objective of the DeepRETStroke system is to utilize retinal images for both the detection of prevalent SBI and the prediction of incident stroke. The model was further fine-tuned to enable the prediction of recurrent stroke. For SBI detection, we used retinal photographs from six independent datasets for model development and validation. We included participants who underwent retinal photography and brain MRI or CT and had no history of overt stroke. Similarly, for incident stroke prediction, we used retinal photographs from 12 independent datasets for model development and validation. For this task, we included participants who underwent retinal photography, had no history of stroke and had been followed for a certain period of time. Furthermore, for recurrent stroke prediction, we used retinal photographs from two independent datasets for model development and validation. For this task, we included participants who underwent retinal photography and had a prior history of stroke. Image-level data were further filtered via image quality control, and there was no participant-level or image-level overlap between the developmental and validation sets. Details of these datasets are provided in Supplementary Methods and Supplementary Table 2.

Image quality control

The retinal images were captured using a variety of standard fundus cameras, including Topcon TRC-NW6 (Topcon) and Canon CR1–Mark II (Canon). Following the exclusion criteria proposed by Carol et al.21, we selected the fundus images as follows: if more than 25% of the peripheral area of the retina was unobservable or the central region of the retina had substantial artefacts that would affect analysis, the photograph was excluded from the dataset. After the image quality control, fundus images were transferred to the AI team to develop and validate our DeepRETStroke system.

Fundus image enhancement

To extract non-specific vascular features on the fundus photographs that are highly related to our target vascular-related systemic diseases, a series of image enhancement steps was applied to improve the performance of our deep model. In the first step, contrast-limited adaptive histogram equalization was used to enhance the contrast of the image while suppressing noise49. We first transformed the input fundus photos from RGB (red, green, blue) to LAB (lightness, green–red, blue–yellow) colour space and divided the images into fixed-size pieces. Contrast-limited adaptive histogram equalization was then applied to the lightness channel of each piece according to its own intensity distribution. After that, the processed images were converted back to RGB space. In the second step, colour normalization was used to reduce the colour variation between fundus images caused by different conditions, including the fundus camera model, the photography settings, exposure changes and so on. Following the method proposed by the Kaggle Diabetic Retinopathy Detection Competition Report50, each fundus image was processed with a Gaussian filter (\({P}_{\rm{c}}=\alpha \times P+\beta \times {\rm{Gauss}}(P,s)+\delta\)), where \(P\) represents the fundus image, \({{\mathrm{Gauss}}}(P,s)\) represents applying a Gaussian filter with a standard deviation of \(s\) to \(P\), and \(\alpha\), \(\beta\) and \(\delta\) are the parameters. We used \(\alpha =4\), \(\beta =-4\), \(s=5\) and \(\delta =128\). After that, every enhanced fundus image was resized to 512 × 512 pixels for the subsequent training and validation.
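For illustration, the two-step enhancement described above could be implemented along the following lines with OpenCV. This is a sketch rather than the authors' released code; the CLAHE clip limit and tile grid size are assumptions, as they are not specified above.

```python
import cv2

def enhance_fundus(path, alpha=4, beta=-4, sigma=5, delta=128, size=512):
    """Illustrative re-implementation of the two-step fundus enhancement described above."""
    img = cv2.imread(path)  # BGR uint8; the linear colour normalization is channel-wise,
                            # so BGR versus RGB ordering does not change the result

    # Step 1: CLAHE on the lightness channel in LAB colour space
    # (OpenCV's tileGridSize plays the role of the fixed-size pieces).
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # clip limit/tile size assumed
    lab = cv2.merge((clahe.apply(l), a, b))
    img = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

    # Step 2: colour normalization, P_c = alpha * P + beta * Gauss(P, s) + delta.
    blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=sigma)
    img = cv2.addWeighted(img, alpha, blurred, beta, delta)

    # Resize to the 512 x 512 input expected by the network.
    return cv2.resize(img, (size, size))
```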

Definition and criteria for disease diagnosis

In our study, diagnoses of SBI were made on the basis of clinical diagnostic criteria as follows: physician-diagnosed cerebral infarcts based on brain CT or MRI without any corresponding stroke episode (that is, self-reported history of stroke)51 (Supplementary Table 3). Diagnoses of stroke were made on the basis of American Heart Association and American Stroke Association criteria52 and were ascertained as follows: any physician-diagnosed fatal or non-fatal stroke reported by self-report or documented in hospital records during the follow-up period, provided there was no history of overt stroke at baseline. Diagnoses of recurrent stroke were defined as follows: any physician-diagnosed stroke occurring during the follow-up period in individuals with a history of stroke diagnosed at baseline.

Development of the DeepRETStroke system

The DeepRETStroke system encodes a domain-specific foundation model representing eye–brain connections, which was built on a three-stage pretraining strategy. The development of the system was completed according to the steps shown in Extended Data Fig. 4. The network architecture is shown in Supplementary Fig. 1. For the first stage, RETFound31 was used as the initial encoder of our system, and Masked AutoEncoder53—an unsupervised learning algorithm—was used to improve the model’s generalizability across different racial groups with two large Chinese datasets—the Shanghai Integration Model cohort (173,346 participants with 693,384 images) and the China National Diabetic Complications Study cohort (50,564 participants with 202,256 images)54.

For the second stage, we adopted a model initialization step to address the cold-start challenge in the next phase of semi-supervised learning. Here, the SDPP dataset (specifically referring to the participants without MRI scans from the SDPP cohort) was used to train the encoder and Stroke Predictor for incident stroke prediction by predicting the risk of onset over the next 5 years. This allows the model to acquire prior knowledge about eye–brain connections before progressing to the subsequent stage.

Based on this foundation, we initiated the third stage to further enhance the representation capabilities of our domain-specific foundation model. Specifically, the encoder of the system was continuously optimized over training iterations, with each iteration consisting of two steps: ‘semi-supervised learning’ and ‘knowledge transfer’. In turn, the ‘semi-supervised learning’ step in an iteration consists of several rounds, with each round consisting of two steps: ‘update of labelled database’ and ‘update of SBI Detector’. For a training iteration, the first step was to perform semi-supervised learning over several training rounds. We adopted the collaborative training strategy in semi-supervised learning55 by treating the left and right eye images of all participants as two sufficiently redundant and conditionally independent views. Before the start of the semi-supervised learning step, the labelled database contained only samples from the SDPP-MRI dataset (specifically referring to the participants with MRI scans from the SDPP cross-sectional study), while the unlabelled database contained all samples from the SDPP dataset. In each training round of the semi-supervised learning step, we first used the labelled database to update SBI Detector and then used this model to make predictions on the unlabelled database. After prediction, we selected samples with high prediction confidence from the results and labelled them with ‘pseudo-labels’. Here, based on our internal validation database, we chose the minimum predictive probability score that could maintain the predictive precision of positive samples (the proportion of true positive samples among the samples predicted as positive) above 0.75 as the high prediction confidence standard for positive samples, and the maximum predictive probability score that could maintain the predictive precision of negative samples (the proportion of true negative samples among the samples predicted as negative) above 0.75 as the high prediction confidence standard for negative samples. These samples were removed from the unlabelled database and added to the labelled database. We then used this expanded labelled database to further update SBI Detector. Therefore, over the training rounds, samples from the unlabelled database were continuously added to the labelled database until no samples remained in the unlabelled database or no samples could be assigned pseudo-labels, which marked the end of the semi-supervised learning step. Afterwards, the updated SBI Detector made predictions for the entire SDPP dataset, directly assigning the predicted ‘disease probability distribution’ as a ‘soft label’ to serve as information on possible SBI at baseline for each sample, and the ‘knowledge transfer’ step then started. In this step, we once again used the SDPP dataset for incident stroke prediction. However, unlike the model initialization step, here we used the soft labels to construct a new auxiliary task alongside the primary prediction task for joint training. Specifically, SBI Learner was trained to fit the probability distribution of SBI at baseline indicated by the soft label, while Stroke Predictor was trained to predict future incident stroke. Through this approach, the encoder could simultaneously learn the cerebrovascular conditions of a given participant both at baseline and during the follow-up period, thereby improving the prediction of incident stroke.
Afterwards, the encoder optimized in this knowledge transfer step could be regarded as an improved feature extractor for the next training iteration, until the entire system development was ended. It is worth noting that we did not directly let the encoder learn labelled data in SBI detection but instead used the soft labels output by SBI Detector to update the encoder in stroke risk prediction. This is because, in the early stage of semi-supervised learning in SBI detection, only a small amount of labelled data can be used as training samples, and updating the encoder at this time poses a substantial risk of overfitting. For incident stroke risk prediction, the synchronized prediction task of incident stroke could be regarded as a constraint for soft-label learning, thus reducing the possibility of overfitting of the encoder.
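The iterative pseudo-labelling procedure described above can be summarized in the following minimal sketch. The helpers train_sbi_detector and predict_positive_proba are hypothetical stand-ins for the actual SBI Detector training and inference steps, and pos_thr/neg_thr denote the confidence thresholds chosen on the internal validation set as described; this is an illustration, not the authors' code.

```python
def semi_supervised_round(labelled, unlabelled, pos_thr, neg_thr):
    """One round of the 'update of labelled database' / 'update of SBI Detector' loop."""
    # Step 1: update SBI Detector on the current labelled database.
    detector = train_sbi_detector(labelled)

    # Step 2: predict on the unlabelled pool and assign pseudo-labels
    # only to samples with high prediction confidence.
    still_unlabelled, newly_labelled = [], []
    for sample in unlabelled:
        p = predict_positive_proba(detector, sample)
        if p >= pos_thr:                       # confident positive
            newly_labelled.append((sample, 1))
        elif p <= neg_thr:                     # confident negative
            newly_labelled.append((sample, 0))
        else:                                  # keep for a later round
            still_unlabelled.append(sample)

    # Step 3: move pseudo-labelled samples into the labelled database.
    labelled = labelled + newly_labelled
    return detector, labelled, still_unlabelled
```

Rounds would repeat until still_unlabelled is empty or no new pseudo-labels can be assigned, at which point the updated detector produces soft labels for the knowledge transfer step.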

After this three-stage pretraining, the domain-specific foundation model representing eye–brain connections was successfully developed. Therefore, for other cerebrovascular-related diseases, the encoder of our system can also be used as a pretrained encoder with strong prior knowledge, thereby benefiting the training of other detection or prediction tasks. On this basis, for recurrent stroke prediction, we selected the developed encoder and Stroke Predictor as a pretrained model and performed fine-tuning on the participants from the Nicheng Diabetes Screening Project (NDSP) who had a stroke history at baseline, thus developing a specific model for recurrent stroke prediction.

Development of the metadata model and combined model

To evaluate the performance of the DeepRETStroke system, two other models with different sets of input data were developed: the metadata model and the combined model. For SBI detection, the metadata model was a logistic-regression classifier with a series of conventional cardiovascular risk factors at baseline, including age, gender, smoking status (yes/no), body mass index (BMI), systolic blood pressure, total cholesterol, high-density lipoprotein cholesterol, baseline hypertension (yes/no) and baseline diabetes (yes/no). For incident/recurrent stroke prediction, the metadata model was a Cox proportional hazards model with the same risk factors as those used in the SBI detection task. The metadata features were normalized and standardized before model training.
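As a sketch of how such baseline models might be fitted, the example below assumes scikit-learn and lifelines and illustrative column names; the actual implementation may differ.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from lifelines import CoxPHFitter

# Illustrative column names for the risk factors listed above.
RISK_FACTORS = ["age", "gender", "smoking", "bmi", "sbp",
                "total_cholesterol", "hdl_cholesterol", "hypertension", "diabetes"]

def fit_metadata_models(df: pd.DataFrame):
    """Fit the two metadata models described above on a dataframe of baseline risk factors."""
    # Normalize/standardize the metadata features before model training.
    X = StandardScaler().fit_transform(df[RISK_FACTORS])

    # SBI detection: logistic-regression classifier on the baseline risk factors.
    sbi_clf = LogisticRegression(max_iter=1000).fit(X, df["sbi"])

    # Incident/recurrent stroke prediction: Cox proportional hazards model, same factors.
    cox_df = pd.DataFrame(X, columns=RISK_FACTORS)
    cox_df["time_to_event"] = df["time_to_event"].values
    cox_df["stroke_event"] = df["stroke_event"].values
    cox = CoxPHFitter().fit(cox_df, duration_col="time_to_event", event_col="stroke_event")
    return sbi_clf, cox
```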

For the development of the combined models, we first froze the developed encoder and used it to extract high-dimensional retinal features. The extracted features were then concatenated with the risk factors used in the metadata model to form new combined features. After that, classifiers of the same form as SBI Detector and Stroke Predictor (with more input dimensions to accommodate the added risk factors) were trained as the combined models for SBI detection and incident/recurrent stroke prediction, respectively.
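A minimal sketch of this feature-concatenation step, assuming the frozen encoder maps a batch of preprocessed images to one feature vector per image; the names and shapes are illustrative only.

```python
import numpy as np
import torch

@torch.no_grad()
def build_combined_features(encoder, images, metadata):
    """Concatenate frozen-encoder retinal features with standardized risk factors.

    encoder:  the frozen DeepRETStroke encoder (assumed to return one vector per image)
    images:   tensor of preprocessed fundus images, shape (N, 3, 512, 512)
    metadata: array of standardized risk factors, shape (N, n_risk_factors)
    """
    encoder.eval()
    retinal_features = encoder(images).cpu().numpy()  # e.g. (N, 1024) for a ViT-large embedding
    return np.concatenate([retinal_features, metadata], axis=1)
```

The combined classifiers are then trained on these concatenated features, as described above.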

Implementation details of DeepRETStroke system developmental process

We adopted the retinal feature encoder of RETFound31 as the prototype of our encoder for subsequent pretraining and fine-tuning. It is a large vision transformer56 (ViT-large) with 24 transformer blocks and an embedding vector size of 1,024. Each transformer block is composed of multiheaded self-attention and multilayer perceptrons (MLPs), taking feature vectors as input and generating high-level features. For SBI Detector, we used two logistic-regression models with L2 regularization. SBI Learner was composed of a two-layer MLP with a two-dimensional output, while Stroke Predictor used the same two-layer MLP structure with a five-dimensional output instead. For the first stage of pretraining, the objective was the classical MAE algorithm, reconstructing retinal images from a highly masked version with a mask ratio of 0.75. For the second stage of pretraining, the objective function was as follows:

$${{{\mathrm{Loss}}}}_{{{\mathrm{stage}}}2}=\,\frac{1}{N}\frac{1}{{N}_{{\mathrm{T}}}}\sum _{i,t}-\left[{y}_{i,t}\log \left[{p}_{\theta ,t}\left({x}_{i}\right)\right]+\left(1-{y}_{i,t}\right)\log \left[1-{p}_{\theta ,t}\left({x}_{i}\right)\right]\right],$$

where \(t=1,2,3,4,5\) represents years; \({N}_{{\mathrm{T}}}=5\) is for participants with stroke occurring within 5 years or without stroke over 5 years, while \({N}_{{\mathrm{T}}} < 5\) is for participants censored within 5 years; \(i=1,2,3,\ldots ,N\) is the index of each participant; \({x}_{i}\) is the input fundus image of the \({i}{\rm{th}}\) participant; \({p}_{\theta ,t}\left({\rm{\cdot }}\right)\) is the estimated probability of incident stroke occurring before timepoint \(t\); \({y}_{i,t}=1\) is for incident stroke occurrence before timepoint \(t\); and \({y}_{i,t}=0\) is for no incident stroke within a \(t\)-year time window. For the third stage of pretraining, the objective function of SBI detection was the classical cross-entropy loss for binary classification. The objective function of incident stroke prediction was the loss function of the second stage plus a Kullback–Leibler divergence loss against the soft label:

$${{{\mathrm{Loss}}}}_{{{\mathrm{stage}}}3}={{{\mathrm{Loss}}}}_{{{\mathrm{stage}}}2}+\alpha \times \frac{1}{N}\sum _{i}-\left[{s}_{i,0}^{{\mathrm{T}}}\log \left[{q}_{i,0}^{{\mathrm{T}}}\left({x}_{i}\right)\right]+{s}_{i,1}^{{\mathrm{T}}}\log \left[{q}_{i,1}^{{\mathrm{T}}}\left({x}_{i}\right)\right]\right],$$

where \(i=1,2,3,\ldots ,N\) is the index of each participant; \({x}_{i}\) is the input fundus image of the ith participant; \({s}_{i,\,j}^{{\mathrm{T}}}\left({\rm{\cdot }}\right)\) is the value of soft label output by SBI Detector on class \(j\); \({q}_{i,\,j}^{{\mathrm{T}}}\left({{\cdot }}\right)\) is the value of softmax output by SBI Learner on class \(j\); and \(\alpha\) is the parameter to control the effects of the soft label on the learning of this stage. For the fine-tuning of recurrent stroke prediction, the objective function was the loss function in the second stage with the replacement of the predicted event from incident stroke to recurrent stroke.
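The two objectives above can be written in PyTorch roughly as follows. The tensor conventions (event and censoring years encoded as integers per participant, sigmoid outputs per year) are assumptions for illustration; this is a sketch, not the released training code.

```python
import torch
import torch.nn.functional as F

def stage2_loss(pred, event_year, censor_year, horizon=5):
    """Censoring-aware multi-horizon BCE corresponding to Loss_stage2.

    pred:        (N, horizon) sigmoid outputs p_{theta,t}, probability of stroke before year t
    event_year:  (N,) year of incident stroke in 1..horizon, or 0 if no event was observed
    censor_year: (N,) years of available follow-up for participants without an event
    """
    years = torch.arange(1, horizon + 1, device=pred.device).unsqueeze(0)   # (1, horizon)
    has_event = (event_year > 0).unsqueeze(1)                               # (N, 1)
    y = (has_event & (event_year.unsqueeze(1) <= years)).float()            # y_{i,t}
    # N_T = horizon for events or full follow-up; N_T < horizon if censored early.
    n_t = torch.where(event_year > 0,
                      torch.full_like(censor_year, horizon),
                      censor_year).unsqueeze(1)
    mask = (years <= n_t).float()
    bce = F.binary_cross_entropy(pred, y, reduction="none") * mask
    return (bce.sum(dim=1) / mask.sum(dim=1).clamp(min=1)).mean()

def stage3_loss(pred, event_year, censor_year, learner_logits, soft_label, alpha=0.3):
    """Stage-2 loss plus the soft-label term fitted by SBI Learner (equivalent to the
    KL-divergence term up to a constant that does not depend on the model)."""
    log_q = F.log_softmax(learner_logits, dim=1)            # q_{i,j} from SBI Learner
    soft_term = -(soft_label * log_q).sum(dim=1).mean()     # cross-entropy against soft labels
    return stage2_loss(pred, event_year, censor_year) + alpha * soft_term
```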

For the developmental details of the system, we first trained the encoder and Stroke Predictor for 30 epochs in the model initialization step and selected the model with the best C-index on the validation set of incident stroke prediction to be used in SBI detection. Then, we trained SBI Detector with the semi-supervised learning strategy until no unlabelled samples remained or SBI Detector could no longer find any samples with high-confidence results. After that, we trained the encoder, SBI Learner and Stroke Predictor with the knowledge transfer strategy for 30 epochs. In this step, if the validation result of incident stroke prediction surpassed the best result from the previous iterations, we selected the corresponding best model and repeated the training process of the third stage; otherwise, the entire system development ended. For recurrent stroke prediction, we performed fine-tuning on a copy of the encoder and Stroke Predictor. The total number of training epochs was 30, and the model with the best C-index on its validation set was selected.

The entire system was implemented using PyTorch. During the developmental process, training was performed in batches of 128 images after data augmentation, including random horizontal and vertical flips, random rotation and random Gaussian noise addition. We used the Adam optimizer with learning rate warm-up (from 0 to a learning rate of 1 × 10−3) over ten epochs. The parameter \(\alpha\) of the objective function of the third stage was set to 0.3. No samples overlapped at the patient level between the training and validation sets.
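A hedged sketch of this training configuration in PyTorch/torchvision is shown below; the rotation range, noise scale and warm-up schedule shape are assumptions, as they are not specified above.

```python
import torch
from torchvision import transforms

# Data augmentation as described: random flips, rotation and Gaussian noise
# (operating on image tensors; rotation range and noise scale are assumed).
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(degrees=30),
    transforms.Lambda(lambda x: x + 0.01 * torch.randn_like(x)),
])

def make_optimizer_and_warmup(model, base_lr=1e-3, warmup_epochs=10):
    """Adam with an approximately linear learning-rate warm-up toward base_lr over ten epochs."""
    optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer,
        lr_lambda=lambda epoch: min(1.0, (epoch + 1) / warmup_epochs),
    )
    return optimizer, scheduler
```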

Data from UK Biobank and the Age-Related Eye Disease Study (AREDS) were accessed under application numbers 104443 and 26125 by Shanghai Jiao Tong University and the Ohio State University, respectively. For domestic datasets from Shanghai (SDPP, NDSP), Beijing (Peking Union Diabetes Management, PUDM), Wuhan (Wuhan Tongji Health Management, WTHM) and Wuxi (The Eastern China Health Management, ECHM), the principal investigator of each study provided data and supervised the data analyses, ensuring that the data were appropriately analysed within the research team and that no external entities had access to the data. For other datasets, we delivered a docker program and an instruction guide to principal investigators and researchers at each study site, who conducted the external validation and the related analyses locally within each cohort following the same instruction guide (https://docs.docker.com/get-started/overview/). After this, the analyses were performed, and the summary statistics and performance metrics (for example, AUC) were sent back to the requesting team at Shanghai Sixth People’s Hospital. No raw data transfer occurred across countries.

Real-world study of the DeepRETStroke system

For the real-world study within a community-based prospective cohort study of Chinese adults, 215 participants with prior stroke and 3 participants with SBI were screened in November 2022 (Extended Data Fig. 5). Among these patients, 56 received IM, while 162 did not. The IM group was provided with regular clinical and metabolic measurements, was advised by specialists in comprehensive hospitals, and received lifestyle guidance and peer support at community health service centres. Biochemical measurements and anthropometric data collected included body weight, waist circumference, blood pressure, lipid profile and related factors of cardiometabolic diseases.

Explainability analysis of the DeepRETStroke system

We utilized GradientShap24 and the occlusion method25 to visualize the interpretability of the output predictions from the DeepRETStroke system. GradientShap approximates SHapley Additive exPlanations values by computing the expectations of gradients obtained by randomly sampling from the distribution of baselines or references. It adds white noise to each input sample \(n\) times, selects a random baseline from the baseline distribution and a random point along the path between the baseline and the input, and computes the gradient of the outputs with respect to those selected random points. The occlusion method is a perturbation-based approach to computing attribution, which involves replacing each contiguous rectangular region with a given baseline or reference and computing the difference in output. For features located in multiple regions (hyperrectangles), the corresponding output differences are averaged to compute the attribution for that feature. For the outputs of GradientShap and the occlusion method, we used green and red, respectively, to show the regions to which the models paid more attention.
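The text above does not name a specific software package; the sketch below assumes the Captum implementations of GradientShap and Occlusion, with the patch size, stride and sampling parameters chosen only for illustration.

```python
import torch
from captum.attr import GradientShap, Occlusion

def explain(model, image, baseline_images, target=1):
    """Compute GradientShap and occlusion attributions for one preprocessed fundus image.

    image:           tensor of shape (1, 3, 512, 512)
    baseline_images: tensor of reference images, e.g. shape (K, 3, 512, 512)
    target:          output index to explain (e.g. the positive class)
    """
    model.eval()

    # GradientShap: expectations of gradients over noisy samples and random baselines.
    gradient_shap = GradientShap(model)
    gs_attr = gradient_shap.attribute(
        image, baselines=baseline_images, n_samples=20, stdevs=0.1, target=target
    )

    # Occlusion: slide a baseline-filled patch over the image and record output changes.
    occlusion = Occlusion(model)
    occ_attr = occlusion.attribute(
        image,
        sliding_window_shapes=(3, 32, 32),  # occluded patch size (assumed)
        strides=(3, 16, 16),
        baselines=0,
        target=target,
    )
    return gs_attr, occ_attr
```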

Statistical analysis

To evaluate the performance of the DeepRETStroke system for SBI detection, we used ROC curves of sensitivity versus 1 − specificity. The area under the receiver operating characteristic curve (AUROC) was calculated with 95% CIs by the non-parametric bootstrap method (1,000 random resamplings with replacement). Sensitivity and specificity were estimated at the best cut-off value of the output scores (Youden index) on the validation set. Significant differences between AUROCs were computed using the DeLong method57. For incident/recurrent stroke prediction, we used Harrell’s C-index and the time-dependent AUROC58. Each metric was adjusted for censoring by weighting with the inverse probability of censoring and calculated for data before a given cut-off time τ, with 95% CIs obtained by the non-parametric bootstrap method. A two-sided significance level of 5% was considered statistically significant.
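As an illustration of these evaluation steps, the sketch below uses scikit-survival for the inverse-probability-of-censoring-weighted time-dependent AUROC and C-index, together with a simple non-parametric bootstrap; the library choice and argument names are assumptions, not the authors' analysis code.

```python
import numpy as np
from sksurv.util import Surv
from sksurv.metrics import cumulative_dynamic_auc, concordance_index_ipcw

def evaluate_survival(train_event, train_time, test_event, test_time, risk_score, tau=5.0):
    """Time-dependent AUROC at years 1-5 and censoring-adjusted C-index up to cut-off tau.

    risk_score: higher values indicate higher predicted stroke risk.
    """
    y_train = Surv.from_arrays(event=train_event, time=train_time)
    y_test = Surv.from_arrays(event=test_event, time=test_time)

    times = np.arange(1, 6)  # evaluation horizons in years
    auc_t, _ = cumulative_dynamic_auc(y_train, y_test, risk_score, times)
    cindex = concordance_index_ipcw(y_train, y_test, risk_score, tau=tau)[0]
    return auc_t, cindex

def bootstrap_ci(metric_fn, *arrays, n_boot=1000, alpha=0.05, seed=0):
    """Non-parametric bootstrap 95% CI (1,000 resamples with replacement)."""
    rng = np.random.default_rng(seed)
    n = len(arrays[0])
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        stats.append(metric_fn(*[a[idx] for a in arrays]))
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])
```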

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.