Background & Summary

Cardiovascular diseases account for approximately 18.6 million deaths annually, ranking as one of the leading causes of mortality worldwide1,2,3. Atherosclerosis, the underlying pathology of many CVDs, often progresses silently until significant vascular stenosis or occlusion precipitates severe clinical events such as myocardial infarction or stroke4. Early diagnosis and timely intervention are therefore critical to halt disease progression, reduce the risk of acute events, improve patient prognoses, and alleviate pressures on healthcare systems.

Carotid intima-media thickness (CIMT) is a key predictive biomarker for assessing atherosclerosis and cardiovascular risk, with alterations in CIMT reflecting changes in vascular health. Studies have demonstrated that a 0.1 mm increase in CIMT correlates with a 1.12-fold escalation in mortality risk5,6. Accurate measurement and monitoring of CIMT are essential for early detection of cardiovascular diseases and for guiding clinical interventions. However, traditional B-mode ultrasound methods are constrained by high equipment costs, the necessity for specialized expertise, time-intensive procedures, and limited accessibility, impeding their widespread adoption in routine CIMT screening.

Emerging research indicates that the structure and function of retinal vessels are significantly influenced by systemic diseases such as atherosclerosis, hypertension, diabetic nephropathy and neurocardiovascular conditions7,8,9,10, rendering them sensitive indicators of systemic arterial health. Anatomically, the central retinal artery—a branch of the internal carotid artery within the cranium—shares a close relationship with the carotid artery. As depicted in Fig. 1, hemodynamic changes in the carotid artery can directly affect the morphology and function of retinal vessels, manifesting as characteristic pathological alterations, including retinal arterial narrowing, reduced microvascular density, venous dilation, arteriovenous crossing compression, cotton wool spots, punctate hemorrhages, and hard exudates. These manifestations are particularly evident in cases of carotid atherosclerosis or stenosis11,12. Consequently, abnormalities in the microvascular structures observed in fundus images can serve as imaging biomarkers for carotid artery lesions, providing early warning indicators for carotid and other cardiovascular diseases13. The detection and identification of retinal microvascular abnormalities offer a promising avenue for the precise assessment of early cardiovascular risk.

Fig. 1
Fig. 1The alternative text for this image may have been generated using AI.
Full size image

Diagram of the Relationship Between CIMT Thickening and Fundus Lesions.

Fundus images, rich in detail and complex features, present challenges for lesion recognition and analysis due to their high-dimensional nature. Traditional diagnostic methods are limited in their ability to detect subtle lesions, such as retinal arterial narrowing and decreased microvascular density, especially when multiple lesions coexist. Deep learning technologies have robust capabilities for analyzing large volumes of image data, allowing it to accurately capture these minute and complex lesion features, thereby providing more precise evidence for clinical diagnosis. In the context of early lesion detection and prediction, deep learning exhibits immense potential, not only compensating for the shortcomings of traditional methods but also offering new possibilities for early disease intervention and prevention14,15.

The rapid advancement of deep learning has catalyzed the emergence of “ophthalmomics,” a research domain that leverages fundus images to predict systemic diseases. Utilizing the powerful feature recognition capabilities of deep learning, ophthalmomics has achieved significant progress in predicting various systemic diseases, including stroke16, Alzheimer’s disease17,18, myocardial infarction19, coronary heart disease20, diabetic nephropathy21,22, atherosclerosis23,24, hypertension8, anemia25, diabetes26, and hyperlipidemia27, as shown in Fig. 2. These developments underscore the substantial potential of fundus imaging in systemic disease prediction. Nevertheless, research specifically focusing on predicting CIMT using fundus images remains scarce. The lack of publicly available, high-quality datasets on systemic diseases hampers the study of early diagnostic techniques for CIMT and impedes the exchange and advancement within the field of ophthalmomics.

Fig. 2
Fig. 2The alternative text for this image may have been generated using AI.
Full size image

Research on Systemic Diseases Based on Oculomics.

To address these challenges, we have established a publicly accessible dataset comprised of high-quality bilateral fundus images, CIMT measurements, and comprehensive clinical information, including age and gender. This dataset provides a multimodal resource comprising high-quality bilateral fundus images, CIMT measurements, and demographic information (e.g., age and gender). It helps address the current scarcity of publicly available multimodal datasets for cardiovascular research. Researchers can utilize this dataset to evaluate the performance, robustness, and generalizability of artificial intelligence (AI) models in various clinical scenarios. Ultimately, this dataset may support the future development of non-invasive early screening tools for cardiovascular diseases, especially in settings with limited resources. One limitation of this dataset is the lack of detailed medical history, such as hypertension and diabetes, which may significantly influence the accuracy of CIMT predictions. Future studies will incorporate these clinical variables to further improve predictive performance.

Methods

Dataset collection

This study conducted a retrospective analysis of patient data from April 2015 to February 2024, collected from the Department of Endocrinology at the Second Affiliated Hospital of Anhui Medical University. This dataset predominantly comprises individuals of Chinese ethnicity from Anhui Province, China. The inclusion criteria were as follows: (1) inpatients in the Department of Endocrinology; (2) patients who underwent both fundus imaging and carotid ultrasound during hospitalization; and (3) based on ultrasound measurements, carotid intima-media thickness (CIMT) < 0.9 mm was classified as the normal group, while CIMT ≥ 0.9 mm was classified as the thickened group. The exclusion criteria included: (1) patients with poor image quality who could not provide valid research data. Importantly, participants with ocular diseases, including retinal or vitreous disorders, were not excluded if the image quality was adequate for analysis; (2) patients lacking images of both eyes; and (3) patients without detailed CIMT measurement records.

In this study, a CIMT threshold of ≥0.9 mm was chosen based on previous clinical evidence from a Chinese cohort, which demonstrated a significant relationship between this threshold and elevated cardiovascular risk, including the incidence of new-onset hypertension28. While recognizing the variability of CIMT thresholds across ethnicities and age groups29, our selection aligns closely with validated research specific to our studied population.

Fundus images were captured using a Canon CR-2 PLUS AF non-mydriatic digital fundus camera. CIMT measurements were performed using a Siemens ACUSON S2000 ultrasound diagnostic system equipped with an L16 probe (frequency range of 5–12 MHz). Measurements were taken at the distal wall of the common carotid artery, 1–2 cm proximal to the carotid bifurcation, focusing on flat areas to ensure a clear intima-media interface. CIMT data were provided by the hospital information center and included patient ID, name, gender, and ultrasound diagnostic results.

The study was exempted from informed consent from the Ethics Committee of the Second Affiliated Hospital of Anhui Medical University (Ethics Approval Number: YX2023-2011(F1)). All data were derived from previous clinical diagnoses and treatments, posing no greater risk than daily life or routine physical examinations. The exemption did not compromise the rights or health of the participants. All data were fully anonymized, contained no personal privacy information, and had no commercial interests.

The final dataset comprised 2,903 cases of bilateral fundus images, collectively referred to as the China-Fundus-CIMT dataset. The dataset’s normal group included 849 cases (568 males and 281 females), while the thickened group comprised 2,054 cases (1,294 males and 760 females). Overall, the proportion of males was significantly higher than females (Fig. 3a). Statistical analysis of age and gender across the groups revealed that CIMT in the thickened group was significantly higher than in the normal group (p < 0.001; Fig. 3b). Additionally, the mean age of the thickened group was significantly higher than that of the normal group for both males and females (p < 0.0001; Fig. 3c and Table 1). Further analysis using a heatmap of age-specific CIMT distribution indicated a progressive increase in CIMT with advancing age (Fig. 3d).

Fig. 3
Fig. 3The alternative text for this image may have been generated using AI.
Full size image

Detailed statistics of the China_Funds_CIMT dataset. Notes: (a) Participant count and gender distribution across different CIMT thickness groups. (b) Comparison of CIMT thickness between the normal and thickened groups. (c) Gender-specific age distribution of CIMT thickness in the normal and thickened groups. (d) CIMT thickness distribution across various age ranges.

Table 1 Age and Gender Statistics Across Different Groups in the China-Fundus-CIMT Dataset.

Data preprocessing

Image cropping and renaming

Ensuring high data quality is pivotal for maximizing model performance. To meet rigorous quality standards, we removed black regions in fundus images containing patient information and left/right eye identifiers. A Python script was employed to crop all images to a standardized resolution of 512 × 512 pixels. Images were systematically renamed based on eye identifiers: left-eye images were labeled as “PatientID_L.png” and right-eye images as “PatientID_R.png.” This preprocessing step ensures consistency and standardization, establishing a solid foundation for subsequent model training.

Data quality control

A total of 17,908 color fundus images were collected from patients at our hospital. All images were independently assessed by two experienced clinical professionals according to standardized quality assessment criteria. The evaluation focused on four key aspects: (1) clarity of retinal structures, including the optic disc, macula, and retinal vasculature; (2) exposure adequacy, ensuring neither overexposure nor underexposure; (3) absence of obstructions or artifacts, such as reflections, light bands, or motion blur; and (4) completeness of the retinal field coverage. Images were classified as low quality if they exhibited significant deficiencies in any of these criteria. Any discrepancies between the two reviewers were resolved through consensus with a senior ophthalmologist. Representative examples of both high-quality and low-quality images are presented in Fig. 4a. Based on this manual classification, the dataset was partitioned into three subsets: a training set comprising 6,618 high-quality and 9,890 low-quality images; a validation set including 500 high-quality and 500 low-quality images; and a test set comprising 200 high-quality and 200 low-quality images used for further evaluation of model performance.

Fig. 4
Fig. 4The alternative text for this image may have been generated using AI.
Full size image

Fundus image quality assessment. Notes: (a) represents a schematic diagram of high-quality and low-quality images, (b) represents the confusion matrix of the validation group, (c) represents the ROC curve of the validation group, (d) represents the confusion matrix of the test group, and (e) represents the ROC curve of the test group.

An automated fundus image quality classification model was subsequently developed using the SE-ResNeXt deep neural network architecture. Prior to training, all images were resized to a standardized resolution and underwent data augmentation techniques, including random rotations, horizontal and vertical flipping, and brightness adjustments, to enhance the model’s generalization capability. To address class imbalance between high- and low-quality images, a weighted cross-entropy loss function was applied during training. The model was optimized using the Adam algorithm, with an initial learning rate of 1e-3, reduced by a factor of 0.1 every 20 epochs, over a total of 200 epochs with a batch size of 32. Model performance was evaluated on both the validation and test datasets. The model achieved an average accuracy of 94.10% (Fig. 4b) and an AUC of 98.51% (Fig. 4c) on the validation set, and an average accuracy of 93.75% (Fig. 4d) with an AUC of 98.77% (Fig. 4e) on the test set.

Fundus images were initially evaluated using a quality assessment model. Images that met the quality standards were retained directly, whereas those failing the initial evaluation underwent processing with an image restoration model30. Further details about the restoration model can be found at https://github.com/liamheng/Annotation-free-Fundus-Image-Enhancement. The restored images were subsequently re-evaluated using the quality assessment model. Images that still did not meet the quality criteria after restoration were discarded. The detailed workflow for image quality evaluation and restoration is illustrated in Fig. 5.

Fig. 5
Fig. 5The alternative text for this image may have been generated using AI.
Full size image

Funds image Pre-processing.

Preprocessing of ultrasound report data

To extract CIMT information from ultrasound reports, we employed regular expressions for automated text parsing. Two researchers then independently verified the extracted data to ensure accuracy. The processed data were systematically recorded in an Excel file containing the following fields: patient ID, gender, age, and CIMT measurements.

In line with prior research, a CIMT value of ≥0.9 mm is commonly used as the threshold for defining carotid intima-media thickening. However, potential measurement variability arising from ultrasound equipment limitations and operator subjectivity may introduce errors. To address this, we performed a separate analysis to assess the impact of including 0.9 mm values in the thickened group. This additional evaluation enhanced the reliability of the dataset and strengthened the robustness of our conclusions.

Dataset construction

The China-Fundus-CIMT dataset was constructed by matching CIMT measurements obtained from ultrasound reports with quality-assessed fundus image data using patient IDs. To meet the requirements of the study, the dataset was further divided into training, validation, and test sets. The validation set comprised 100 subjects from the normal group and 100 from the thickened group, while the test set included 50 subjects from each group. The training set consisted of 699 subjects in the normal group and 1,904 in the thickened group.

To ensure compatibility with neural network models, the data were standardized as follows: gender information was encoded using One-Hot encoding, and age data were both standardized and normalized. The preprocessed data were saved in JSON format to maintain consistency and facilitate efficient data loading. These steps provide a structured and high-quality foundation for model training and analysis.

Data Records

The China-Fundus-CIMT multimodal dataset is publicly available on the Figshare31 platform to facilitate researcher access. All fundus images are stored in a single directory and have been standardized to a resolution of 512 × 512 pixels. Associated clinical and imaging metadata are organized in a JSON file, where the primary key corresponds to the patient ID. This file encompasses multiple fields (detailed in Table 2), offering a structured and comprehensive resource to support multimodal data analysis and model development.

Table 2 Explanation of Each Field in the Fundus_CIMT Dataset json File.

Technical Validation

To systematically evaluate the role of fundus images in predicting CIMT thickening, we conducted a series of experiments, starting with a baseline model based on individual fundus images. We comprehensively analyzed factors influencing model performance, including various image processing strategies (e.g., inclusion or exclusion of boundary values) and the incorporation of multimodal clinical information such as age and gender. The model architecture is depicted in Fig. 6. By comparing model performance under different experimental conditions, we sought to elucidate the potential of fundus images and their associated clinical data in predicting CIMT thickening, and offer valuable insights and a scientific foundation for future advancements in this field.

Fig. 6
Fig. 6The alternative text for this image may have been generated using AI.
Full size image

Flowchart of the Multi-Modal Model Training Process. Note: This flowchart illustrates the structure of a multimodal neural network, with the blue-shaded region representing the module integrating demographic features such as age and gender. When the blue-shaded region is removed, the model relies solely on fundus image inputs and becomes an unimodal neural network architecture.

Impact of CIMT boundary values on model performance

The accuracy of CIMT measurements can be influenced by limitations in ultrasound precision and the subjective judgment of operators, potentially introducing variability that affects model stability and the reliability of predictions. To systematically investigate the impact of boundary values on model performance, this study utilized the data_info.json file for data extraction and constructed two experimental models: one with and the other without boundary values. This approach aims to provide a comprehensive assessment of how boundary values influence model performance.

In the first model, all data, including boundary values, were incorporated. In the second model, boundary values were excluded during the data preprocessing stage, and only more stable measurement data were used for training. A comparative analysis of the two models was conducted, focusing on their accuracy, stability, and adaptability to boundary cases. This study highlights the effects of boundary values and measurement variability on model performance, offering critical insights for refining model training strategies and improving overall robustness.

Impact of multimodal clinical information on model performance

Considering the statistically significant differences in age and gender between the thickened and normal groups, this study explores the influence of incorporating multimodal clinical information on model predictive performance using data from the China-Fundus-CIMT.json file. Building on the baseline model, statistically significant clinical variables, such as age and gender, were added as supplementary input features. These features were fused with image data at the fully connected layer to construct and validate a multimodal fusion model. The performance of this multimodal model is compared with an unimodal model, which uses only image data.

The comparison focuses on key aspects such as prediction accuracy, model robustness, and adaptability to intergroup feature differences. This research aims to provide a comprehensive evaluation of the benefits of multimodal information fusion in enhancing prediction precision and generalization capacity, offering valuable guidance for model optimization and practical applications in related fields.

Experimental setup and methods

All experiments were conducted on a Windows 11 system equipped with a GTX 4090 graphics card with 24 GB of memory, using PyTorch 2.3 and CUDA 12.1 for model training and inference. To address class imbalance, a weighted binary cross-entropy loss function was employed. The class weights were determined according to the inverse of the class sample sizes, ensuring balanced learning during model optimization.

In the unimodal model, a Siamese SE-ResNeXt neural network was used to extract features from bilateral fundus images. The SE-ResNeXt architecture was selected based on our previous study32, where it consistently outperformed traditional ResNet in predicting carotid intima-media thickness (CIMT) from fundus images. The integration of the Squeeze-and-Excitation (SE) module further enhances the network’s ability to recalibrate channel-wise feature responses, allowing it to focus on clinically relevant vascular structures. Compared with other advanced architectures such as EfficientNet and Vision Transformers, SE-ResNeXt offers a better trade-off between performance and computational efficiency on this dataset.

For the multimodal approach, age variables were processed through one-hot encoding, normalized, and passed into a fully connected layer with 128 neurons. This output was then fused with the model’s fully connected layer to form the multimodal input. The training process consisted of two stages: in the first stage, the Adam optimizer was used with an initial learning rate of 0.001, decayed by a factor of 10 every 20 epochs; in the second stage, training resumed from the best-performing model of the first stage, with the learning rate reset to 0.00001 and similarly reduced every 20 epochs. There are 100 epochs for each stage. To improve generalization, data augmentation techniques such as horizontal and vertical flipping, random rotations (0°–20°), and adjustments to color and saturation were applied. Model performance was evaluated using metrics including confusion matrix, accuracy, precision, recall, and F1 score, providing a comprehensive assessment of its classification capabilities32.

Validation results

Validation results: impact of cimt boundary values on model performance

The experimental results indicate that CIMT boundary values have a limited impact on the overall performance of the unimodal model. For the model including boundary values, the overall prediction accuracy on the validation set was 77.50% (Table 3), with a prediction accuracy of 66.00% for the normal group and 89.00% for the thickened group (Fig. 7a). On the test set, the overall accuracy was 75.00%, with a prediction accuracy of 70.00% for the normal group and 80.00% for the thickened group (Fig. 7d). In contrast, the model excluding boundary values achieved an overall accuracy of 77.41% on the validation set, with a prediction accuracy of 69.00% for the normal group and 87.21% for the thickened group (Fig. 7b). On the test set, the overall accuracy remained 71.73%, where the prediction accuracy for the normal group was 64.00%, and 80.95% for the thickened group (Fig. 7e). Additionally, the model including boundary values achieved AUC-ROC values of 82.79% on the validation set (Fig. 7g) and 82.58% on the test set (Fig. 6h). For the model excluding boundary values, the AUC-ROC values were 83.65% on the validation set (Fig. 7g) and 81.00% on the test set (Fig. 7h). These findings demonstrate that the inclusion or exclusion of boundary values results in minimal variations in model performance and has no significant effect on overall prediction accuracy. This highlights the robustness of the unimodal model and suggests that the treatment of boundary values has a limited influence on its predictive capabilities.

Table 3 The results of predicting whether CIMT is thickened using single-modal and multi-modal models.
Fig. 7
Fig. 7The alternative text for this image may have been generated using AI.
Full size image

Results of predicting CIMT thickening using unimodal and multimodal models.

Impact of multimodal clinical information on predictive performance

The multimodal model that takes into consideration clinical information such as age and gender achieved significantly better predictive performance compared to the unimodal model. On the validation set, the overall prediction accuracy of the multimodal model was 80.50%, 2.5 percent higher than the unimodal model (Table 3). Specifically, the prediction accuracy for the normal group improved by 5 percent to 71.00%, while the thickened group improved by 1 percent to 90.00% (Fig. 7a,c). Additionally, the AUC-ROC value for the validation set was 86.01%, 3.22 percent higher than the unimodal model (Fig. 7g). On the test set, the multimodal model achieved an overall prediction accuracy of 81.00%, 6 percent higher than the unimodal model (Table 3). The test set prediction accuracy for the normal group increased by 4 percent to 74.00%, while the thickened group improved by 3.06 percent to 88.00% (Fig. 7d,f). The AUC-ROC value for the test set reached 90.41%, 7.83 percent higher than the unimodal model (Fig. 7h). In summary, integrating clinical information (e.g., age and gender) greatly enhances the predictive accuracy and generalization capability of the model. These findings underscore the potential of multimodal models in CIMT prediction and provide a strong foundation for further research in this area.

Usage Notes

The China-Fundus-CIMT dataset comprises high-quality bilateral fundus images, CIMT measurements, and clinical information such as age and gender. To facilitate research, we provide an automated processing pipeline that makes the dataset well-suited for studies on the early prediction of cardiovascular diseases and related medical image analyses. The dataset is available in standard image and json formats, ensuring compatibility with various open-source tools and deep learning frameworks. Researchers can efficiently load, preprocess, and train models using this dataset, enabling streamlined workflows for advancing research efforts.