Introduction

Coronary heart disease (CHD) is a major global health concern, primarily caused by the accumulation of plaques in the coronary arteries, leading to restricted blood flow to the myocardium. According to the American Heart Association (AHA)1 and European Society of Cardiology (ESC)2,3, CHD ranks among the leading causes of death worldwide, accounting for millions of deaths annually. Beyond its high mortality rate, CHD significantly affects the quality of life of affected individuals. Common symptoms include angina and shortness of breath, while severe cases may progress to myocardial infarction4, potentially causing irreversible cardiac damage or death. The economic burden of CHD is also substantial, with high treatment costs and prolonged disease management imposing significant pressure on healthcare systems and society. One of the primary challenges in CHD management is that its early stages are often asymptomatic5. However, plaque rupture can trigger acute myocardial infarction, emphasizing the critical importance of early diagnosis6. Identifying high-risk populations and implementing timely interventions can mitigate complications such as heart failure and malignant arrhythmias1. Early detection not only improves long-term survival rates but also enhances the overall well-being of affected individuals, underscoring its critical clinical value.

Traditional Chinese Medicine (TCM) has long held that the heart governs the eyes, and that pathological changes in cardiac function may manifest in ocular features such as eye discoloration, visible vessels, and changes in scleral or conjunctival appearance. Classical texts describe how heart-blood deficiency or stagnation may lead to observable changes in the eye area. Studies such as Gao et al.7 have begun to empirically examine eye-image features in patients with CHD. In modern medical research, there is growing evidence that ocular vascular changes correlate with cardiovascular disease. For example, microvascular alterations in the retina measured via OCTA have been shown to relate to the severity of coronary artery disease, left ventricular function, or the number of diseased coronary vessels8,9,10. Structural changes in ocular tissues such as choroidal thickness have been associated with cardiovascular risk factors11 (e.g. blood pressure, left ventricular mass). Taken together, both TCM theory and accumulating empirical data suggest that ocular features—particularly vascular and appearance changes—may serve as non-invasive indicators of systemic cardiovascular pathology. This motivates the present study's focus on scleral imaging and interpretable features such as vascular abnormalities and pigmentation spots.

At present, scleral appearance is not utilized as a standard diagnostic tool for CHD in modern cardiology. Clinical practice primarily relies on well-established imaging modalities such as echocardiography, coronary computed tomography angiography (CCTA), or invasive coronary angiography. In contrast, Traditional Chinese Medicine (TCM) has historically incorporated ocular inspection as part of syndrome differentiation, where scleral vessel congestion, tortuosity, or pigmentation changes are interpreted as external manifestations of internal circulatory dysfunction, particularly involving the heart12. While these TCM-based observations are widely applied in clinical practice, they remain largely qualitative and have not been systematically validated or integrated into mainstream cardiology. Against this background, our study seeks to provide an objective and quantitative framework that bridges traditional clinical observations with modern deep learning techniques, thereby exploring the potential of scleral imaging as a non-invasive auxiliary marker for CHD risk assessment.

CHD patients frequently exhibit microcirculatory disorders13,14, oxidative stress15,16, and autonomic nervous system dysfunction17,18,19, leading to scleral capillary tortuosity, lipid deposition, or abnormal vascular tone. Observing scleral signs for syndrome differentiation facilitates the early identification of CHD symptoms, including qi stagnation, blood stasis, and phlegm-turbidity, which holds significant clinical value.

Convolutional Neural Networks (CNNs) have become a fundamental component of modern image processing, particularly for tasks involving image recognition and classification20. CNNs are highly effective in feature extraction, as they learn hierarchical representations of data. Lower layers capture simple features such as edges and textures, while higher layers represent increasingly abstract and complex patterns. Additionally, techniques such as transfer learning and fine-tuning enhance CNN performance across diverse tasks, even in scenarios with limited training data. CNNs have been widely applied in medical imaging, achieving remarkable success across diverse fields. For instance, they have been utilized in lung cancer detection using CT scans21,22, skin lesion classification23,24, and diabetic retinopathy screening with fundus images25. These applications highlight the potential of CNNs to match or even surpass human-level performance in disease diagnosis based on medical images26,27. Given the proven success of CNNs in medical imaging, extending this technology to scleral image analysis as an auxiliary tool for coronary heart disease diagnosis represents both a logical and promising direction.

Building on these insights28,29,30,31,32, we propose a deep-learning system that leverages CNNs to analyze scleral image features for the auxiliary diagnosis of CHD. The scleral image analysis process comprises three key stages: image segmentation, deep feature extraction, and image classification. We employ the U-Net++ algorithm to segment the sclera from eye images, utilize the DenseNet121 model to extract deep features, and implement a multiple instance learning (MIL) model with an attention mechanism for classification. The proposed algorithm attained an average classification accuracy of 0.891 and an AUC of 0.942. The training dataset was internally curated and comprises scleral images from 500 participants, including 240 CHD patients and 260 healthy individuals. Furthermore, by visualizing the extracted deep features, we showed that the trained deep learning system primarily focused on scleral blood vessels, which aligns with known microcirculatory abnormalities in CHD patients, and in certain cases, it also highlighted pigmentation spots as auxiliary cues. While vascular features were consistently emphasized across most visualizations, pigmentation spots were less frequently observed but remain physiologically plausible given their association with oxidative stress and lipid deposition. This interpretability analysis suggests that the model captures physiologically relevant patterns rather than relying on spurious correlations, thereby supporting the potential of scleral imaging as an auxiliary diagnostic tool for CHD. Nevertheless, we acknowledge that pigmentation-related findings require further validation in larger datasets and with expert-annotated reference standards.

Methods and materials

Utilizing CNNs for robust deep feature extraction, we propose a CHD risk prediction system based on scleral images. As illustrated in Fig. 1, the system consists of three main components: feature extraction, classification, and prediction. The feature extraction module employs the DenseNet121 network to extract deep features from scleral images. The classification module applies MIL combined with a loss attention mechanism to determine image bag labels using extracted deep features, followed by classification. Finally, the prediction module evaluates CHD risk for subjects based on classified image bags.

Fig. 1

Overview of the scleral image processing pipeline. (A) Acquisition equipment for scleral image capture. (B) Image preprocessing for data cleaning. (C) Sclera segmentation to obtain enhanced scleral images. (D) Feature extraction to capture scleral characteristics associated with CHD. (E) Classification of healthy individuals and CHD patients via multiple instance learning (MIL) and a fully connected neural network (FC). (F) CHD prediction based on the preceding steps.

Instrument and data collection

In order to collect a usable scleral image dataset for CHD, we designed and developed a specialized device28,29,33 for scleral image acquisition, as illustrated in Fig. 2A. This device is based on the shadow-free scleral imaging technology we developed. Common methods for capturing eye images typically use forward lighting and photograph from the same direction, as shown in Fig. 2B. Because the light strikes the sclera directly, severe reflections arise and scleral detail is lost. To avoid this problem, we developed a shadow-free imaging method for the sclera, as depicted in Fig. 2C. In this method, illumination from a point light source is directed against the observation direction of the eyeball, and the illumination angle and imaging light path were optimized with the aid of artificial intelligence so that the reflections from the different ocular structures (cornea, sclera, iris, ciliary body, choroid, etc.) converge to a single point coinciding with the pupil. This yields clear imaging of the sclera and iris without interference from reflected-light shadows.

To capture comprehensive scleral information, images are captured in the frontal gaze as well as in four additional directions—up, down, left, and right (Fig. 2D). Both eyes are imaged, yielding ten images per participant; Fig. 2E shows four of the images for the left eye, with four similar images acquired for the right eye. During the acquisition process, participants place their chin on a chin rest, keep one eye close to the frame, and gently hold their upper and lower eyelids with their fingers to fully expose the sclera. They are instructed to gaze toward the indicator light, and the entire procedure is completed within one minute. Sample images acquired through this process are shown in Fig. 2E. In this study, we focused on the scleral characteristics of CHD; therefore, only the eight images with fully exposed sclera (four per eye) were selected for each subject.

Fig. 2

Schematic diagram of the sclera imaging acquisition device. (A) Photograph of the instrument. (B) Schematic of conventional eye imaging. (C) Schematic of shadow-free scleral imaging. (D) Eye movement directions during imaging. (E) Representative acquired ocular images.

A total of 621 adult participants were recruited as volunteers for this study. The flow chart of the exclusion process is shown in Fig. 3. More than 5000 raw images were collected. After data cleaning, which entailed the removal of blurry or incomplete images, data from 500 participants were retained, including 240 patients diagnosed with CHD by professional cardiovascular specialists; 4000 high-quality scleral images (8 per subject) from these 500 participants were ultimately used for model training and evaluation. The data were collected at Dongzhimen Hospital of Beijing University of Chinese Medicine (Beijing, China) from April 2024 to November 2024. To reduce variability associated with different collection times and operators, the imaging device was equipped with an automatic color correction feature. During the hospital deployment phase, a standard color chart was captured under the hospital’s lighting conditions to serve as a calibration reference. After image acquisition, histogram equalization was applied to all images to mitigate system-induced data biases. The eight images retained per participant, representing the sclera of both eyes in four gaze directions (up, down, left, and right), formed a single dataset entry. Participants with a CHD diagnosis were classified as positive cases, while others were classified as negative. This classification was based solely on the presence of CHD and did not account for the presence of other medical conditions.
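The histogram-equalization step applied after acquisition can be sketched as follows. This is a minimal single-channel version; the function name and the 8-bit, non-constant-image assumptions are ours, and the actual pipeline may apply it per channel or to a luminance plane.

```python
import numpy as np

def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Histogram equalization for an 8-bit grayscale image.

    Remaps intensities so their cumulative distribution becomes
    approximately uniform, reducing illumination-dependent bias.
    Assumes the image is not constant (otherwise the denominator is zero).
    """
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # first non-zero value of the CDF
    # Build a lookup table mapping each original intensity to [0, 255].
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[gray]
```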

Fig. 3

Flow chart of the exclusion process.

To evaluate classification performance, 80 of the 500 subjects were allocated to the validation set, while the remaining 420 subjects constituted the training set for classifier network optimization, an approximate 5:1 training-to-validation ratio. Both the training and validation sets maintained the same ratio of healthy individuals to CHD patients. Subject-wise data partitioning was implemented by grouping all scleral images from each participant with the corresponding label, ensuring that all images from a given subject were assigned exclusively to either the training or validation set.
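Subject-wise partitioning of this kind can be sketched as follows. This is a simplified illustration; the function name and seed are ours, and the class-stratification step used in the actual split is omitted for brevity.

```python
import random

def subject_wise_split(subject_ids, val_fraction=0.16, seed=0):
    """Split image indices at the subject level so that all images of one
    participant fall into exactly one partition (80/500 subjects ~ 0.16).

    subject_ids: one entry per image, giving the owning subject's ID.
    Returns (train_indices, val_indices) over the image list.
    """
    ids = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_val = round(len(ids) * val_fraction)
    val_subjects = set(ids[:n_val])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in val_subjects]
    val_idx = [i for i, s in enumerate(subject_ids) if s in val_subjects]
    return train_idx, val_idx
```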

In total, more than 5000 raw ocular images were captured from 621 participants. After removing blurred or incomplete images, 500 participants were retained. For each participant, eight high-quality scleral images (left and right eyes in four gaze directions) were selected, yielding a final dataset of 4000 images for model development. The segmentation dataset (1607 training images and 321 testing images) described in “Sclera segmentation” section was collected independently for U-Net++ training and is not part of these 4000 images.

Image preprocessing

To enhance the accuracy and generalizability of classification models, we designed an image preprocessing pipeline consisting of two steps.

The first step is to remove low-definition images for data cleaning. A subset of the collected scleral images was found to be blurry, which could negatively impact classification model performance. Therefore, it was necessary to filter and correct these images. Detecting blur is a fundamental challenge in computer vision. Commonly used blur detection algorithms include edge sharpness analysis, Bayesian discrimination functions, low depth-of-field (DOF) image segmentation, minimum directional high-frequency energy (for motion blur detection), and wavelet-based support vector machine (SVM) histogram methods. These algorithms extract blur-related features from input images based on established blur models for quantifying image blur. For instance, Chung et al.34 analyzed edge sharpness to identify blurry regions, determining that objects with lower edge sharpness appear blurred. Rugna et al.35 demonstrated that blurry regions exhibit greater invariance to low-pass filtering compared to non-blurry areas. Similarly, Ko et al.36 used statistical measures such as mean and standard deviation, finding that blurry regions tend to have lower values for both compared to clear regions. We adopted the blur detection method based on the Fast Fourier Transform (FFT). In an image, clear regions exhibit pronounced intensity variations, resulting in higher high-frequency components after Fourier transformation. In contrast, blurry regions exhibit smoother intensity transitions and are dominated by low-frequency components. The amplitude spectrum slope of blurry regions is steeper than that of non-blurry regions, enabling the distinction between blurry and clear images.
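The FFT-based criterion described above can be sketched as follows. This is a minimal grayscale version; the function name, the size of the suppressed low-frequency block (which must be smaller than half the image), and any pass/fail threshold are our assumptions.

```python
import numpy as np

def fft_blur_score(gray: np.ndarray, center_size: int = 30) -> float:
    """Score image sharpness from the high-frequency content of its spectrum.

    The low-frequency block at the center of the shifted spectrum is zeroed;
    the mean log-magnitude of the reconstruction then reflects how much
    high-frequency detail remains (higher score = sharper image).
    """
    h, w = gray.shape
    cy, cx = h // 2, w // 2
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    spectrum[cy - center_size:cy + center_size,
             cx - center_size:cx + center_size] = 0
    recon = np.fft.ifft2(np.fft.ifftshift(spectrum))
    return float(np.mean(20 * np.log(np.abs(recon) + 1e-8)))
```

An image would then be flagged as blurry when its score falls below an empirically chosen threshold.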

The second step involves the enhancement of scleral images. A review of relevant research and medical literature indicates that CHD exhibits distinct disease-related features in the eye37,38, such as radiating vessels at the lateral canthus39,40, haze-like patterns41, and colored spots42,43. These features are primarily concentrated in the sclera and are potentially significant for CHD classification using scleral images. However, these scleral features are often subtle, making them challenging to detect. To improve classification algorithm performance and highlight disease-related features, it is essential to enhance these features while mitigating the influence of varying illumination conditions. The enhancement process comprises two primary steps. First, a Gaussian blur is applied to the image to filter out high-frequency components, thereby reducing noise and suppressing fine details. Second, a weighted blending process is performed. Once Gaussian blurring suppresses the high-frequency components, the blurred image is subtracted from the original image, resulting in a residual “mask” that isolates the filtered high-frequency components. The mask is then blended with the original image using a weighted combination, enhancing high-frequency details for improved feature visibility. The enhanced scleral images reveal clearer structural details, such as pale pink blood vessel patterns and brown spots. Additionally, yellow-brown plaques, often imperceptible to the naked eye, become distinctly visible. For scleral images without prominent color features, the processed images do not exhibit noticeable artifacts or color distortions, maintaining enhancement robustness while preventing artificial distortions.
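The two-step enhancement described above is essentially an unsharp mask. A minimal sketch follows; the kernel construction, `sigma`, and the blend weight `amount` are assumed values, not the exact parameters of our pipeline.

```python
import numpy as np

def gaussian_kernel(sigma: float) -> np.ndarray:
    """1-D Gaussian kernel truncated at 3*sigma and normalized to sum 1."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def enhance_sclera(img: np.ndarray, sigma: float = 5.0, amount: float = 1.5) -> np.ndarray:
    """Step 1: Gaussian blur suppresses high-frequency detail.
    Step 2: the residual mask (original - blurred) is blended back,
    amplifying vessels and spots while leaving flat regions unchanged."""
    img = img.astype(np.float32)
    k = gaussian_kernel(sigma)
    # Separable Gaussian blur: filter rows, then columns.
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)
    mask = img - blurred
    return np.clip(img + amount * mask, 0, 255)
```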

Sclera segmentation

As illustrated in Fig. 2, the images captured by our custom-designed imaging device include the sclera, iris, and periocular skin. Since our objective is to analyze pathological features in the sclera, isolating the scleral region is essential to remove irrelevant features. In this study, we employ U-Net++, an advanced medical image segmentation model, for scleral segmentation. U-Net++ is an enhanced version of the original U-Net architecture introduced by Ronneberger et al.44. The U-Net model is particularly advantageous in medical image segmentation as it effectively integrates both deep and shallow features, facilitating precise segmentation with minimal labeled data. Zhou et al. introduced U-Net++45, which enhances feature fusion through dense skip connections and nested architectures, enabling adaptive depth adjustment. This improvement enhances accuracy and adaptability during segmentation, particularly in scenarios where varying levels of detail are required across different datasets.

U-Net++ comprises an encoder subnetwork (backbone) followed by a decoder subnetwork, and differs from U-Net primarily in its redesigned skip connections (highlighted in green and blue) and the incorporation of deep supervision (indicated in red). To achieve superior scleral segmentation with U-Net++, we employed an independent scleral segmentation dataset covering a demographically diverse cohort, including both genders and a wide age range. This dataset was collected and annotated specifically for the segmentation task, is distinct from the 4000 classification images described in “Instrument and data collection” section, and was not derived from the subjects used for CHD classification. The dataset was split into 1607 images for training and 321 images for testing, with the test subset selected at random to evaluate the segmentation network’s performance. The pixel-level annotations for the 1607 training scleral images were generated by three medical master’s students from Beijing University of Chinese Medicine using the Any Labeling software. The annotation process involved three rounds of quality control. First, two annotators independently delineated the scleral region for each image. Second, a third annotator systematically reviewed their outputs: annotations with complete agreement were directly accepted, while discrepancies were flagged for further review. In cases of disagreement, the third annotator provided an assessment, and the first two annotators re-annotated the image accordingly until consensus was reached.
This multi-round independent annotation and cross-review procedure ensured high accuracy and consistency of the pixel-level labels, thereby providing a reliable foundation for training the U-Net++ segmentation model. The model achieved an intersection over union (IoU) score of 0.907, highlighting its strong performance in precise scleral segmentation. Following these three preprocessing steps, we produced segmented scleral images with improved detail clarity.
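The IoU metric used to evaluate segmentation quality can be computed as follows (a minimal sketch for binary masks; the function name is ours):

```python
import numpy as np

def iou_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over union for binary segmentation masks.

    Both inputs are interpreted as boolean foreground masks; an empty
    union (both masks empty) is scored as a perfect 1.0 by convention.
    """
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float(intersection / union) if union else 1.0
```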

Feature extraction

Traditional scleral feature extraction methods primarily focus on visually discernible features such as blood vessels, haze, and spots on the sclera. These features typically require manual annotation and are then detected using conventional image processing algorithms or deep learning networks. However, the relationship between these manually annotated features and CHD is often complex, frequently exhibiting one-to-many or many-to-one correlations. In contrast, deep features learned by CNNs exhibit a stronger and more direct correlation with CHD, leading to superior classification performance. In this study, we employ the DenseNet121 network for deep feature extraction. Proposed by Huang et al. in 201746, DenseNet121 incorporates dense connections between layers, allowing each layer to receive input from all preceding layers. This architecture facilitates efficient propagation of features and gradients, alleviates gradient vanishing issues, and enhances the learning of hierarchical features. Moreover, DenseNet minimizes model parameters and error rates, eliminates redundant feature learning, lowers model complexity, and improves classification accuracy. We chose DenseNet121 as the deep feature extractor because it achieved the highest accuracy and F1 score among the compared networks, as discussed in “Results” section. U-Net++ achieves high-precision segmentation of scleral boundaries by virtue of nested skip connections, ensuring the complete capture of tiny blood vessels and pigment structures, while DenseNet’s feature reuse and dense connections enhance its representational capability under limited samples, improve the transmission and reuse of deep-level features, and thereby optimize classification performance.

In this study, DenseNet121 was first fine-tuned on the scleral dataset to extract robust deep feature representations. The extracted features were then fixed and used as inputs for the MIL classification stage.

Classification

Through this process, we collected ten ocular images from each subject, as illustrated in Fig. 2E. From these, we selected eight images that provided sufficient scleral exposure for segmentation and deep feature extraction, thereby minimizing the risk of information loss. Instead of assessing a subject’s health status based on the deep features of a single scleral image, we integrated information from all eight scleral images from each subject. This approach mitigates the effects of variability and potential confounders from individual images, enhancing the robustness and reliability of our analysis. To facilitate this, we reconstructed the dataset by aggregating the deep features from the eight scleral images of each subject. Each data package was labeled according to the subject’s health status: 0 for CHD cases and 1 for healthy individuals. This formulation represents a multi-instance binary classification problem, which we tackled through a MIL framework incorporating an attention mechanism.

For each subject, the deep features extracted from individual scleral images are fed into the MIL model to obtain the aggregated features of the entire image bag. The MIL model consolidates the feature vectors of all images within a given bag using an aggregation function. Conventional MIL approaches typically employ max-pooling or average-pooling to process deep features from multiple instances within a bag, ultimately determining the bag’s label. However, in this study, we employ the attention-based pooling mechanism proposed by Ilse et al.47 for aggregating bag-level features, which introduces an instance weighting mechanism where a neural network dynamically assigns weights. Let \(H=\{h_{1},\dots,h_{K}\}\) represent the set of instance embeddings within a bag. The MIL embedding is defined as follows:

$$z=\sum_{k=1}^{K}a_{k}h_{k}$$
$$a_{k}=\frac{\exp\left\{W^{\top}\tanh\left(Vh_{k}^{\top}\right)\right\}}{\sum_{j=1}^{K}\exp\left\{W^{\top}\tanh\left(Vh_{j}^{\top}\right)\right\}}$$

where W and V are learnable network parameters.

The input to the MIL model comprises deep feature vectors extracted from individual scleral images, each with a dimension of 1024. The output is a single aggregated feature vector representing the entire image bag for a subject. This aggregated feature vector is then processed by a fully connected neural network to generate the final classification result for the bag. In this classification scheme, a label of 1 indicates a healthy individual, while a label of 0 corresponds to a CHD patient, as illustrated in Fig. 1. The MIL classifier with attention pooling was trained separately on the fixed feature representations obtained from DenseNet121, rather than in an end-to-end fashion.
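The attention pooling defined above can be sketched in NumPy as follows. This is a forward-pass illustration with randomly initialized parameters and a toy feature dimension; in the actual model W and V are learned and each instance embedding has dimension 1024.

```python
import numpy as np

def attention_mil_pool(H: np.ndarray, V: np.ndarray, w: np.ndarray):
    """Attention-based MIL pooling (Ilse et al.).

    H: (K, D) instance embeddings of one bag;
    V: (L, D) and w: (L,) are the attention parameters.
    Returns the bag embedding z (D,) and attention weights a (K,).
    """
    scores = w @ np.tanh(V @ H.T)   # unnormalized attention, shape (K,)
    scores -= scores.max()          # shift for numerical stability
    a = np.exp(scores) / np.exp(scores).sum()  # softmax over instances
    z = a @ H                       # attention-weighted sum of instances
    return z, a

rng = np.random.default_rng(0)
H = rng.standard_normal((8, 16))    # 8 scleral images, toy dimension 16
V = rng.standard_normal((4, 16))
w = rng.standard_normal(4)
z, a = attention_mil_pool(H, V, w)
```

The bag embedding `z` would then be passed to the fully connected classifier described above.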

Ethical approval and informed consent

We confirm that all methods were performed in accordance with the relevant guidelines and regulations. For more information about ethical approval and informed consent, please refer to ethics statement and the attachment.

Results

We applied the proposed algorithm to a dataset of ocular images acquired from patients with CHD and healthy individuals. To identify the optimal deep feature extraction network with respect to accuracy and computational efficiency, we conducted a comparative analysis of multiple CNNs. The evaluated models comprised the ResNet and VGG series, GoogleNet, and EfficientNet, all assessed as potential feature extractors for CHD detection.

We employed commonly used evaluation metrics, including accuracy, precision, recall, and the F1 score, to evaluate the performance of the proposed algorithm. Accuracy indicates the overall correctness of classification, precision represents the proportion of correctly predicted positive cases, recall measures the model’s capability to identify actual positive samples, and the F1 score balances precision and recall, mitigating class imbalance effects. To further analyze the model’s performance, we generated the Receiver Operating Characteristic (ROC) curve, a commonly utilized tool for assessing binary classification models. The ROC curve depicts the trade-off between the true positive rate (TPR) and the false positive rate (FPR) across varying threshold values. The area under the ROC curve (AUC) provides a quantitative assessment of model performance, with a higher AUC denoting superior classification capability. Using the ROC curve, we identified the optimal decision threshold via Youden’s J statistic and subsequently calculated the four evaluation metrics. The mathematical formulations of Youden’s J statistic, accuracy, precision, recall, and the F1 score are provided below, where TPR denotes the true positive rate, FPR denotes the false positive rate, TP signifies true positives, TN represents true negatives, FP corresponds to false positives, and FN indicates false negatives.

$$J=TPR-FPR$$
$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN}$$
$$Precision=\frac{TP}{TP+FP}$$
$$Recall=\frac{TP}{TP+FN}$$
$$F1\text{-}score=2\times\frac{Precision\times Recall}{Precision+Recall}$$
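Threshold selection via Youden's J statistic can be sketched as follows (a minimal exhaustive-search version; in practice a ROC library would typically be used):

```python
import numpy as np

def youden_threshold(scores: np.ndarray, labels: np.ndarray):
    """Return the decision threshold maximizing J = TPR - FPR, and J itself.

    scores: predicted probabilities; labels: 1 = positive, 0 = negative.
    Every distinct score is tried as a candidate threshold.
    """
    P = (labels == 1).sum()
    N = (labels == 0).sum()
    best_t, best_j = None, -1.0
    for t in np.unique(scores):
        pred = scores >= t
        tpr = (pred & (labels == 1)).sum() / P
        fpr = (pred & (labels == 0)).sum() / N
        if tpr - fpr > best_j:
            best_j, best_t = tpr - fpr, t
    return best_t, best_j
```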

Stochastic gradient descent with momentum (MSGD) was chosen as the optimizer, with the momentum term set to 0.9 and an initial learning rate of 0.003. Across different experiments, the optimal learning rate was selected based on the minimum error. Additionally, cross-entropy was utilized as the loss function, and the batch size was set to 16. To balance convergence and training efficiency, each experiment was conducted for 100 epochs, providing adequate time for model stabilization. All deep learning experiments were implemented in PyTorch and executed on a single NVIDIA GeForce RTX 4060 Ti GPU with 16 GB of memory.

The test results of various models trained and evaluated on the same dataset are presented in Table 1; Fig. 4. As indicated in Table 1, substantial variations exist in performance metrics, such as AUC, among convolutional networks for sclera-based coronary heart disease classification. Among the evaluated models, DenseNet121 demonstrated the highest overall performance, achieving an average accuracy of 0.891, precision of 0.896, recall of 0.891, F1 score of 0.888, and AUC of 0.942, outperforming all other compared networks. In comparison, the highest average accuracy recorded for the VGG series was 0.873, for the ResNet series was 0.864, for AlexNet was 0.724, and for GoogleNet was 0.828.

Figure 4 depicts the ROC curves for all experiments, where a curve closer to the top-left corner and an AUC value near 1 signify superior model performance. A comparison of the nine models reveals that DenseNet121 exhibits the ROC curve nearest to the top-left corner, underscoring its superior sensitivity and specificity.

Table 1 Comparison results of different feature extraction networks.
Fig. 4

Comparison of ROC curves of different feature extraction networks.

Discussion and conclusion

Traditional scleral imaging has been predominantly utilized for identity recognition, with limited exploration in disease detection. This study introduces a comprehensive processing and analysis algorithm to facilitate CHD detection using scleral images. The first step involves image clarity analysis, where the quality of collected eye images is assessed to eliminate blurred or unsuitable images, ensuring data cleanliness. The second step focuses on feature enhancement, applying Gaussian blur and low-frequency component removal to emphasize key features, particularly in the scleral region. The third step entails sclera segmentation, employing a pre-trained U-Net++ model specifically designed for this task. The fourth step involves deep feature extraction, utilizing the DenseNet121 network, trained for sclera-based CHD classification, to extract deep features from individual scleral images. The fifth step encompasses feature aggregation and classification, where deep features extracted from scleral images of the same subject are aggregated into a bag and input into a MIL model with an attention mechanism. This model assigns weights to each instance and combines them into a single vector, which is then fed into a Fully Connected Classification Network (FCN) to derive the probability distribution of bag categories. The sixth step involves CHD risk prediction, where the subject’s CHD risk is estimated based on bag category probabilities. The proposed CHD risk prediction algorithm demonstrates strong performance across key metrics, including accuracy, recall, and AUC. Experimental results indicate that, compared to convolutional neural networks such as ResNet18, ResNet34, VGG13bn, and VGG16bn, DenseNet121 yields superior performance. Employing DenseNet121 as the feature extraction network markedly enhances model accuracy.

Fig. 5

Original sclera images of one CHD patient and corresponding visualized deep features via Grad-CAM.

Grad-CAM (Gradient-weighted Class Activation Mapping)48, a widely adopted visualization technique for convolutional neural networks, was employed to analyze the deep features of the sclera in the context of CHD. This technique identifies the most influential regions of the input image for the model’s predictions by leveraging gradients of the target concept propagated to the final convolutional layer. Specifically, Grad-CAM produces a heatmap that highlights the relative importance of each spatial location in the image for the model’s decision-making process. This visualization enhances model interpretability by identifying the scleral image features that most significantly contribute to CHD classification. Representative samples from CHD patients are depicted in Fig. 5. The scleral features identified by the model are primarily localized to the scleral blood vessels, with secondary attention directed toward spots. These features exhibit a strong correlation with the pathophysiological mechanisms underlying CHD. For instance, systemic microcirculation disorders in CHD patients, including endothelial dysfunction and elevated blood viscosity, can induce dilation and stagnation of scleral capillaries, consistent with the TCM concept of “ocular collateral stasis.” Furthermore, oxidative stress linked to atherosclerosis may contribute to lipid deposition or pigmentation in the scleral connective tissue, aligning with the TCM concept of “phlegm turbidity clouding the clear orifices.” The strong concordance between the scleral features identified by the neural network and the clinical manifestations of CHD enhances the interpretability of this approach. This finding further highlights the potential of scleral imaging as a valuable auxiliary tool for disease diagnosis.
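The core Grad-CAM computation can be sketched as follows, given the last convolutional layer's feature maps and the class-score gradients with respect to them (which in practice are captured with framework hooks; the function name and shapes are our assumptions):

```python
import numpy as np

def grad_cam_map(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap from conv activations and gradients, both (C, H, W).

    Channel weights are the globally average-pooled gradients; the weighted
    sum of feature maps is passed through ReLU and normalized to [0, 1].
    """
    weights = gradients.mean(axis=(1, 2))                   # (C,)
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    if cam.max() > 0:
        cam /= cam.max()                                    # normalize to [0, 1]
    return cam
```

The resulting low-resolution map is then upsampled to the input size and overlaid on the scleral image to produce visualizations such as those in Fig. 5.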

Although the deep learning algorithm achieved high accuracy on the CHD scleral dataset collected in this study, several limitations remain and warrant further investigation. First, the dataset requires expansion to improve its representativeness and robustness. CHD and its closely associated condition, systemic atherosclerosis, involve complex pathological processes that impact multiple organ systems. Physiological changes in the sclera may be influenced by multiple internal organs, including the liver and kidneys. Consequently, expanding the dataset to include scleral data from a more diverse and extensive population is imperative. More critically, the current dataset defines positive cases exclusively based on the presence of CHD, without considering potential confounding factors such as comorbidities, specific medical conditions, regional geographic influences, or other relevant variables. Future research should prioritize the development of more comprehensive case classifications within larger, more diverse cohorts to improve generalizability. Furthermore, as an exploratory study, this work necessitates the integration of additional clinical data, such as detailed case histories, physiological indicators, and clinical manifestations. A more in-depth investigation into the pathological relationship between CHD and scleral changes is crucial for advancing our understanding and enhancing diagnostic accuracy.

This study aims to develop a non-invasive auxiliary diagnostic method for CHD and highlights two key contributions. First, the proposed method demonstrates exceptional classification performance, achieving an average AUC of 0.942. Second, it reveals that vascular abnormalities are the predominant interpretable features associated with the pathophysiological mechanisms of CHD, while pigmentation spots were identified in a subset of cases as auxiliary cues. These findings enhance the interpretability of the method and underscore the clinical relevance of monitoring scleral vascular changes in CHD patients, with pigmentation requiring further validation in larger and more diverse datasets. By incorporating scleral imaging into CHD auxiliary diagnosis, this study expands the scope of early disease detection and monitoring through scleral analysis, highlighting the potential of scleral imaging for clinical diagnostic applications. The integration of scleral image acquisition with artificial intelligence algorithms provides a promising non-invasive diagnostic approach with substantial research potential. One limitation of this study is that we did not perform ablation experiments on direct classification of raw scleral images using DenseNet alone, without segmentation or MIL. Such a baseline could provide additional insights into the specific contributions of the segmentation and MIL components. Future work will therefore include ablation studies to systematically evaluate the incremental value of each module in our proposed framework.