Introduction

Diabetes mellitus (DM) is a major global public health challenge of the 21st century due to its high prevalence and rapid growth. According to the latest evidence provided by the International Diabetes Federation (IDF), approximately 537 million adults worldwide were living with diabetes in 2021. Current trends project that the number of adults with diabetes will reach 643 million by 2030 and 783 million by 20451. As the country with the largest population of individuals with diabetes, China accounts for approximately one-quarter of global cases. The associated healthcare expenditure is projected to increase from USD 250.2 billion in 2020 to USD 460.4 billion by 2030, imposing a substantial burden on the public health system2,3. The long-term management of DM is highly complex, and its multiple chronic complications are key contributors to increased morbidity and mortality among affected individuals4. Among these complications, diabetic peripheral neuropathy (DPN) is particularly common and debilitating, affecting approximately 50% of patients with diabetes. DPN can lead to sensory loss, refractory pain, and numbness, and markedly increases the risk of foot ulcers, infection, and subsequent amputation, thereby severely impairing physical function and quality of life5. Notably, neuropathy has been reported even in individuals with undiagnosed diabetes and those with prediabetes6,7,8. Therefore, early identification and intervention for DPN are critical for delaying disease progression and improving clinical outcomes4. However, current clinical screening strategies for DPN exhibit substantial limitations. Traditional bedside tools, such as the 10-g monofilament test, vibration perception testing, and thermal sensation assessment, rely heavily on patient cooperation and examiner experience, leading to limited reproducibility and restricted sensitivity for detecting early neuropathic changes8.
Although nerve conduction studies (NCS) offer objective and reproducible measurements, they primarily assess large myelinated fibers and are relatively insensitive to early damage of small unmyelinated fibers9. Skin biopsy with quantification of intraepidermal nerve fiber density (IENFD) is regarded as the gold standard for evaluating small-fiber pathology; however, its invasiveness and high cost limit routine clinical use8,10,11. These limitations underscore the urgent need for a noninvasive, objective, and highly sensitive tool capable of detecting early small-fiber neuropathy. The cornea is one of the most densely innervated tissues in the human body, and its nerve fibers primarily consist of small-diameter Aδ and C fibers similar to those in distal skin12. In vivo confocal microscopy (IVCM), an advanced, noninvasive optical imaging modality with cellular-level resolution, enables rapid visualization of corneal cellular and microstructural features. It has therefore emerged as a promising “window” for evaluating systemic small-fiber neuropathy and a potential method for early screening of DPN13,14. Quantitative IVCM parameters such as corneal nerve fiber length (CNFL) and corneal nerve fiber density (CNFD) have been shown to correlate with the severity of DPN15,16. Beyond these individual quantitative parameters, accumulating evidence has established IVCM as a reliable and noninvasive tool for the assessment and early detection of diabetic peripheral neuropathy16,17,18. Both manual and automated analyses of corneal nerve morphology have demonstrated consistent associations with neuropathy presence and severity across patients with diabetes and prediabetes16,17. More recently, artificial intelligence–based approaches have further supported the feasibility of objective and scalable IVCM analysis for DPN screening13,19. Importantly, evidence demonstrates that small-fiber injury does not begin with overt fiber loss.
Instead, early pathological changes are characterized by subtle morphological abnormalities, including axonal swellings, focal beading, and regenerative sprouting, which reflect underlying cytoskeletal disruption and impaired axonal transport. Such early changes have been confirmed in studies involving skin biopsy, peripheral nerve regeneration models, and Schwann-cell-mediated repair. These findings support the concept that morphological abnormalities precede measurable reductions in nerve fiber density20,21,22. Furthermore, recent studies indicate that corneal microneuromas may appear in the early stages of DPN and are closely associated with impaired nerve regeneration23,24. Artificial intelligence (AI) has rapidly advanced in ophthalmology, with successful clinical applications in the grading of diabetic retinopathy and diagnosis of keratitis25,26,27. In the field of IVCM image analysis, convolutional neural networks (CNNs) have achieved automated segmentation and quantification of several corneal structures, including nerves and inflammatory cells28,29; however, the automatic detection of microneuromas remains challenging. In this study, we aimed to develop a deep learning–based IVCM-assisted diagnostic model for DPN and evaluate its clinical utility, providing a novel approach for the early screening and diagnosis of DPN.

Materials and methods

General information

Image collection

This diagnostic accuracy study collected IVCM images obtained between August 2021 and December 2023 at the Ophthalmology Center of Renmin Hospital of Wuhan University and an independent external cohort from Wuhan Aier Hankou Eye Hospital, using the HRT III/RCM system (Heidelberg Engineering, Germany). Participants were recruited from routine clinical visits, including regular diabetes follow-up and ocular surface evaluation. Inclusion and exclusion criteria were applied to ensure reliable assessment of corneal nerve morphology. Healthy volunteers were asymptomatic with stable tear film and an intact corneal epithelium. They served as a control group to provide a reference for normal corneal nerve morphology. For patients with diabetes, medical histories were reviewed and those with potential confounding factors affecting corneal nerves, including neurological disorders unrelated to diabetes, contact lens wear, previous ocular surgery, active ocular inflammation, or topical anti-inflammatory medication use within two weeks before imaging, were excluded. The study protocol was approved by the Ethics Committee of Renmin Hospital of Wuhan University (Approval No. WDRY2023-K202). Given the retrospective design, written informed consent was waived in accordance with national regulations governing retrospective medical record research in China. The study adhered to the principles of the Declaration of Helsinki. All clinical and demographic data were retrieved from the hospital information system. The dataset from Wuhan Aier Hankou Eye Hospital was used exclusively as an independent external test set, which was not involved in model training, cross-validation, or parameter tuning. This cross-center external validation cohort was included to assess the generalizability of the model. Baseline demographic characteristics are summarized in Table 1, and the dataset partitioning is illustrated in Fig. 1.

Table 1 Demographics of patients in the different datasets.
Fig. 1

Flowchart of dataset partitioning. Data from Renmin Hospital of Wuhan University were used for model development and internal testing, while data from Wuhan Aier Hankou Eye Hospital were used exclusively as an independent external test set to evaluate model generalizability.

In vivo confocal microscopy image acquisition

All IVCM images were acquired by a senior technician with more than ten years of experience who followed a standardized imaging protocol. The images were obtained under consistent imaging parameters, with the microscope set to a lateral spatial resolution of 0.5 μm and an axial resolution of 1 to 2 μm. Each corneal en face image covered an area of 400 × 400 μm with a resolution of 384 × 384 pixels. To prevent cross-contamination, a disposable sterile cap (Tomocap, Heidelberg Engineering, Germany) was applied to the objective lens, and 0.2% carbomer gel (Liposic, Bausch & Lomb, Germany) was used to optimize optical coupling. Both eyes of each participant were topically anesthetized with 0.5% proparacaine hydrochloride (Alcaine, Alcon, USA). Images were captured in “section” mode at the level of the subbasal nerve plexus (SBNP) in the central cornea. Each eye was scanned with a maximum examination duration of 10 min. Participants were instructed to fixate on an external target to maintain primary gaze for accurate centration. The scan position was confirmed by visualizing the red laser reflection through the charge-coupled device monitor, and imaging depth was adjusted with the z-axis controller to precisely capture the SBNP, typically located at a depth of 40 to 50 μm. All images were anonymized prior to annotation and model training.

Methods

Data collection

A total of 4,554 IVCM images from 293 patients at the Ophthalmology Center of Renmin Hospital of Wuhan University and 506 images from 87 patients at Wuhan Aier Hankou Eye Hospital were included. Images were selected based on the following criteria: visualization of the SBNP layer and uniform imaging depth without inclusion of epithelial or stromal layers. All raw IVCM images first underwent quality control performed by senior corneal specialists. Images with inadequate quality due to overexposure, insufficient illumination, motion blur, or poor probe–cornea contact were classified as low quality and excluded. Remaining images then underwent non-target layer filtering to remove frames that did not contain structures of interest; the retained images were defined as high-quality target images. Optical character recognition (OCR) was applied to extract imaging depth information embedded in the frame and to remove any patient-identifying information. All images were then resampled to a uniform resolution of 384 × 384 pixels for subsequent analysis. Senior corneal specialists then evaluated each image to determine the presence or absence of microneuromas and further categorized positive cases into three morphological subtypes: axonal distension, enlarged bulge, and hyper-reflective diffuse pattern (HRDP). Data from both patients with diabetes and healthy controls were randomly assigned at the patient level to the training and internal test sets to avoid subject overlap. The external test dataset remained fully independent. This design ensured that the model was exposed to both normal and abnormal corneal nerve appearances rather than only pathological images.

Image preprocessing and annotation

Each IVCM image was independently annotated and classified by two ophthalmologists with more than five years of clinical experience, using the VGG Image Annotator software (Version 2.0.8, www.robots.ox.ac.uk, University of Oxford, UK). When the annotations from both readers were consistent, the result was accepted as the true label. In cases of disagreement, a third corneal specialist with over ten years of experience adjudicated to determine the final label. To ensure standardized morphological categorization, microneuromas were defined using the following criteria:

  • Axonal distension: characterized by localized thickening of the nerve fiber, typically spindle, triangular, or round-shaped, with a relatively broad base.

  • Enlarged bulge: defined as a larger bulbous protrusion than axonal distension, with well-demarcated borders and villous or spicule-like surface projections.

  • HRDP: characterized by an irregular, poorly demarcated high-reflectivity area, presenting as a bright, amorphous reflective signal on the image (Fig. 2).

Fig. 2

Types of corneal microneuromas. The yellow boxes indicate microneuroma locations: (a) axonal distension, (b) enlarged bulge, (c) HRDP. Scale bar, 100 μm.

These criteria were designed to maximize sensitivity and include all potential microneuroma candidates for subsequent segmentation and classification model development and evaluation.

Model development

ResNet-50, a structurally stable and computationally efficient residual learning network, has been validated in our previous studies30,31,32,33. Based on this foundation, we employed it as the backbone of our deep learning model. All ResNet-50 backbones were initialized with weights pre-trained on the ImageNet dataset, enabling transfer learning rather than training from scratch. Image-level classification was conducted using a Mean Teacher–based semi-supervised ResNet-50 framework, and microneuroma segmentation was performed with a U-Net++ architecture incorporating a ResNet-50 encoder. Within the Mean Teacher framework, each classification model consisted of a student network and a teacher network, with the teacher parameters updated as an exponential moving average of the student parameters. The training objective combined supervised cross-entropy loss on labeled data with a consistency loss (mean squared error), encouraging prediction agreement between the student and teacher under different input perturbations. This strategy was adopted to improve model robustness and generalization in the presence of limited labeled data.
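The two Mean Teacher ingredients described above can be sketched in a few lines; the decay value and the flat weight-list representation are illustrative assumptions, not details reported for the actual model.

```python
# Illustrative sketch of the Mean Teacher mechanics described above.
# The decay value (0.99) and flat weight lists are assumptions for
# illustration, not the authors' implementation.

def ema_update(teacher_weights, student_weights, decay=0.99):
    """Teacher parameters follow an exponential moving average of the
    student parameters: t_new = decay * t + (1 - decay) * s."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_weights, student_weights)]

def consistency_loss(student_probs, teacher_probs):
    """Mean squared error between student and teacher class probabilities,
    encouraging agreement under different input perturbations."""
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n
```

During training, the supervised cross-entropy term (labeled images only) and this consistency term (all images) are summed, and the EMA update is applied after each optimizer step.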

Model training was performed on a server equipped with four NVIDIA GeForce RTX 2080 GPUs, each with 8 GB of memory. The algorithm was implemented in Python (version 3.6.5) using TensorFlow (version 1.12.2) and Keras (version 2.2.5). For all classification models, input images were resized to 224 × 224 pixels. Training was conducted using stochastic gradient descent with a learning rate of 1 × 10⁻³ and a batch size of 32 for up to 200 epochs, with early stopping applied using a patience of 10 epochs. Regularization strategies included weight decay set to 1 × 10⁻⁴ and a dropout rate of 0.5. To improve generalization and reduce overfitting, data augmentation was applied during training, including random rotations within ± 30°, horizontal and vertical flipping, and brightness and contrast adjustments. To address class imbalance within the training set, weighted loss functions were used by assigning higher weights to underrepresented classes, while the same augmentation-based sample enrichment strategy was leveraged to increase the effective representation of minority classes. For microneuroma segmentation, U-Net++ was trained using an input image size of 512 × 512 pixels and the Adam optimizer, with a learning rate of 1 × 10⁻⁴ and a batch size of 2. Training was performed for up to 300 epochs with early stopping applied using a patience of 10 epochs. A combined Dice–cross-entropy loss was adopted to balance region overlap and pixel-wise classification accuracy. Given the limited availability of pixel-level annotations, extensive spatial and intensity augmentations were applied using the Albumentations library, including elastic deformation, grid distortion, and coarse dropout, to further enhance model generalization. Figure 3 shows the workflow of the model. The overall model training pipeline consisted of the following steps:

Fig. 3

Flow chart of model training.

  1. DCNN1 (Deep Convolutional Neural Network 1): performed image quality assessment and target-layer filtering.

  2. DCNN2: identified high-quality target images and classified them as either containing microneuromas or not.

  3. DCNN3: performed segmentation of microneuromas.

  4. DCNN4, DCNN5, and DCNN6: further classified microneuroma-positive images into axonal distension vs. non-axonal distension, enlarged bulge vs. non-enlarged bulge, and HRDP vs. non-HRDP, respectively.
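Assuming each stage exposes a simple callable returning a binary decision (classifiers) or a mask (DCNN3), the cascade can be sketched as follows; the model objects and call signatures are placeholders, not the authors' actual interfaces.

```python
# Hypothetical routing of one IVCM image through the six-model cascade.
# Model objects and their call signatures are placeholders for illustration.

def analyze_image(image, models):
    """`models` maps 'dcnn1'..'dcnn6' to callables; the classifiers return
    True/False decisions and 'dcnn3' returns a segmentation mask."""
    if not models["dcnn1"](image):            # quality / target-layer gate
        return {"status": "rejected"}
    if not models["dcnn2"](image):            # microneuroma screening
        return {"status": "no_microneuroma"}
    return {
        "status": "microneuroma",
        "mask": models["dcnn3"](image),       # lesion segmentation
        "subtypes": {                         # three one-vs-rest classifiers
            "axonal_distension": models["dcnn4"](image),
            "enlarged_bulge": models["dcnn5"](image),
            "hrdp": models["dcnn6"](image),
        },
    }
```

Because the three subtype classifiers are independent one-vs-rest models, an image can in principle be flagged for more than one morphological pattern.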

After model training, image-level predictions were converted to binary labels using a predefined probability threshold for classification decisions. A uniform cutoff of 0.50 was applied across all classifiers for consistency and comparability. The DCNN4 model was the sole exception, where a threshold of 0.25 was selected following validation-set performance optimization due to its distinct prediction-probability distribution.
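The binarization step reduces to a small lookup, with the cutoff table mirroring the values stated above (0.50 throughout, 0.25 for DCNN4); the function and dictionary names are illustrative.

```python
# Per-classifier decision thresholds, as described in the text.
CUTOFFS = {"dcnn2": 0.50, "dcnn4": 0.25, "dcnn5": 0.50, "dcnn6": 0.50}

def binarize(model_name, probability):
    """Convert a predicted probability into a binary image-level label
    using the model-specific cutoff."""
    return int(probability >= CUTOFFS[model_name])
```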

Cross-validation

To assess model robustness and stability during model development, 5-fold cross-validation was performed only on the training set. In each fold, 80% of the data were used for training and 20% for validation, with patient-level splitting applied to ensure that all images from the same participant were included in a single fold, thereby preventing data leakage. After model development was completed, a final model was trained using the full training set. All final performance metrics and confusion matrices reported in the Results section were derived exclusively from the independent internal and external test sets, which were not involved in cross-validation, model selection, or parameter tuning.
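Patient-level splitting of this kind can be sketched as below; the `(patient_id, image)` data layout is an assumption for illustration (in practice, scikit-learn's `GroupKFold` implements the same idea).

```python
# Deal whole patients (not individual images) into folds so that no
# participant's images are split across training and validation.

def patient_level_folds(samples, n_folds=5):
    """`samples` is a list of (patient_id, image) pairs; returns a list of
    n_folds lists, each holding all images of its assigned patients."""
    by_patient = {}
    for patient_id, image in samples:
        by_patient.setdefault(patient_id, []).append(image)
    folds = [[] for _ in range(n_folds)]
    # Round-robin assignment of whole patients to folds.
    for i, (patient_id, images) in enumerate(sorted(by_patient.items())):
        folds[i % n_folds].extend((patient_id, im) for im in images)
    return folds
```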

Model evaluation

Evaluation metrics

The model was evaluated on both the internal test set (Renmin Hospital of Wuhan University) and the external test set (Wuhan Aier Hankou Eye Hospital). For the classification models, performance metrics included accuracy, specificity, sensitivity, receiver operating characteristic (ROC) curve, area under the ROC curve (AUC), positive predictive value (PPV), and negative predictive value (NPV). For the segmentation task, performance was assessed using the Dice similarity coefficient (DSC), intersection over union (IoU), precision, and recall, all calculated at the pixel level on the full lesion mask.
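For reference, the pixel-level segmentation metrics reduce to counts over the binary masks; the sketch below flattens masks to 0/1 lists as a simplification.

```python
# Pixel-level Dice, IoU, precision, and recall from binary masks,
# flattened to 0/1 lists for illustration.

def segmentation_metrics(pred, truth):
    tp = sum(1 for p, t in zip(pred, truth) if p and t)       # true positives
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)   # false positives
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)   # false negatives
    denom = tp + fp + fn
    return {
        "dice": 2 * tp / (2 * tp + fp + fn) if denom else 1.0,
        "iou": tp / denom if denom else 1.0,
        "precision": tp / (tp + fp) if (tp + fp) else 1.0,
        "recall": tp / (tp + fn) if (tp + fn) else 1.0,
    }
```

Note that Dice and IoU are monotonically related (Dice = 2·IoU / (1 + IoU)), which is why both move together in the results reported below.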

Model-assisted image reading by junior physicians

After model training, an independent set of 150 IVCM images, separate from the training and test datasets, was used for an image-reading experiment. Five junior ophthalmologists, each with less than two years of clinical experience and no formal IVCM training, participated. Each physician evaluated the same image set under two conditions: (1) without model assistance (independent reading) and (2) with model assistance (model pre-evaluation with labels displayed). For each image, physicians assessed the presence or absence of microneuromas and classified microneuroma subtypes. Expert consensus was used as the reference standard to calculate diagnostic accuracy under both conditions. A washout period of at least two weeks and randomized image order were implemented between sessions to minimize memory and order bias.

Statistical analysis

All statistical analyses were performed using GraphPad Prism 10.0 (GraphPad Software, USA). Continuous variables were tested for normality using the Shapiro–Wilk test and are presented as mean ± standard deviation (mean ± SD). Categorical variables are presented as counts (percentages). Comparison of image-reading times for junior physicians between the two conditions (with vs. without model assistance) was performed using paired t-tests, while diagnostic accuracy comparisons were analyzed using the McNemar test. A two-sided P < 0.05 was considered statistically significant.
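As a sketch of the paired comparison, the McNemar statistic depends only on the discordant counts b (correct with assistance only) and c (correct without assistance only); the continuity-corrected variant shown here is one common formulation and may differ from the software's default.

```python
import math

# McNemar test on paired diagnostic outcomes (with vs. without model
# assistance on the same images). b and c are the discordant pair counts.

def mcnemar(b, c, continuity=True):
    """Return (chi-square statistic, two-sided p-value), 1 df.
    Assumes b + c > 0."""
    num = (abs(b - c) - 1) ** 2 if continuity else (b - c) ** 2
    stat = num / (b + c)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x/2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```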

Results

Model validation

The performance of the developed models was evaluated on both the internal test set (Renmin Hospital of Wuhan University) and the external test set (Wuhan Aier Hankou Eye Hospital). DCNN1, which was used to identify high-quality images, achieved an accuracy of 0.975, sensitivity of 0.972, specificity of 0.979, PPV of 0.978, and NPV of 0.973 in the internal test set. DCNN2, designed for binary classification of images as containing microneuromas or not, demonstrated an accuracy of 0.837, sensitivity of 0.909, specificity of 0.757, PPV of 0.807, and NPV of 0.881 in the internal test set, while in the external test set, accuracy was 0.812, sensitivity 0.813, specificity 0.812, PPV 0.834, and NPV 0.788. DCNN3, which performed automatic segmentation of corneal microneuromas, achieved an average DSC of 0.786, IoU of 0.647, precision of 0.816, and recall of 0.773 in the internal test set. In the external test set, the corresponding metrics were DSC of 0.731, IoU of 0.576, precision of 0.765, and recall of 0.722. For morphological classification, DCNN4, tasked with distinguishing axonal distension from non-axonal distension, achieved an overall accuracy of 0.831 in the internal test set, with a sensitivity of 0.755 for axonal distension and a specificity of 0.861 for non-axonal distension; in the external test set, overall accuracy was 0.810, with a sensitivity of 0.707 and specificity of 0.868. DCNN5, used to classify enlarged bulge versus non-enlarged bulge, reached an overall accuracy of 0.843 in the internal test set, with a sensitivity of 0.902 and specificity of 0.740, and an overall accuracy of 0.798 in the external test set, with a sensitivity of 0.812 and specificity of 0.785. 
Finally, DCNN6, which distinguished HRDP from non-HRDP, demonstrated an overall accuracy of 0.886 in the internal test set, with a sensitivity of 0.883 and specificity of 0.887, and in the external test set, overall accuracy was 0.832, with a sensitivity of 0.888 and specificity of 0.707 (Table 2). Model visualization and evaluation are presented as follows: confusion matrices and ROC curves for DCNN1 and DCNN2 are shown in Fig. 4; representative examples of microneuroma segmentation by DCNN3 are shown in Fig. 5; and confusion matrices for morphological classification by DCNN4–DCNN6 are shown in Fig. 6.

Table 2 Performance of the DCNNs in classification.
Fig. 4

Confusion matrices and ROC curves of the DCNN1 and DCNN2 models. (A) Confusion matrix of DCNN1 (internal test). (B) Confusion matrix of DCNN2 (internal test). (C) ROC curve of DCNN1 (internal test), AUC = 0.977. (D) ROC curve of DCNN2 (internal test), AUC = 0.850. ROC: receiver operating characteristic.

Fig. 5

Segmentation of microneuromas by the DCNN3 model. (A,B) IVCM images without microneuromas; (C–F) IVCM images with microneuromas (light blue areas: expert-annotated gold standard; dark blue areas: model-predicted segmentation). Scale bar, 100 μm.

Fig. 6

Confusion matrix diagrams of the DCNN4, DCNN5, and DCNN6 models. (A) DCNN4 (internal test). (B) DCNN5 (internal test). (C) DCNN6 (internal test). (D) DCNN4 (external test). (E) DCNN5 (external test). (F) DCNN6 (external test).

Comparison of junior physicians’ performance with and without model assistance

In the tasks of detecting the presence or absence of microneuromas and classifying microneuroma subtypes on IVCM images, junior physicians demonstrated significantly improved diagnostic performance with model assistance. The overall diagnostic accuracy increased from 0.687 (515/750) during independent reading to 0.875 (656/750) with model assistance (P < 0.001). Regarding reading efficiency, the average time per image decreased from 63.4 ± 15.2 s without model assistance to 28.5 ± 6.8 s with model assistance (t = 19.3, P < 0.001). All five junior physicians consistently exhibited improvements in both diagnostic accuracy and reading efficiency when assisted by the model (Fig. 7).

Fig. 7

Comparison of the accuracy of image diagnosis by junior physicians before and after model assistance.

Discussion

In this study, we developed and validated a multi-task deep learning pipeline for corneal microneuromas, encompassing image quality control, lesion screening, segmentation, and morphological classification. The model demonstrated stable performance across both internal and external datasets, with a slight decrease on external data likely due to domain shift. In the real-world reading scenario, model assistance improved junior physicians’ diagnostic accuracy by 18.8 percentage points and reduced average image-reading time by 55.1%, while alleviating visual strain. By filtering low-quality images, prioritizing frames likely to contain microneuromas, and highlighting lesion regions, the system directs attention to diagnostically relevant structures, minimizes exhaustive frame-by-frame inspection, and supports more systematic interpretation strategies. These structured outputs and lesion visualizations help reduce common errors and contribute to standardized training in corneal confocal microscopy.

Previous studies combining IVCM with AI have mainly focused on automated quantification of macrostructural parameters, or segmentation and counting of nerve fibers and dendritic cells29,34,35. While these metrics capture overall nerve fiber loss, they are less sensitive to early, focal pathological changes36. From a mechanistic perspective, microneuromas may represent abnormal axonal distension or nutrient deficiencies during axonal regeneration, and their presence could potentially correlate with severity of peripheral neuropathy or pain phenotypes37. Our study focused on microneuromas, a focal phenotype, extending analysis from presence recognition to segmentation and morphological classification. This enables more precise quantification of focal alterations and provides preliminary evidence that microneuromas may serve as early indicators of corneal small-fiber injury; however, longitudinal correlation with clinical gold standards is necessary. Automated segmentation enables objective measurement of lesion burden and spatial distribution, while morphological classification captures distinct pathological patterns, offering a framework for future investigations into disease progression and individualized monitoring.

Several limitations should be noted. First, this is a retrospective, two-center study; prospective, multi-center validation with more diverse populations and devices is needed. Second, the prevalence of microneuromas across patient groups and their direct association with clinical neuropathy measures were not analyzed. These factors limit the ability to assess the clinical significance of microneuromas. Third, for clinical translation, work is required to integrate AI outputs into routine workflows, including generating visualization reports for physicians and prioritizing image review, and to develop methods that further reduce false positives and false negatives.

In summary, this study presents an automated pipeline for corneal microneuroma analysis on IVCM images, covering detection, segmentation, and morphological classification. The models demonstrated robustness across two centers and provided proof-of-concept diagnostic gains in junior physician reading; nevertheless, prospective multi-center studies with clinical correlation are warranted to validate these findings and support safe clinical implementation.