Abstract
Diabetic retinopathy is a serious ocular complication that poses a significant threat to patients’ vision and overall health. Early detection and accurate grading are essential to prevent vision loss. Current automatic grading methods rely heavily on deep learning applied to retinal fundus images, but the complex, irregular patterns of lesions in these images, which vary in shape and distribution, make it difficult to capture the subtle changes. This study introduces RadFuse, a multi-representation deep learning framework that integrates non-linear RadEx-transformed sinogram images with traditional fundus images to enhance diabetic retinopathy detection and grading. Our RadEx transformation, an optimized non-linear extension of the Radon transform, generates sinogram representations to capture complex retinal lesion patterns. By leveraging both spatial and transformed domain information, RadFuse enriches the feature set available to deep learning models, improving the differentiation of severity levels. We conducted extensive experiments on two benchmark datasets, APTOS-2019 and DDR, using three convolutional neural networks (CNNs): ResNeXt-50, MobileNetV2, and VGG19. RadFuse showed significant improvements over fundus-image-only models across all three CNN architectures and outperformed state-of-the-art methods on both datasets. For severity grading across five stages, RadFuse achieved a quadratic weighted kappa of 93.24%, an accuracy of 87.07%, and an F1-score of 87.17%. In binary classification between healthy and diabetic retinopathy cases, the method reached an accuracy of 99.09%, precision of 98.58%, and recall of 99.64%, surpassing previously established models. These results demonstrate RadFuse’s capacity to capture complex non-linear features, advancing diabetic retinopathy classification and promoting the integration of advanced mathematical transforms in medical image analysis. 
The source code will be available at https://github.com/Farida-Ali/RadEx-Transform/tree/main.
Introduction
Diabetic retinopathy (DR) is a microvascular complication of diabetes mellitus and a leading cause of vision impairment and blindness among working-age adults worldwide. According to the International Diabetes Federation, the global prevalence of diabetes is projected to rise from 463 million in 2019 to 700 million by 2045, intensifying the burden of DR on healthcare systems1. DR is characterized by progressive damage to the retinal microvasculature, starting with microaneurysms and hemorrhages and advancing to proliferative diabetic retinopathy (PDR), where abnormal neovascularization can cause severe vision loss2.
Early detection of DR is critical, as treatments are more effective before significant damage occurs. However, DR often progresses asymptomatically, and by the time symptoms are noticeable, considerable retinal damage has already taken place3. Hyperglycemia in diabetic patients damages retinal blood vessels, leading to leakage and the formation of exudates, hemorrhages, and microaneurysms. The extent and severity of these lesions are used to grade DR2. Despite the need for early detection, manual DR grading is labor-intensive, subject to inter-clinician variability, and requires specialized expertise4.
Automated DR grading using deep learning offers a promising solution to these limitations. Convolutional neural networks (CNNs) have demonstrated significant potential in medical image analysis, particularly for DR detection and classification5,6. However, relying solely on retinal fundus images presents inherent limitations. CNN-based models often face challenges in capturing fine-grained retinal features, especially in early-stage DR, where lesions like microaneurysms and small hemorrhages are subtle and less pronounced7. These lesions are difficult to detect due to their small size, irregular shape, and sparse distribution within the retina7,8,9. The complex, non-linear nature of retinal lesions, combined with the spatial variability of their distribution, makes it challenging for CNNs to extract detailed features and establish connections between different lesion regions10,11,12,13. Furthermore, adjacent DR stages often exhibit minimal visual differences, leading to difficulties in accurately distinguishing between them11,13.
To address these challenges, researchers have explored the integration of advanced image transformations with deep learning models. The Radon transform, which computes projections of an image along various angles, has been used successfully in medical imaging, particularly in computed tomography, for image reconstruction and feature extraction14,15. Its key benefit is the ability to simplify a complex image structure into analyzable projections. By projecting image data along linear paths, the Radon transform can highlight critical structures and edges. For example, incorporating Radon transforms has improved feature extraction in tumor detection by highlighting linear structures and edges15. Tavakoli et al.12 demonstrated the Radon transform’s effectiveness in enhancing microaneurysm detection in retinal images. Another study by Raaj et al.16 applied the Radon transform to mammogram images to classify them into normal, benign, and malignant categories, achieving high performance using a hybrid CNN architecture. However, the linear Radon transform struggles to capture non-linear features such as curved edges or irregular textures, which are typical in pathological conditions. Its linear assumptions may not accurately represent these complex features, potentially leading to missed diagnoses or inaccuracies. To overcome these limitations, we previously introduced the RadEx transformation, a non-linear extension of the Radon transform, designed to capture non-linear and complex subtle features in image data17. While the RadEx transformation has been applied to chest X-rays for COVID-19 detection, its use in retinal image analysis for DR grading remains unexplored.
In this study, we propose RadFuse, a novel multi-representation deep learning approach that integrates RadEx-based sinogram images with original fundus images, creating a robust multi-representation input and providing complementary perspectives for DR detection and severity grading. Although both image types originate from the same fundus image, the RadEx transformation generates a distinct feature space that complements the spatial information from the original images to capture non-linear lesion patterns and distributions that are not readily discernible in the raw images. The rationale behind generating and including RadEx-transformed images as additional input is to capture intricate, non-linear lesion patterns that are often subtle and distributed across the retinal surface. While CNNs are proficient at extracting diverse feature types, the complex, non-linear nature and spatial variability of retinal lesions present challenges for standard architectures when processing retinal disease images alone10. The RadEx transformation serves as a secondary representation, enriching the feature set and providing an additional layer of diagnostic information, thereby enhancing the model’s ability to detect subtle and complex lesion patterns that may be overlooked by single-representation models. This dual-representation approach enables the model to leverage complementary features, improving its ability to detect subtle and distributed lesions essential for accurate DR grading.
This approach is supported by previous studies12,16, which demonstrated that transformations like the Radon transform emphasize unique features not easily discernible in the original spatial representations. These transformations provide a complementary view that, when combined with the original images, enables a more comprehensive analysis. The RadEx transformation, as a non-linear extension of the Radon transform, further enhances the model’s ability to capture complex, distributed lesion patterns in retinal images, which are critical for accurate DR grading. Incorporating multi-modal inputs or transformed images, therefore, provides complementary information, capturing lesion diversity more effectively and improving diagnostic accuracy10.
To the best of our knowledge, this is the first study to use a multi-representation approach combining non-linear RadEx-transformed sinogram images with retinal fundus images for DR grading. These sinogram representations highlight non-linear features associated with DR and provide additional information that enriches the deep learning model, significantly boosting CNN performance in detection and grading. The main contributions of this work are:
- Innovative multi-representation approach utilizing non-linear RadEx transformation: We optimized and applied the non-linear RadEx transformation to retinal images, generating sinograms as a new representation of the data. These sinograms serve as an additional representation, offering a complementary perspective to traditional retinal images for DR detection and severity grading. This transformation detects non-linear features and microvascular abnormalities associated with different DR severity levels. We then fused the sinogram and fundus images to constitute a multi-representation input, enabling the model to learn complementary features from both representations.
- Comprehensive evaluation and validation of the proposed approach on two benchmark datasets, APTOS-2019 and DDR: We evaluated our approach on the Asia Pacific Tele-Ophthalmology Society (APTOS) 2019 dataset, which was released as part of the Kaggle blindness detection challenge18, and the DDR dataset19, which is the second-largest publicly available dataset for DR grading. Extensive experiments, including comparisons using retinal image-only, RadEx-only, and multi-representation images with three different CNN architectures, were conducted to validate the effectiveness of our proposed approach. Our method demonstrated significant improvements over retinal image-only models and outperformed existing state-of-the-art (SOTA) methods on both datasets.
Our findings indicate that integrating non-linear Radon transformations provides an effective means of capturing complex non-linear features in retinal images, leading to accurate DR severity grading. This work advances the current state of DR classification and opens up new possibilities for integrating advanced mathematical transformations into medical image analysis pipelines.
Methods
This section outlines our proposed multi-representation deep-learning approach for DR detection and severity grading. The approach consists of two primary components: the generation of RadEx-sinogram images and the implementation of multi-representation deep learning.
Proposed approach
RadEx transformation: theoretical overview
The non-linear RadEx transformation is an advanced extension of the traditional Radon transform, designed specifically to capture complex, non-linear patterns within medical images17. Unlike the Radon transform, which relies on straight-line projections, the RadEx transformation employs parameterized curves that adapt to non-linear structures in the image. This allows for a more refined extraction of features critical to detecting and grading diabetic retinopathy, such as curvilinear blood vessels and scattered exudates. The RadEx transformation is defined by the following equation:

z = M/2 + c(p − q)²

where z is the transformed vertical coordinate, p is the horizontal coordinate, M is the image size (assuming a square image of dimensions M × M), and the parameters q and c control the horizontal shift and the curvature of the transformation, respectively.
Unlike the classical Radon transform, which relies on straight-line projections, the RadEx transformation parameterizes a family of curves in the plane. Each curve within this family is uniquely defined by two parameters: q, which serves as a horizontal shift parameter, and c, which controls the curvature of the projection trajectory. Specifically, adjusting q translates the curves horizontally across the image plane, allowing the transformation to systematically sample various vertical cross-sections of the image. Meanwhile, the curvature parameter c modulates the shape of these curves: positive c values produce upwardly curved paths, whereas negative c values produce downwardly curved paths, with the magnitude of c determining the degree of curvature. This parameter-driven adaptability allows RadEx trajectories to closely align with the intrinsic non-linear anatomical structures prevalent in different medical images, such as X-rays and retinal images. By varying q and c, RadEx generates projection curves that span a wide range of positions and curvatures, allowing it to effectively trace complex and arched anatomical features such as tortuous blood vessels and irregular lesion borders, which are often poorly captured by linear methods. Projecting image intensities along these non-linear trajectories generates richer and more discriminative sinogram representations (see Fig. 5) compared to linear projection methods. This enhanced feature representation provided by RadEx is particularly beneficial in improving the diagnostic accuracy of deep learning models in the detection of DR and severity classification.
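The behavior of the curve family can be sketched numerically. The quadratic form z(p) = M/2 + c·(p − q)² used below is an assumption for illustration (the exact parameterization is given by the RadEx equation), but it reproduces the properties described above: q translates the curve horizontally and the sign of c bends it upward or downward.

```python
import numpy as np

# Sketch of the RadEx curve family, ASSUMING the quadratic form
# z(p) = M/2 + c * (p - q)^2; q shifts curves horizontally,
# c controls curvature direction and strength.
M = 512

def radex_curve(q, c, M=M):
    p = np.arange(M)                      # horizontal coordinate
    z = M / 2 + c * (p - q) ** 2          # vertical coordinate along the curve
    inside = (z >= 0) & (z < M)           # keep only points inside the image
    return p[inside], z[inside]

p1, z1 = radex_curve(q=256, c=0.001)      # gentle upward arc, centred
p2, z2 = radex_curve(q=256, c=-0.001)     # same arc, bent downward
```

Both curves share the vertex (q, M/2); the positive-c curve rises toward the edges and the negative-c curve dips, matching the upward/downward bending described in the text.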
Optimization of the RadEx transformation
Although the original RadEx transformation shows potential, it faces limitations when applied to high-resolution retinal fundus images as demonstrated in Fig. 1. This becomes especially problematic with larger images (e.g., 512 × 512 pixels and above), where significant regions of the image may remain underrepresented due to suboptimal parameter selection.
Visualization of pixel coverage by the RadEx transform across different image resolutions. Blank images of identical dimensions are used, with activated pixels representing areas covered by the transform. Black areas indicate uncovered regions.
In retinal context, where lesions are distributed unevenly across the retina, the failure to cover these non-uniformly distributed areas poses a limitation to achieving high accuracy. Therefore, an optimized version of RadEx is necessary to ensure full coverage of the retinal image and enhance the detection of subtle, non-linear, and distributed lesions. To ensure full image coverage and enhance feature extraction, we propose an optimized RadEx transformation focused on refining the selection of parameters q and c. This optimization improves the transformation’s ability to cover the entire image uniformly, reduce sparsity, and better capture non-linear retinal structures.
Selection of q Values (Horizontal Shift) The parameter q acts as a horizontal shift, determining the location of transformation curves across the image width. Extreme values of q (either too negative or too positive) can cause the curves to cluster at the image’s edges, resulting in redundant information and inadequate coverage. To avoid this, we select q values at regular intervals to ensure uniform sampling across the entire image width. The q values are computed as:

q_k = k · Δq, for k = 0, 1, …, m − 1,
where Δq represents the step size, calculated as Δq = M/m, where M is the image dimension (512 pixels in our case), and m is the number of increments or divisions along the q axis. As shown in Fig. 2 (left), when q values are spaced with increments of approximately 10 pixels (Δq ≈ 10), the resulting curves demonstrate that the entire width of the image is covered uniformly. For comparison, Fig. 2 (right) illustrates the curves when larger increments of approximately 44 pixels are used (Δq ≈ 44), resulting in less frequent coverage across the image width. This comparison reinforces the need for appropriate selection of m to ensure adequate coverage of the image. For this study, we set m = 50, resulting in Δq = M/50 ≈ 10. This choice was made after conducting a sensitivity analysis of different step sizes to evaluate the transformation behavior across various settings and select the optimal one that balances model performance and computational cost.
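The q-grid described above is straightforward to compute; with the paper's setting of M = 512 and m = 50, the step size Δq = M/50 = 10.24 ≈ 10 pixels:

```python
import numpy as np

# Uniform sampling of the horizontal-shift parameter q across the image
# width, as described in the text: delta_q = M / m with m = 50.
M, m = 512, 50
delta_q = M / m                          # step size along the q axis
q_values = np.arange(0, M, delta_q)      # 50 uniformly spaced shifts

print(delta_q, len(q_values))            # 10.24 50
```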
Plots show pixel selections for varying q values from 0 to M at different increments, keeping c constant. The X-axis represents the image width, and the Y-axis represents the image height.
Selection of c values (Curvature Control) The parameter c controls the curvature of the transformation curves. Excessively large or small c values cause the curves to quickly reach the image edges, potentially missing important regions. To avoid this, we constrain c within a moderate range (−1 ≤ c ≤ 1) and use gradual increments to achieve a balanced distribution of curves across the image (see Fig. 3). This ensures more uniform coverage and improves pixel selection granularity, which is crucial for detecting subtle lesions. We reformulate the RadEx transformation to precalculate c for each z using the inverse RadEx equation:

c = (z − M/2) / (p − q)²
Plots of the pixels chosen for varying values of c between −1 and 1 at different intervals, keeping q constant at zero. The X-axis depicts image pixels in the horizontal direction, while the Y-axis depicts image pixels in the vertical direction.
This reformulation ensures that the transformation spans the full vertical range of the image, capturing essential features across all regions. We conclude that the optimal count of specialized c values should approximate half the image size to achieve nearly 99% coverage, as shown in Fig. 4, where the count of specialized c values is M/2. Algorithm 1 outlines the steps for applying the transformation and generating RadEx-transformed images (sinograms), which are essential for our multi-representation deep learning model.
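Algorithm 1 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the quadratic curve form z(p) = M/2 + c·(p − q)² and the uniform spread of c values in [−1, 1] are assumptions for demonstration, whereas the paper precalculates one c per target height z via the inverse RadEx equation.

```python
import numpy as np

# Minimal sketch of sinogram generation: m horizontal shifts q, ~M/2
# curvature values c, and for each (q, c) the image intensities along
# the (assumed) curve z(p) = M/2 + c*(p - q)^2 are accumulated.
def radex_sinogram(img, m=50, n_c=None):
    M = img.shape[0]                          # square image assumed
    n_c = n_c or M // 2                       # ~M/2 curvature values
    qs = np.arange(0, M, M / m)
    cs = np.linspace(-1.0, 1.0, n_c)          # moderate curvature range
    p = np.arange(M)
    sino = np.zeros((len(qs), len(cs)))
    for i, q in enumerate(qs):
        for j, c in enumerate(cs):
            z = (M / 2 + c * (p - q) ** 2).astype(int)
            inside = (z >= 0) & (z < M)       # clip curve to image bounds
            sino[i, j] = img[z[inside], p[inside]].sum()
    return sino

img = np.zeros((64, 64)); img[32, :] = 1.0    # toy image: one bright row
s = radex_sinogram(img, m=8, n_c=16)
```

Every curve passes through its vertex at height M/2, so the bright row at z = 32 contributes to every sinogram cell in this toy example; in real fundus images the per-curve sums encode how each lesion pattern intersects the curve family.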
Visualization of pixel coverage by the optimized RadEx transform for different image sizes when the number of c is M/2. Blank images of identical dimensions are used, with activated pixels representing areas covered by the transform. White areas indicate covered regions.
Image preprocessing and RadEx sinogram generation
Several preprocessing steps were performed before applying the RadEx transformation to standardize the images and enhance feature visibility. First, we cropped the images to remove the black background, focusing solely on the retinal region. We then employed the Ben-Graham preprocessing method, a widely used technique in retinal image analysis20. This method involves multiple stages aimed at standardizing and improving image quality. The images were resized to a consistent resolution to ensure uniformity across the dataset. Following this, pixel intensity values were normalized to standardize brightness and contrast, reducing variability caused by different imaging conditions and equipment. A Gaussian blur was also applied to minimize noise while preserving important retinal structures.
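The cropping, resizing, and normalization steps can be sketched in plain NumPy as below. This is a dependency-free illustration: the Ben-Graham method is usually implemented with OpenCV (blending the image with its Gaussian-blurred copy via `cv2.addWeighted`), and that blur step is omitted here for brevity.

```python
import numpy as np

def crop_to_retina(img, tol=10):
    # bounding box of pixels brighter than tol -> removes the black border
    mask = img > tol
    r = np.where(mask.any(axis=1))[0]
    c = np.where(mask.any(axis=0))[0]
    return img[r[0]:r[-1] + 1, c[0]:c[-1] + 1]

def resize_nearest(img, size):
    # nearest-neighbour resize to a size x size grid
    ri = np.arange(size) * img.shape[0] // size
    ci = np.arange(size) * img.shape[1] // size
    return img[np.ix_(ri, ci)]

def normalize(img):
    # rescale intensities to [0, 1] to standardise brightness/contrast
    img = img.astype(np.float64)
    return (img - img.min()) / (img.max() - img.min() + 1e-8)

# synthetic "fundus": a bright disc on a black background
yy, xx = np.mgrid[:300, :400]
raw = np.where((yy - 150) ** 2 + (xx - 200) ** 2 < 100 ** 2, 180, 0)
out = normalize(resize_nearest(crop_to_retina(raw), 224))
```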
After image preprocessing, we generated sinogram representations of the fundus images using our optimized RadEx transformation. For the optimized RadEx transformation, q values are selected uniformly between 0 and M, with increments of M/50. For each q, we use M/2 values of c (e.g., 112 for M = 224) to ensure dense vertical coverage. The resulting sinograms were then normalized to ensure compatibility with the original images during fusion. Figure 5 illustrates both classical Radon and optimized RadEx sinograms for representative images at each DR severity level (Healthy, Mild, Moderate, Severe, PDR). The third column shows traditional linear Radon projections, while the fourth column presents the corresponding RadEx sinograms. This juxtaposition shows that the RadEx sinograms provide enhanced feature visibility and that the differentiation between DR grades is more pronounced, facilitating more accurate classification of severity levels by deep learning models.
Comparison of Radon and RadEx Sinograms across DR Stages. From left to right: Original color fundus images (Healthy, Mild, Moderate, Severe, PDR); Preprocessed grayscale fundus images; Linear Radon sinograms highlighting straight-line projections; RadEx sinograms generated along parameterized curved trajectories, enhancing visibility of non-linear features.
The integration of these sinograms with the original fundus images forms the basis of the multi-representation (multimodal) input framework. This framework combines the non-linear, lesion-focused representation provided by the RadEx-transformed sinograms with the spatial details inherent in the original images. Together, these inputs enrich the feature space available to deep learning models, enabling better feature extraction. This dual-input approach is particularly valuable for early disease detection, where subtle indicators carry critical diagnostic and prognostic significance.
Optimized RadEx Image Transformation.
Multi-representation deep learning framework
We used three CNN architectures to evaluate the efficacy of our multi-representation approach: ResNeXt-5021, MobileNetV222, and VGG1923. ResNeXt-50 is known for its strong feature extraction capabilities through its grouped convolution strategy, while MobileNetV2 is optimized for efficient performance, making it suitable for deployment in resource-constrained environments. VGG19, although deeper and more computationally demanding, has demonstrated high accuracy in various image classification tasks.
To maximize the potential of our proposed RadEx transformation, we integrated the sinogram representations with the original fundus images (RadFuse) through an early fusion strategy24. In this approach, the RadEx-transformed sinogram and the original image were horizontally concatenated to form a multi-representation input, allowing the CNNs to process both the spatial and transformed-domain features simultaneously (Fig. 6). This multi-representation framework enables the network to capture complementary information from both domains, thus enhancing feature representation and improving the classification of DR severity. Additionally, we conducted experiments on two benchmark datasets—APTOS-2019 and DDR—to validate the generalizability and robustness of our RadFuse framework. These datasets provide diverse retinal image samples, ensuring that our approach performs consistently across different datasets.
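The early-fusion step reduces to a horizontal concatenation of the two representations before they enter the CNN. The sketch below uses random arrays as stand-ins for the preprocessed fundus image and its sinogram; replicating the single-channel sinogram to three channels is an assumption made here so both inputs share a channel count.

```python
import numpy as np

# Early-fusion sketch: fundus image and RadEx sinogram (resized to the
# same height) are concatenated horizontally into one composite input.
H = 512
fundus = np.random.default_rng(0).random((H, H, 3)).astype(np.float32)
sino_1ch = np.random.default_rng(1).random((H, H, 1)).astype(np.float32)
sinogram = np.repeat(sino_1ch, 3, axis=2)        # replicate to 3 channels

composite = np.concatenate([fundus, sinogram], axis=1)
```

The composite has twice the width of either input, so the CNN sees spatial and transformed-domain features side by side in a single forward pass.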
Our proposed multi-representation RadFuse framework for DR detection and grading. The input comprises two images: the original retinal fundus image and the RadEx-sinogram image. These two modalities are concatenated and processed through the CNN architecture. The final output layer utilizes a sigmoid activation function for binary DR detection and a SoftMax activation function for multi-class DR severity grading. This multi-representation approach enables the model to leverage both spatial and RadEx-transformed features, improving the accuracy and robustness of DR detection and severity classification.
Time complexity
The time complexity of the optimized RadEx transformation algorithm can be analyzed based on the following factors:
- Outer loop over q values: The loop runs from 0 to M with a step of Δq = M/m, hence it runs m iterations (m = 50 in our setting).
- Inner loop over z values: For each value of q, the inner loop runs from M/2 to M, resulting in M/2 iterations.
- Computing c values: In each iteration of the inner loop, a c value is computed, resulting in M/2 computations for each q value.
- Sorting and range operations for c values: Sorting the c list takes O(n log n) time, where n is the size of the list, approximately m × M/2 = 25M for m = 50.
- Nested loops over q and cs values: For each of the m values of q, the loop over cs runs. The length of cs is determined by the number of selected c values, which is approximately M/2.

Thus, the dominant factor in the time complexity is the nested loops over q values and cs combined with the M pixels traced per curve, resulting in a time complexity of approximately O(m × M/2 × M) = O(25M²) for m = 50. Since m is a constant, the overall time complexity is O(M²).
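The operation count can be checked with a few lines of arithmetic; the sketch below assumes m = 50 shifts, M/2 curvature values, and at most M pixels visited per curve (the bound is Θ(M²) for any constant m).

```python
# Operation count for the optimized RadEx transform: m curve shifts,
# ~M/2 curvature values per shift, ~M pixels traced per curve.
M, m = 512, 50
curves = m * (M // 2)          # number of (q, c) projection curves
ops = curves * M               # pixels visited across all curves

ratio = ops / M ** 2           # constant factor in front of M^2
print(curves, ops, ratio)      # 12800 6553600 25.0
```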
Experiments and results
Datasets
To evaluate the robustness and generalizability of the proposed RadFuse framework, we conducted experiments on two distinct datasets: APTOS-2019 and DDR.
APTOS-2019 Dataset18: This dataset, released as part of the 2019 Kaggle blindness detection challenge, contains 3,662 high-resolution color fundus images captured with various clinical cameras in controlled lab environments. The images in the dataset are categorized into five stages of DR: no DR (0), mild (1), moderate (2), severe (3), and PDR (4).
The images were divided into training (80%) and testing (20%) sets for diagnosing and grading DR stages. As the data is highly imbalanced, we selected 10% of the training set for validation. To align with established literature, we designed the test set to include 20% of each category from the original APTOS-2019 dataset. This approach allowed for direct comparison with previous studies that used a similar evaluation fraction. The same test data was employed across all experiments to ensure consistency in evaluations. The distribution of images for both binary and multi-class classification tasks is detailed in Table 1.
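A per-class (stratified) 80/20 split of this kind can be sketched as follows; `sklearn.model_selection.train_test_split` with `stratify=labels` is the usual shortcut. The per-grade counts used in the toy labels (1805/370/999/193/295) are the published APTOS-2019 training-set totals.

```python
import numpy as np

# Stratified split: 20% of each severity class goes to the test set,
# mirroring the evaluation protocol described in the text.
def stratified_split(labels, test_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = np.where(labels == cls)[0]
        rng.shuffle(idx)
        n_test = int(round(test_frac * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# toy labels mimicking APTOS-2019's five imbalanced grades
labels = np.array([0] * 1805 + [1] * 370 + [2] * 999 + [3] * 193 + [4] * 295)
train_idx, test_idx = stratified_split(labels)
```

A further 10% of `train_idx` can be held out per class in the same way to form the validation set.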
DDR Dataset19: This dataset is the second-largest publicly available DR grading dataset, comprising 13,673 fundus images divided into 6835 training, 2733 validation, and 4105 test images. Each image is graded into six categories by seven trained graders based on the International Classification of DR. Images of poor quality without clearly visible lesions are labeled as ungradable. Thus, the six levels are: no DR, mild DR, moderate DR, severe DR, PDR, and ungradable. For our experiments, we focused on the five-class DR grading task, excluding ungradable images. Consequently, our final dataset included 6320 training, 2503 validation, and 3759 test images. The dataset is imbalanced, with normal images outnumbering DR images. To ensure consistency, we applied the same preprocessing steps and model configurations as with the APTOS-2019 dataset.
Experimental setup
We implemented our proposed strategy using PyTorch 2.1.2 on a Linux operating system, leveraging an Nvidia GeForce RTX 2080 Ti GPU for both training and testing. The AdamW optimizer was used with a learning rate of 1 × 10−4, which includes weight decay to improve generalization. A batch size of 16 was used for all experiments, and images were resized to 512 × 512 pixels for both the training and testing phases. The models, initialized with ImageNet pre-trained weights, were fine-tuned on the APTOS-2019 and DDR datasets. The training spanned 100 epochs, with early stopping applied after 10 epochs to prevent overfitting. Cross-entropy loss was the loss function, and data augmentation included blurring, flipping (vertical and horizontal), random rotation, sharpening, and adjustments to brightness and contrast. For our multi-representation model, we first concatenate the fundus image and the RadEx-transformed image to create a composite image. Then, augmentations are applied to the composite image.
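The augmentation policy applied to the composite (fundus + sinogram) image can be sketched as below. This is a reduced illustration in NumPy: only flips and brightness jitter are shown, while the blur, sharpening, rotation, and contrast adjustments of the actual PyTorch pipeline are omitted.

```python
import numpy as np

# Augmentations are applied to the already-concatenated composite image,
# so the fundus half and the sinogram half are transformed consistently.
def augment(img, rng):
    if rng.random() < 0.5:
        img = img[:, ::-1]                                  # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1, :]                                  # vertical flip
    img = np.clip(img * rng.uniform(0.8, 1.2), 0.0, 1.0)    # brightness jitter
    return img

rng = np.random.default_rng(42)
composite = np.random.default_rng(0).random((512, 1024))    # fundus + sinogram
aug = augment(composite, rng)
```

Flipping the composite as one array is a deliberate choice: augmenting the two halves independently would break the pixel-level correspondence between the spatial and transformed representations.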
Evaluation metrics
The imbalanced nature of the datasets presents challenges when using accuracy as a performance metric, as it tends to favor the majority class, often at the expense of the minority class. To overcome this, we prioritize the Quadratic Weighted Kappa (QWK) score as our primary evaluation metric. QWK is designed to measure inter-rater agreement in multi-class classification by comparing expected and predicted scores, with values ranging from −1 (indicating complete disagreement) to 1 (indicating perfect agreement). A higher QWK score reflects stronger model performance and agreement with true labels. In addition to QWK, we report other key metrics such as Area Under the Curve (AUC), F1-score, recall, precision, accuracy, and Matthews correlation coefficient (MCC), ensuring a comprehensive evaluation of the model’s performance, particularly in handling imbalanced data.
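QWK can be computed from the confusion matrix in a few lines; the sketch below penalizes each disagreement by the squared distance between predicted and true grades, which is why QWK suits ordinal DR grading.

```python
import numpy as np

# Quadratic Weighted Kappa from scratch: 1 - (weighted observed
# disagreement) / (weighted disagreement expected by chance).
def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    O = np.zeros((n_classes, n_classes))            # observed confusion matrix
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    i, j = np.indices((n_classes, n_classes))
    W = (i - j) ** 2 / (n_classes - 1) ** 2         # quadratic penalty weights
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()  # chance agreement
    return 1.0 - (W * O).sum() / (W * E).sum()

kappa = quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4])  # 1.0
```

`sklearn.metrics.cohen_kappa_score(y_true, y_pred, weights="quadratic")` computes the same quantity in practice.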
Comparative experiments
This section evaluates the effectiveness of our RadFuse approach on both the APTOS-2019 and DDR datasets for five-class severity grading, a clinically relevant and more challenging task. We used three CNN architectures—ResNeXt-50, MobileNetV2, and VGG19—to ensure the improvements are not architecture-dependent, as described in the Methods section. We first compare our RadFuse results with image-only and RadEx-only models and then benchmark against several state-of-the-art (SOTA) models on both datasets. All experiments followed the same setup and data augmentation techniques for fair comparison.
Results on APTOS-2019 dataset
For severity grading, Table 2 shows the performance of our RadFuse models compared to the image-only and RadEx-only configurations across multiple metrics. RadFuse consistently showed improved performance across all three architectures: ResNeXt-50, MobileNetV2, and VGG19. Notably, the ResNeXt-50-based RadFuse model achieved the highest metrics, with a QWK of 93.24%, outperforming the image-only (90.62%) and RadEx-only (78.78%) models. It also attained an AUC of 96.41% and an F1-score of 87.17%, significantly outperforming its image-only and RadEx-only counterparts. Moreover, the RadFuse models showed a significant increase in MCC across all architectures compared to image-only and RadEx-only models. For example, the ResNeXt-50 RadFuse model achieved an MCC of 80.50%, an improvement of 5.96 percentage points over the image-only model. This substantial increase indicates that the RadFuse model is more effective in handling class imbalance and accurately detecting underrepresented classes. VGG19 with RadFuse also showed enhanced performance, with the MCC increasing from 76.55% in the image-only model to 79.40%, thereby improving DR classification accuracy. MobileNetV2 showed more modest gains but still improved with the RadFuse approach, indicating the consistent benefits of the multi-representation approach.
RadEx-only models performed moderately well but did not reach the high-performance levels of image-only or RadFuse models, indicating that while RadEx transformation captures valuable features, these are most effective when combined with original images. The improvements across various CNN architectures underscore the reliable advantages of integrating RadEx-transformed sinogram images with traditional fundus images within the RadFuse framework.
To further verify the superiority of our results, we compare our best-performing ResNeXt-50-based RadFuse model with SOTA models for DR severity level grading. These include ADCNet25, CLS26, MIL-ViT27, CANet28, and Dual-Branch with Augmentation29. Moreover, we also included some SOTA generic models such as Swin30, ViG*31, and MDGNet13. All the comparison results are presented in Table 3. It should be noted, however, that differences in model architectures, methodologies, and experimental setups among these SOTA methods introduce variability in performance, which may limit direct comparability to our results. Despite these variations, our proposed RadFuse consistently outperforms all these models by a significant margin. The crucial performance metric, QWK, reflects the superior reliability of our model, with a score of 93.24%, which is higher than the 92.00% reported by MIL-ViT27, 89.03% from29, and 90.0% from CANet28, showcasing RadFuse’s consistent ability to grade DR severity. In the APTOS-2019 dataset, recall is an important metric due to the imbalanced distribution of DR severity levels, as it reflects how accurately the model identifies true positive cases, especially in minority classes. Our ResNeXt-50-based RadFuse model achieved a recall of 87.2%, outperforming all other models by at least 5 percentage points, indicating its effectiveness in detecting various DR levels, including underrepresented categories. In addition to recall, RadFuse achieved the highest accuracy (87.07%), F1-score (87.17%), and AUC (0.964), underscoring its robustness in DR severity classification and its ability to balance precision and recall.
The lower performance in the Severe and PDR categories can be attributed to the limited number of training samples for these classes in the APTOS-2019 dataset. As highlighted in Table 1, the dataset contains only 136 training images for the Severe category (the smallest among all categories), which hampers the model’s ability to learn distinctive features for accurate classification. Further analysis of the misclassifications reveals that, for categories other than PDR, errors are predominantly assigned to neighboring categories. For instance, MDGNet13 misclassified 51% of the Severe category as Moderate and 28% of the Mild category as Moderate. A likely reason is that the differences between DR images of neighboring categories are very small, leading all models to confuse them. This confusion is understandable, given that the clinical differences between adjacent DR stages can be minimal and difficult to discern, even for experienced clinicians. Compared to the models discussed in the literature, our method demonstrates superior performance in recognizing the Severe category. While MDGNet13 achieved only 38% correct identification for Severe cases, our model correctly classifies 64%, a significant improvement despite the small number of training samples for this category. This enhancement can be attributed to the inclusion of the non-linear RadEx-based sinograms, which amplify critical features associated with advanced DR stages.
Stage-specific performance metrics and misclassification rates on APTOS-2019
To provide a clear stage-wise error analysis, we report the F1 score, recall (true positive rate), and precision for each DR severity level in Table 4 and Fig. 7. The Normal category achieves an outstanding true positive rate of 99.0%, ensuring healthy patients are rarely misdiagnosed with DR, a critical aspect for reliable screening. The Moderate category also performs strongly, with an 88.02% true positive rate, which is essential for timely intervention. However, we observe elevated false negative rates in the Mild and Severe categories. Specifically, only 71.67% of Mild cases are correctly classified, with 23.33% misclassified as Moderate. The Severe category’s true positive rate drops to 63.64%, with 30.30% misclassified as Moderate. The PDR stage exhibits the lowest true positive rate at 57.14%, including 28.57% misclassified as Moderate and 14.29% misclassified as Severe. These misclassification patterns highlight the challenge of distinguishing adjacent DR grades. This behavior aligns with prior work by CLS26 and MDGNet13, which similarly reported difficulties in Severe and PDR classification (see Fig. 7a, b).
Confusion matrices of three models on the APTOS-2019 dataset.
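The stage-wise rates quoted above (e.g., a 63.64% true positive rate for Severe with 30.30% leaking into Moderate) are row-normalized entries of the confusion matrix. A small helper makes that reading explicit; the matrix below is an illustrative 3-class example, not the paper’s actual counts:

```python
import numpy as np

def row_rates(cm):
    """Row-normalize a confusion matrix: entry [i, j] is the fraction of
    true class i predicted as class j; the diagonal is the per-class TPR."""
    cm = np.asarray(cm, dtype=float)
    return cm / cm.sum(axis=1, keepdims=True)

# Illustrative 3-class matrix (NOT the APTOS-2019 counts)
cm = [[99,  1,  0],
      [ 7, 88,  5],
      [ 0, 12, 21]]
rates = row_rates(cm)
print(rates[2, 2])  # TPR of class 2 (~0.636)
print(rates[2, 1])  # share of class 2 misclassified as class 1 (~0.364)
```

Each row of `rates` sums to 1, so the off-diagonal entries directly give the misclassification breakdown per true grade.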
In addition to the multi-class severity grading, we evaluated the performance of our models on the binary classification task of distinguishing between DR and healthy cases. This task is critical for screening purposes, where the primary goal is to identify patients who require further ophthalmic examination. We trained and tested the same three CNN architectures—MobileNetV2, VGG19, and ResNeXt-50—on a re-labeled version of the dataset for binary classification, differentiating between healthy (Grade 0) and DR (Grades 1–4). The key performance metrics of our RadFuse-based approach are presented in Table 5. As shown, RadFuse outperformed both image-only and RadEx-only models across all three CNNs. ResNeXt-50-based RadFuse achieved the best overall performance, with the highest QWK (98.17%), accuracy (99.09%), and F1-score (99.11%), indicating superior classification capability. MobileNetV2 and VGG19 also showed strong discriminative ability, with AUC values exceeding 0.996 and QWK scores over 0.97. The confusion matrices (Fig. 8) offer insights into the classification outcomes. ResNeXt-50 demonstrated the fewest misclassifications, incorrectly predicting one DR case as healthy and misclassifying four healthy cases as DR. VGG19 and MobileNetV2 had slightly higher error rates, with six and eight misclassifications, respectively. However, all models tended to misclassify healthy cases as DR rather than the reverse, a trend that is acceptable in screening contexts, where minimizing missed diagnoses is crucial. This analysis further emphasizes that the models prioritize DR detection, which is essential in clinical screening to prevent undiagnosed DR progression. For a comprehensive evaluation, we also compared our best ResNeXt-50-based RadFuse model against recent SOTA methods for binary classification. As shown in Table 6, our approach outperformed all referenced methods, achieving the highest accuracy (99.09%) and recall (99.64%).
Additionally, the high precision (98.58%) indicates a lower false-positive rate compared to other approaches, underscoring its effectiveness in screening applications.
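The binary screening task simply relabels the five grades: Grade 0 stays healthy and Grades 1–4 collapse to DR, after which precision and recall are computed as usual. A sketch with hypothetical label arrays (not the dataset itself):

```python
import numpy as np

def to_binary(grades):
    """Collapse five-level DR grades to a screening label:
    0 = healthy (Grade 0), 1 = DR (Grades 1-4)."""
    return (np.asarray(grades) > 0).astype(int)

def precision_recall(y_true, y_pred):
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp / (tp + fp), tp / (tp + fn)

y_true = to_binary([0, 1, 2, 3, 4, 0])        # -> [0, 1, 1, 1, 1, 0]
y_pred = np.array([0, 1, 1, 1, 1, 1])         # one healthy eye flagged as DR
prec, rec = precision_recall(y_true, y_pred)  # precision 0.8, recall 1.0
```

The toy prediction illustrates the screening preference discussed above: a false positive lowers precision, but recall (no missed DR cases) stays perfect.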
Binary classification confusion matrices for our three different models on the APTOS-2019 dataset.
Results on DDR dataset
To substantiate the generalizability of our RadFuse framework beyond the APTOS-2019 dataset, we conducted additional experiments on the DDR benchmark dataset. Table 7 presents the performance metrics of our RadFuse models compared to image-only and RadEx-only configurations across three CNN architectures: ResNeXt-50, MobileNetV2, and VGG19. With the ResNeXt-50 architecture, RadFuse achieved a QWK score of 85.50%, outperforming the image-only model’s 81.00% by 4.50 percentage points. Key metrics including MCC, F1 score, and AUC also improved, with the AUC rising from 0.921 to 0.948. For the VGG19 architecture, RadFuse demonstrated even more substantial gains, achieving a QWK score of 84.82% and an AUC increase from 0.903 to 0.925. This trend continued with MobileNetV2, where RadFuse outperformed the image-only model in QWK (78.90% vs. 76.66%) and AUC (0.916 vs. 0.893), underscoring the effectiveness of integrating RadEx-transformed images with original fundus images within the RadFuse framework across various CNN architectures.
To further validate the efficacy of our proposed RadFuse framework, we compared our best-performing ResNeXt-50-based RadFuse model with several SOTA models for five-class DR severity level grading. Table 8 presents a comprehensive comparison. Our RadFuse model achieved an accuracy of 83.32% and a superior QWK of 85.50%, positioning it as the second-highest performer in accuracy and the top performer in QWK among the compared models. Compared to attention-based models like CABNet36 and FA-Net37, which achieved QWK scores of 78.63% and 82.68%, respectively, RadFuse not only matches but exceeds their performance, underscoring its reliability across all DR classes. These results suggest that while attention mechanisms contribute significantly to focusing on specific lesion types and enhancing grading performance, the integration of RadEx-transformed features in RadFuse provides an additional layer of diagnostic information, further enhancing its discriminative capability. ViT + CSRA38 achieves an accuracy of 82.35% but does not report the QWK metric, suggesting that while Vision Transformers are effective for DR grading, they may not capture fine lesion details as effectively as Radon-transformed representations or specialized attention models. Among multimodal and hybrid models, strategies like DeepMT-DR39 and FA + KC-Net + R137 highlight the strengths of multitask and multimodal learning. DeepMT-DR achieved 83.60% accuracy and 80.20% QWK, illustrating the value of multitask learning in DR grading, while FA + KC-Net + R1 slightly outperformed it with 83.96% accuracy and 84.76% QWK, showing the benefits of combining fine-grained attention with knowledge-based methods. Notably, RadFuse matches or slightly surpasses these complex models despite its relative simplicity, suggesting that non-linear Radon transformations as an added data representation enable effective capture of subtle lesion features comparable to intricate multimodal systems.
Stage-specific performance metrics and misclassification rates on DDR dataset
To extend our stage-wise performance and error analysis, we display the full five-class confusion matrices for the image-only and RadFuse setups across all three architectures in Fig. 9. Notably, the reduction in misclassification rates for the Severe and PDR stages underscores the effectiveness of RadFuse in enhancing their detection, as accurate detection of these stages is critical for timely intervention and prevention of vision loss. A detailed analysis of the confusion matrices for the ResNeXt-50 backbone shows that the image-only model exhibits a false negative rate of 70.42% for Severe (50/71 cases misclassified) and 22.55% for PDR (62/275 cases misclassified). In contrast, RadFuse reduces these false negatives to 52.11% for Severe (37/71) and 16.36% for PDR (45/275), an 18.31 percentage point reduction for the Severe grade and a 6.19 percentage point reduction for the PDR grade. Similarly, the MobileNetV2 architecture demonstrated a reduction in the Severe-grade misclassification rate from 66.20% (47 of 71 cases) in the image-only model to 52.11% (37 of 71 cases) in the RadFuse model, and in the PDR-grade rate from 31.70% (87 of 275 cases) to 16.36% (45 of 275 cases). For the VGG19 architecture, the image-only model showed a Severe-grade misclassification rate of 57.75% (41 of 71 cases), which was not reduced in the RadFuse model.
Confusion matrices for severity grading for the Image-only and RadFuse models on the DDR dataset.
Although this indicates a slight increase in Severe (DR3) misclassification for VGG19, the overall performance metrics, including QWK and MCC, improved significantly, suggesting enhanced performance in other classes that compensates for this increase. The reduction in misclassification rates for these grades highlights RadFuse’s efficacy in capturing complex, non-linear lesion patterns. The per-class metrics in Table 9 further illustrate RadFuse’s advantages. RadFuse achieves an AUC of 0.973 for No DR (vs. 0.960 image-only) and substantially improves AUC for Mild (0.904 vs. 0.806), Moderate (0.921 vs. 0.909), Severe (0.958 vs. 0.857), and PDR (0.987 vs. 0.968). In terms of true positive rate, RadFuse boosts Mild from 77.12% to 82.36%, Moderate from 74.40% to 82.36%, Severe from 29.58% to 47.89%, and PDR from 22.55% to 23.00%. While the PDR gain is modest, the pronounced improvement in Severe detection underscores RadFuse’s clinical value. Additionally, F1 score and precision metrics consistently improve across all classes, reflecting a balanced enhancement in sensitivity and specificity.
Sensitivity analysis of augmentation strategies
We evaluated the effect of data augmentation strategies on the performance of RadFuse using the DDR dataset and the ResNeXt-50 architecture. Two approaches were compared: (1) joint augmentation of pre-generated RadEx-transformed images concatenated with fundus images and (2) dynamic recalculation of RadEx for each augmented version of the fundus images during training.
The results, detailed in Supplementary Table S1, demonstrate that RadFuse consistently outperformed both image-only and RadEx-only models under both strategies, underscoring the value of integrating RadEx-transformed images with fundus images. Notably, recalculating RadEx for each augmentation improved performance, with QWK increasing from 85.0% to 86.0% and the F1 score rising from 83.0% to 85.0%. However, this improvement came at a significant computational cost: training time per epoch increased approximately 20-fold, from about 3 minutes with joint augmentation to over an hour with recalculated augmentation. Considering this trade-off, we adopted the joint augmentation strategy for its computational efficiency. Future work will focus on optimizing the RadEx calculation process to reduce its computational demands during dynamic image augmentation while further enhancing its performance benefits.
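The two strategies differ only in when the transform is computed. With a cheap stand-in projection in place of the actual (more expensive) RadEx transform, the trade-off can be sketched as follows; `toy_radex` is a placeholder for illustration, not the authors’ transform:

```python
import numpy as np

def toy_radex(img):
    # Placeholder for the RadEx transform (illustration only):
    # column sums as a crude projection.
    return img.sum(axis=0, keepdims=True)

def augment(img, rng):
    # Example augmentation: random horizontal flip.
    return img[:, ::-1] if rng.random() < 0.5 else img

def strategy_joint(fundus, sinogram, rng):
    """Strategy 1: augment the pre-generated sinogram together with the
    fundus image. Cheap (one transform call per image, ever), but the
    flipped sinogram is only an approximation of the true transform of
    the flipped fundus image."""
    if rng.random() < 0.5:
        fundus, sinogram = fundus[:, ::-1], sinogram[:, ::-1]
    return fundus, sinogram

def strategy_recalc(fundus, rng):
    """Strategy 2: recompute the transform on each augmented fundus
    image. Faithful, but costs one transform call per augmented sample,
    hence the roughly 20-fold slowdown reported above."""
    fundus = augment(fundus, rng)
    return fundus, toy_radex(fundus)
```

Strategy 2 always keeps image and sinogram consistent by construction; strategy 1 trades that exactness for a fixed, one-off transform cost.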
Sensitivity analysis of RadEx transformation’s parameters
We conducted a sensitivity analysis to determine optimal values for the RadEx transformation parameters Δq and c. To evaluate the impact of the step size Δq on model performance, we assessed three settings: Δq = M/100, Δq = M/50, and Δq = M/25, where M = 512 pixels represents the image dimension (for a 512 × 512 pixel image). The number of curvature parameters c was fixed at M. Table 10 summarizes the performance of the RadFuse and RadEx-only models at different Δq values on the APTOS-2019 dataset. Results show that performance differences between step sizes Δq = M/50 and Δq = M/100 are minimal for both RadFuse and RadEx-only models. Specifically, for RadFuse, QWK scores range from 91.41% to 93.10%, MCC from 79.53% to 80.86%, and AUC from 0.955 to 0.964 across these step sizes. Similarly, RadEx-only exhibits QWK scores between 72.18% and 78.78%, MCC from 61.99% to 66.06%, and AUC from 0.884 to 0.889. These minor variations suggest that both models are relatively robust within the tested Δq range.
Among the evaluated step sizes, Δq = M/50 emerged as the optimal choice for both configurations. This setting achieves the best trade-off between performance metrics and computational cost, with RadFuse reaching peak values of QWK (93.10%), MCC (80.55%), and AUC (0.964), while RadEx-only attains its highest QWK and MCC at this step size. Additionally, this
configuration shows the lowest misclassification rates for the Severe and PDR stages (Supplementary Figure S1). Misclassifications between Severe and PDR (e.g., Severe predicted as PDR and vice versa) remain consistent across Δq values, indicating that step size mainly affects confusion with less severe grades such as Moderate (DR2) (Supplementary Figure S1). Thus, Δq = M/50 provides balanced performance with minimal misclassification. Overall, the RadEx transformation demonstrates robustness to parameter selection within this range: while RadEx benefits from tuning Δq, the slight performance fluctuations observed do not detract from its overall efficacy.
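To make the roles of Δq and c concrete, the following toy transform (an illustration in the spirit of RadEx, not the authors’ exact formulation from ref. 17) sums intensities along parabolic curves: projection offsets are stepped by Δq, and each column of the output corresponds to one curvature value, so a smaller Δq or more curvatures yields a denser sinogram at higher computational cost:

```python
import numpy as np

def nonlinear_sinogram(img, dq, curvatures):
    """Toy non-linear Radon-style transform: sum intensities along
    parabolas row = q + k * (col - W/2)**2, with offsets q stepped by dq
    and one output column per curvature k."""
    H, W = img.shape
    cols = np.arange(W)
    qs = np.arange(0, H, dq)
    out = np.zeros((len(qs), len(curvatures)))
    for i, q in enumerate(qs):
        for j, k in enumerate(curvatures):
            rows = np.round(q + k * (cols - W / 2) ** 2).astype(int)
            valid = (rows >= 0) & (rows < H)
            out[i, j] = img[rows[valid], cols[valid]].sum()
    return out

# Halving dq doubles the number of projection offsets (sinogram rows)
img = np.ones((8, 8))
print(nonlinear_sinogram(img, dq=2, curvatures=[0.0]).shape)  # (4, 1)
print(nonlinear_sinogram(img, dq=1, curvatures=[0.0]).shape)  # (8, 1)
```

With zero curvature the parabolas degenerate to straight lines, recovering Radon-like horizontal projections; non-zero curvatures trace curved paths, which is the mechanism the sensitivity analysis tunes via c.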
The sensitivity analysis on the curvature parameter c also reveals that the RadFuse model performs reliably across different configurations, while careful tuning of c can enhance specific performance metrics. We evaluated three settings of c, as shown in Table 11 and Fig. 10, where M = 512 and Δq = M/2. At c = M/2, the RadFuse model achieves an optimal balance of performance and computational efficiency. Recall rates for the Severe and PDR stages reach 51.52% and 54.00%, respectively, with the lowest rate of misclassifications among severe stages. This setting effectively captures essential features while avoiding major performance trade-offs, indicating that moderate adjustments to c enhance classification accuracy. Increasing c to M yields further improvements, especially for PDR recall (68.00%), with slight increases in misclassifications for the Severe stage. This suggests that while performance gains are achievable with higher c values, the model’s stability remains strong across the tested range, and c = M/2 offers a balanced solution at lower cost. Specifically, for the RadFuse model, c = M/2 achieved balanced performance across all metrics, attaining the highest AUC of 0.964. Increasing c to M resulted in marginal improvements, with peak values for QWK (93.26%), MCC (82.81%), F1 score (88.60%), and accuracy (88.60%), though at increased computational cost. This configuration enhances the detection of specific features but does not universally improve Severe-stage performance, reinforcing c = M/2 as the optimal setting for balancing accuracy and efficiency. Detailed confusion matrices for these analyses are provided in Supplementary Figure S2.
Recall rates for severe and PDR grades across different curvature parameter configurations in the ResNeXt-50-based RadFuse model. The bar chart shows the specific recall rates for each grade under three configurations of the curvature parameter c.
Limitations and future work
Despite the promising results of RadFuse, several key limitations warrant further investigation. First, our evaluation on the APTOS-2019 and DDR datasets—while covering different camera systems and patient populations—does not encompass the full spectrum of ethnic diversity. We explicitly note this as a limitation and emphasize the importance of future validation on truly multi-ethnic cohorts (e.g., the AFIQ dataset) to assess and mitigate potential racial bias. Second, although APTOS and DDR differ in imaging protocols and demographic composition, we have not yet performed formal cross-dataset validation to quantify domain shift. While our independent experiments on each dataset demonstrate robustness, future work will include training on one dataset and testing on the other, as well as exploring domain-adaptation techniques to ensure consistent performance across varied clinical environments. Another limitation of our study is the absence of a dedicated explainability analysis, such as Grad-CAM visualizations, to interpret the decision-making process of the proposed models. Although our quantitative results demonstrate the performance gains from RadEx integration, future work will leverage explainability techniques to better understand and communicate the model’s behavior. This will help elucidate how sinogram features influence decision-making and further support the clinical trustworthiness and transparency of our approach. Moreover, while our optimized RadEx transform has proven effective in improving the feature representation of retinal images, its computational complexity could limit scalability, particularly in large-scale or resource-constrained environments. Addressing these limitations in future research will enhance both the practical applicability and robustness of the RadFuse approach. 
Future work will also focus on exploring synergies with emerging technologies, including the integration of RadEx with liquid neural networks for dynamic lesion tracking in longitudinal fundus series, and extending the RadEx transformation to three dimensions for Optical Coherence Tomography (OCT) volume analysis to capture volumetric lesion morphology. Finally, translating our methodological advancements into practical clinical tools necessitates the development of a structured clinical deployment framework. Future efforts will address real-time processing requirements to ensure efficient clinical integration, seamless compatibility with existing DICOM workflows, and navigation of regulatory pathways such as FDA approval for fusion-based diagnostic systems. Such steps will be vital to enable the real-world applicability and clinical adoption.
Related work
Deep learning techniques have become increasingly popular for DR detection and grading, outperforming traditional machine learning methods in accuracy and effectiveness29. Early work by Gulshan et al.5 and Ting et al.6 demonstrated that CNNs could surpass general ophthalmologists in detecting referable DR. Subsequent studies aimed to refine DR grading granularity. For example, Pratt et al.7 used CNNs with data augmentation to improve model generalization, while Shanthi et al.40 enhanced the AlexNet architecture for better grading accuracy. More recent developments include Islam et al.’s supervised contrastive learning (SCL) approach, which significantly improved binary classification and multi-stage DR grading on the APTOS-2019 dataset26. The SCL model attained an accuracy of 98.36% and an AUC of 0.985 for binary classification, and 84.36% accuracy for five-stage grading. Shakibania et al.29 leveraged transfer learning with ResNet50 and EfficientNetB0 to address class imbalances, utilizing data from multiple datasets. More recently, MDGNet introduced a novel architecture that integrates local and global lesion features for improved DR grading13. Despite these advancements, accurately capturing early-stage DR features, such as microaneurysms and small hemorrhages, remains challenging due to their subtle nature and variability41. CNN-based models often struggle to capture subtle and non-linear patterns in DR-related lesions, making it difficult to differentiate between adjacent severity levels.
Image transformation techniques are essential in medical imaging, facilitating the effective feature extraction crucial for precise diagnostics42. The Radon transform is a mathematical integral transform that computes projections of an image along specified directions. It has been widely used in computed tomography for image reconstruction from projection data15. The transform effectively captures linear features and has been applied in medical image analysis for tasks such as lung and breast cancer classification16,43. Its key benefit is its ability to simplify a complex image structure into analyzable projections, essential for both image reconstruction and feature extraction. Its integral nature enables it to accumulate all image data along specified lines through an object, highlighting critical structures and features for disease detection. Tavakoli et al.12 demonstrated the Radon transform’s effectiveness in enhancing microaneurysm detection in retinal images. It is worth mentioning that these studies primarily focused on features extracted from the sinogram rather than directly utilizing the sinogram images for classification tasks. However, the linear Radon transform struggles to capture non-linear features such as curved edges or irregular textures, which are typical of pathological conditions; its linear assumptions may not accurately represent these complex structures. Therefore, non-linear extensions of the Radon transform have been proposed to capture more complex patterns in image data. Our previous work introduced the RadEx transform, a customized non-linear Radon transformation designed to improve feature extraction from chest X-ray images for COVID-19 detection17. The RadEx transform demonstrated the ability to highlight non-linear features not readily apparent in the spatial domain, suggesting its potential applicability to other medical imaging tasks. However, the application of non-linear Radon transformations for DR grading remains relatively unexplored.
Therefore, building on the concept of the RadEx transform, this study explores the use of an optimized version of RadEx for enhanced feature extraction from fundus images in the context of DR grading.
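For reference, a discrete Radon transform can be approximated by rotating the image and summing along one axis, yielding one projection per angle. A minimal sketch using SciPy (an illustration of the classical linear transform, not the optimized RadEx implementation discussed above):

```python
import numpy as np
from scipy.ndimage import rotate

def radon_sinogram(img, angles_deg):
    """Minimal discrete Radon transform: each sinogram column is the
    column-sum projection of the image rotated by one angle."""
    return np.stack(
        [rotate(img, a, reshape=False, order=1).sum(axis=0)
         for a in angles_deg],
        axis=1,
    )

# A horizontal line of ones spreads evenly at 0 degrees but collapses
# into a single bright column at 90 degrees.
img = np.zeros((9, 9))
img[4, :] = 1.0
sino = radon_sinogram(img, [0, 90])
print(sino[:, 0])  # angle 0: all ones (line crosses every column once)
print(sino[:, 1])  # angle 90: a single spike of 9 at the center column
```

This shows the property the text describes: linear structures concentrate into sharp sinogram features at the matching angle, which is exactly what the linear transform cannot do for curved or irregular lesion patterns.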
Combining multiple imaging modalities can provide complementary information, leading to improved diagnostic performance. Multimodal deep learning models integrate data from different sources to enhance feature representation and capture complex patterns24. In the context of DR, multimodal approaches have been less common due to the reliance on retinal fundus images as the primary data source. Some studies have explored the use of optical coherence tomography (OCT) alongside fundus images to improve DR detection19. However, OCT imaging is not always readily available, especially in resource-limited settings. Our work builds upon these insights by introducing a new multi-representation deep learning framework that integrates non-linear RadEx-transform-based sinogram images with retinal fundus images. This approach enhances feature extraction and improves DR severity grading accuracy.
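Because the fusion happens at the input rather than inside the network, any backbone that accepts a multi-channel image can consume the result. One plausible reading of such input-level fusion, sketched here with nearest-neighbour resizing and min-max normalization (the paper’s exact fusion layout may differ), stacks the sinogram as an extra channel:

```python
import numpy as np

def fuse_inputs(fundus_rgb, sinogram):
    """Stack a RadEx-style sinogram onto a fundus image as a 4th channel.
    Illustrative only: nearest-neighbour resize to the fundus size,
    then min-max normalize the sinogram to [0, 1]."""
    H, W, _ = fundus_rgb.shape
    sh, sw = sinogram.shape
    ry = (np.arange(H) * sh // H).clip(0, sh - 1)
    rx = (np.arange(W) * sw // W).clip(0, sw - 1)
    sino = sinogram[np.ix_(ry, rx)].astype(float)
    sino = (sino - sino.min()) / (sino.max() - sino.min() + 1e-8)
    return np.concatenate([fundus_rgb, sino[..., None]], axis=-1)

fused = fuse_inputs(np.zeros((32, 32, 3)), np.arange(200.0).reshape(10, 20))
print(fused.shape)  # (32, 32, 4)
```

A pretrained backbone’s first convolution would then need its input channels widened from 3 to 4 (e.g., by replicating or zero-initializing the extra filter weights), a standard adaptation that leaves the rest of the architecture untouched.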
Conclusion
In this study, we introduced RadFuse, a multi-representation deep learning approach designed to enhance DR detection and severity grading by integrating an optimized non-linear RadEx transformation with original fundus images. Our method significantly improves CNN performance in detecting and classifying DR severity levels. Extensive experiments on the APTOS-2019 and DDR datasets demonstrate that RadFuse multi-representation consistently outperforms traditional image-only models and several SOTA methods. By capturing complex non-linear patterns and distributed retinal lesions, the RadEx-sinogram provides valuable discriminative information not easily visible in the original images. Moreover, because RadFuse fuses RadEx sinograms with spatial images at the input level, it remains entirely architecture-agnostic. This design allows seamless integration with any backbone—whether convolutional, transformer-based, or hybrid—making RadFuse broadly applicable to current and future state-of-the-art models in medical image analysis. These findings highlight the potential of combining spatial and transformed domain information to improve DR detection and grading.
Data availability
The APTOS dataset is openly available at https://www.kaggle.com/competitions/aptos2019-blindness-detection (accessed on 12 August 2024). The DDR dataset is available at https://github.com/nkicsl/DDR-dataset (accessed on 15 November 2024).
References
International Diabetes Federation. IDF Diabetes Atlas 9th edn. (International Diabetes Federation, 2019).
Thanikachalam, V., Kabilan, K. & Erramchetty, S. Optimized deep CNN for detection and classification of diabetic retinopathy and diabetic macular edema. BMC Med. Imaging 24, 227. https://doi.org/10.1186/s12880-024-01406-1 (2024).
Kusakunniran, W. et al. Detecting and staging diabetic retinopathy in retinal images using multi-branch CNN. Appl. Comput. Inform. (2022).
Abràmoff, M. D. et al. Automated analysis of retinal images for detection of referable diabetic retinopathy. JAMA Ophthalmol. 131, 351–357 (2013).
Gulshan, V. et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316, 2402–2410 (2016).
Ting, D. S. W. et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA 318, 2211–2223 (2017).
Pratt, H., Coenen, F., Broadbent, D. M., Harding, S. P. & Zheng, Y. Convolutional neural networks for diabetic retinopathy. Procedia Comput. Sci. 90, 200–205 (2016).
Gadekallu, T. R. et al. Early detection of diabetic retinopathy using PCA-firefly based deep learning model. Electronics 9, 274 (2020).
Seoud, L., Hurtut, T., Chelbi, J., Cheriet, F. & Langlois, J. P. Red lesion detection using dynamic shape features for diabetic retinopathy screening. IEEE Trans. Med. Imaging 35, 1116–1126 (2015).
De Fauw, J. et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat. Med. 24, 1342–1350 (2018).
Bala, R., Sharma, A. & Goel, N. CTNet: Convolutional transformer network for diabetic retinopathy classification. Neural Comput. Appl. 36, 4787–4809 (2024).
Tavakoli, M. et al. Automated microaneurysms detection in retinal images using radon transform and supervised learning: Application to mass screening of diabetic retinopathy. IEEE Access 9, 67302–67314 (2021).
Wang, Y., Wang, L., Guo, Z., Song, S. & Li, Y. A graph convolutional network with dynamic weight fusion of multi-scale local features for diabetic retinopathy grading. Sci. Rep. 14, 5791 (2024).
Toft, P. The Radon Transform: Theory and Implementation. Ph.D. thesis, Technical University of Denmark (1996).
Deans, S. R. The Radon Transform and Some of Its Applications (Courier Corporation, 2007).
Raaj, R. S. Breast cancer detection and diagnosis using hybrid deep learning architecture. Biomed. Signal Process. Control. 82, 104558 (2023).
Islam, A., Mohsen, F., Shah, Z. & Belhaouari, S. B. Introducing novel radon based transform for disease detection from chest x-ray images. In 2024 6th International Conference on Pattern Analysis and Intelligent Systems (PAIS) 1–5 (IEEE, 2024).
Asia Pacific Tele-Ophthalmology Society (APTOS). APTOS 2019 blindness detection. https://www.kaggle.com/c/aptos2019-blindness-detection (2019). Accessed 11 Sep. 2021.
Li, T. et al. Diagnostic assessment of deep learning algorithms for diabetic retinopathy screening. Inf. Sci. 501, 511–522 (2019).
Graham, B. Kaggle diabetic retinopathy detection competition report (2015). https://www.kaggle.com/c/diabetic-retinopathy-detection/discussion/15801.
Xie, S., Girshick, R., Dollár, P., Tu, Z. & He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 1492–1500 (2017).
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4510–4520 (2018).
Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. Preprint at https://arxiv.org/abs/1409.1556 (2014).
Mohsen, F., Ali, H., El Hajj, N. & Shah, Z. Artificial intelligence-based methods for fusion of electronic health records and imaging data. Sci. Rep. 12, 17981 (2022).
Yue, G. et al. Attention-driven cascaded network for diabetic retinopathy grading from fundus images. Biomed. Signal Process. Control. 80, 104370 (2023).
Islam, M. R. et al. Applying supervised contrastive learning for the detection of diabetic retinopathy and its severity levels from fundus images. Comput. Biol. Med. 146, 105602 (2022).
Bi, Q. et al. Mil-vit: A multiple instance vision transformer for fundus image classification. J. Vis. Commun. Image Represent. 97, 103956. https://doi.org/10.1016/j.jvcir.2023.103956 (2023).
Li, X. et al. Canet: Cross-disease attention network for joint diabetic retinopathy and diabetic macular edema grading. IEEE Trans. Med. Imaging 39, 1483–1493 (2019).
Shakibania, H., Raoufi, S., Pourafkham, B., Khotanlou, H. & Mansoorizadeh, M. Dual branch deep learning network for detection and stage grading of diabetic retinopathy. Biomed. Signal Process. Control. 93, 106168 (2024).
Liu, Z. et al. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision 10012–10022 (2021).
Han, K., Wang, Y., Guo, J., Tang, Y. & Wu, E. Vision GNN: An image is worth graph of nodes. Adv. Neural Inf. Process. Syst. 35, 8291–8303 (2022).
Gong, L., Ma, K. & Zheng, Y. Distractor-aware neuron intrinsic learning for generic 2d medical image classifications. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part II 23 591–601 (Springer, 2020).
Liu, S., Gong, L., Ma, K. & Zheng, Y. Green: a graph residual re-ranking network for grading diabetic retinopathy. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part V 23 585–594 (Springer, 2020).
Bodapati, J. D. et al. Blended multi-modal deep convnet features for diabetic retinopathy severity prediction. Electronics 9, 914 (2020).
Kumar, G., Chatterjee, S. & Chattopadhyay, C. Dristi: A hybrid deep neural network for diabetic retinopathy diagnosis. Signal Image Video Process. 15, 1679–1686 (2021).
He, A., Li, T., Li, N., Wang, K. & Fu, H. Cabnet: Category attention block for imbalanced diabetic retinopathy grading. IEEE Trans. Med. Imaging 40, 143–153 (2020).
Tian, M. et al. Fine-grained attention & knowledge-based collaborative network for diabetic retinopathy grading. Heliyon 9 (2023).
Gu, Z. et al. Classification of diabetic retinopathy severity in fundus images using the vision transformer and residual attention. Comput. Intell. Neurosci. 2023, 1305583 (2023).
Wang, X. et al. Joint learning of multi-level tasks for diabetic retinopathy grading on low-resolution fundus images. IEEE J. Biomed. Heal. Inform. 26, 2216–2227 (2021).
Shanthi, T. & Sabeenian, R. Modified alexnet architecture for classification of diabetic retinopathy images. Comput. Electr. Eng. 76, 56–64 (2019).
Lam, C., Yu, C., Huang, L. & Rubin, D. Retinal lesion detection with deep learning using image patches. Investig. Ophthalmol. Vis. Sci. 59, 590–596 (2018).
Arulmurugan, R. & Anandakumar, H. Early detection of lung cancer using wavelet feature descriptor and feed forward back propagation neural networks classifier. In Computational Vision and Bio-Inspired Computing 103–110 (Springer, 2018).
Rani, K. V., Sumathy, G., Shoba, L., Shermila, P. J. & Prince, M. E. Radon transform-based improved single seeded region growing segmentation for lung cancer detection using AMPWSVM classification approach. Signal Image Video Process. 17, 4571–4580 (2023).
Acknowledgements
Open Access funding provided by the Qatar National Library.
Author information
Authors and Affiliations
Contributions
F.M. and S.B. contributed to the conceptualization of the study. F.M. performed the simulations and contributed to the writing of the original manuscript. S.B. and Z.S. analyzed the results. S.B. and Z.S. revised the manuscript. S.B. and Z.S. supervised the study. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Mohsen, F., Belhaouari, S. & Shah, Z. Integrating non-linear radon transformation for diabetic retinopathy grading. Sci Rep 15, 30706 (2025). https://doi.org/10.1038/s41598-025-14944-7
This article is cited by
- Artificial intelligence and precision medicine. Scientific Reports (2026).