Introduction

Diabetic retinopathy (DR) is a progressive eye disease caused by prolonged high blood glucose levels in patients with Type I and Type II diabetes mellitus1. DR initially affects the retinal blood vessels and leads to vision impairment and blindness2. According to the World Health Organization (WHO), around 422 million people were affected by diabetes in 2014, and the count is projected to grow to 552 million by 2030. DR accounts for about 2.6% of all cases of visual impairment3. It also causes blindness due to retinal damage among diabetic patients. DR develops when excess glucose in the bloodstream damages the small blood vessels of the retina, the light-sensitive tissue at the back of the eye4. The retina sends visual signals to the brain through the optic nerve, so damage to the retina disrupts vision. DR shows minimal symptoms in its early stages, which makes it hard to detect without routine screenings5. Misdiagnosis occurs because the visual cues are subtle, particularly in under-resourced areas, and undiagnosed DR leads to irreversible blindness. Thus, early detection and monitoring are critical for DR patients6.

Conventional methods for DR detection rely on manual diagnosis using retinal fundus photography, where ophthalmologists visually inspect images for clinical signs such as microaneurysms, hemorrhages, and exudates7. Diagnostic tools such as fluorescein angiography and optical coherence tomography (OCT) are used to detect retinal changes in the images8. Although clinically effective, these methods are labor-intensive, time-consuming, and highly dependent on expert knowledge9. In resource-limited regions with restricted access to specialized equipment and skilled personnel, this leads to delayed or missed diagnoses10. The subjective nature of manual assessment introduces variability, and the infrastructure cost of regular screening is prohibitively high for many low-income settings11.

To overcome the limitations of manual diagnosis, deep learning has been developed for automated DR detection. Deep learning models based on convolutional neural networks (CNNs) can learn complex patterns from retinal images, enabling DR detection with high accuracy and consistency12,13. These models significantly reduce the need for manual feature extraction and expert involvement, providing faster and more scalable screening processes14. Deep learning models trained and deployed in telemedicine platforms can also extend diagnostic capabilities to under-resourced or remote areas15. Still, traditional deep learning approaches require centralized data collection for model training, which raises significant concerns about patient privacy, data security, and compliance with health data protection regulations16.

Federated learning (FL) based methods solve these challenges by enabling collaborative training of deep learning models across decentralized data sources without sharing raw patient data17. In FL, each participating hospital trains the model locally and shares only the model updates with a central server, which aggregates them to form a global model18. This approach preserves data privacy, while the diversity of medical datasets from multiple institutions enhances model robustness and generalization19. By combining the predictive power of deep learning with the privacy-preserving, collaborative nature of federated learning, FL based models provide enhanced detection accuracy while maintaining privacy20. Thus, this research introduces an FL based approach for DR detection. The major contributions of the research are:

Design of ECSRNet

Local and global model training is devised using the Efficient Cross Stage Recurrent Network (ECSRNet), which combines the strengths of ShuffleNet, Cross Stage Partial Networks (CSPNet), and a Gated Recurrent Unit (GRU) to achieve high accuracy and computational efficiency.

Design of IKMC

Improved K-means clustering (IKMC) based user selection is devised in the proposed FedDRNet model to enhance communication efficiency.

Design of FedDRNet model

The proposed federated learning based diabetic retinopathy detection network (FedDRNet) model enhances the accuracy of DR detection with minimal computational burden and enhanced privacy preservation. Data learning using ECSRNet enhances detection accuracy, and IKMC assists in reducing the number of communication rounds. Besides, the inclusion of homomorphic encryption strengthens the privacy preservation of the model.

The research is organized as follows: Section “Related works” details the related works along with the problem statement, and Section “Proposed methodology” presents the detailed proposed methodology. The results and discussions are presented in Section “Result and discussion” and the conclusion in Section “Conclusion”.

Related works

FL with a Vision Transformer (FLViT) architecture was designed by21 to build a robust DR detection model using data gathered from geographically distributed healthcare institutions while preserving the privacy of sensitive patient data. The designed model utilized a global ViT model for coordinating clients: each participating institution receives a copy of the current global ViT model. Training feeds batches of local images to the model, calculates the loss, and updates the model’s weights based on the gradients. The aggregated weights were used to update the global ViT model residing on the central server. The designed model demonstrated accurate and reliable DR detection. Still, the frequent exchange of model updates between clients and the server was bandwidth-intensive and slowed down training for a large number of clients.

FL with deep learning (FedDL) was designed by22 to improve the accuracy of DR detection. The designed model integrates five cutting-edge CNNs: VGGNet19, EfficientNetB7, AlexNet, DenseNet201, and ResNet50. Each client locally preprocesses its fundus images and applies data augmentation techniques to increase the size and variability of its training data. The central server aggregates the received model updates using a federated aggregation algorithm. After global training, each client further fine-tunes the global model on its local data. The FedDL model demonstrates higher accuracy compared to traditional FL methods, leading to more reliable diagnoses. However, the use of five separate deep learning models increases the computational burden of the model.

FL with a convolutional neural network (FedCNN) was designed by23. The designed model incorporates Wiener and median filtration (Weinmed) to perform noise reduction, removing salt-and-pepper noise while preserving edges. In this, VGG architecture (VGG-19) based feature extraction and CNN based classification were devised for the final classification of DR severity levels. FedCNN ensures that sensitive patient retinal images remain within the local medical institutions, protecting patient privacy and complying with data security regulations. Still, malicious institutions could potentially send harmful model updates to the central server, compromising the integrity of the global model.

Differentially private federated learning (DPFL) was designed by24 to enable collaborative training for DR detection. The designed model incorporated differential privacy (DP) into the federated learning process to provide rigorous privacy guarantees. In this, clients participating in subsequent training rounds download only the latest checkpoint from the server instead of the entire model, which reduces the communication overhead. The framework, based on global collaborative systems, improves access to screening and early detection worldwide. Still, the noise added for differential privacy inevitably leads to some loss in model accuracy.

FL with dynamically weighted federated averaging (data weighted fed) was designed by25 for effective fundus disease diagnosis from ophthalmic images. The designed model utilized a custom model aggregation method, wherein the weight of each client’s local model in the global aggregation was determined dynamically based on the size of its local training dataset. Clients with larger datasets thus have a greater influence on the global model update. Client selection was devised using k-client selection training, wherein only a subset of k clients from the available clients was selected to participate in each round of federated learning. The wFedAvg aims to improve accuracy; still, it may disproportionately favor institutions with very large datasets, overlooking valuable information from smaller but diverse datasets.

A federated learning approach that preserved data privacy while enhancing accuracy in distributed environments was designed by26. The method utilized decentralized training to mitigate data leakage risks and achieved robust performance across heterogeneous networks. The framework demonstrated scalability, efficiency, and strong adaptability against evolving cyber threats. A blockchain-driven customized federated learning framework for data security was designed by27. The approach ensured data confidentiality and integrity through blockchain consensus while enabling collaborative learning across medical devices. The designed model offered efficient computation, privacy preservation, and resilience against cyber-attacks, significantly improving IoMT security and performance.

Problem statement

The occurrence of diabetes causes DR, which necessitates the development of an accurate early detection and classification model for proper medication. Traditional deep learning methods for DR prediction depend on centralized data collection, which raises privacy concerns due to the sensitive nature of medical records and imaging data. Privacy preservation is a challenging aspect of healthcare scenarios, where data confidentiality, regulatory compliance, and patient trust are significant. To address this challenge, FL based methods enable learning over distributed medical data without compromising security. Still, inefficient learning capability, higher computational burden, and several other factors limit FL based methods. Thus, this research introduces the Federated Learning based Diabetic Retinopathy Detection Network (FedDRNet) model to tackle these issues. The proposed model includes the Efficient Cross Stage Recurrent Network (ECSRNet), which combines the strengths of ShuffleNet, CSPNet, and GRU to achieve high accuracy and computational efficiency. Besides, Improved K-Means Clustering (IKMC) based user selection enhances communication efficiency by reducing the number of communication rounds.

Proposed methodology

DR is a severe complication of diabetes that causes vision impairment and blindness. Traditional deep learning based DR detection models require centralized data gathering, which generates privacy concerns due to the sensitive nature of medical images. To address this issue, FL based models provide a privacy-preserving solution by enabling collaborative model training without directly sharing patient data: only the weights learned from the patients’ data are shared, never the data itself. Thus, this research proposes an efficient and privacy-preserving DR detection framework named the Federated Learning based Diabetic Retinopathy Detection Network (FedDRNet) model. The proposed model utilizes the Efficient Cross Stage Recurrent Network (ECSRNet) for local and global model training. The proposed ECSRNet combines ShuffleNet, Cross Stage Partial Networks (CSPNet), and a Gated Recurrent Unit (GRU) to enhance disease detection accuracy by learning spatial and temporal features more effectively with minimal computational burden. In the proposed FedDRNet model, privacy preservation is further enhanced using homomorphic encryption; applying it prior to weight sharing strengthens the security of the model. Besides, Improved K-Means Clustering (IKMC) based user selection is devised to enhance communication efficiency; the IKMC based user selection groups the clients based on the similarity of their data distributions. In addition, image quality is enhanced using a multi-scale Gaussian bilateral filtering based pre-processing technique. The structure of the proposed FedDRNet model for DR detection is portrayed in (Fig. 1).

Fig. 1

Proposed FedDRNet model for DR detection.

Local model

The local model is deployed on each client device (i.e., a hospital), wherein the patients’ retinal fundus images are stored. The local model is responsible for learning DR-related patterns from the hospital’s data without ever sharing the data externally. Traditional FL methods employ a DNN for learning the characteristics of patients’ retinal fundus images. In the proposed model, a novel deep learning architecture, ECSRNet, is proposed for learning from the retinal fundus images. It extracts both spatial and temporal features from the images to enhance disease detection accuracy. Each client hospital trains the model locally using its own data and optimizes the loss function using the Adam optimizer.

Server model

The server model is the central aggregator in the federated learning system. The server coordinates communication between the clients and aggregates the model updates received from them. The server initializes the global model and distributes it to the clients at the beginning of each round. After local training, the server receives the clients’ updates and aggregates them to refine the global model. In the proposed FedDRNet model, client selection is employed using the IKMC model, which assists in reducing the number of communication rounds and thereby enhances performance with minimal computational burden.

Aggregation layer

The aggregation layer combines the model weights from all clients to produce the updated global model. FedAvg is utilized in the aggregation layer to compute the weighted average of the clients’ model parameters.
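As an illustration, the FedAvg update is a data-size-weighted average of the client parameter tensors. The sketch below is a minimal NumPy illustration; the function name `fedavg` and the list-of-arrays weight representation are illustrative assumptions, not the paper’s implementation.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client model parameters (FedAvg).

    client_weights: one list of numpy parameter arrays per client
    client_sizes:   number of local training samples per client
    """
    total = sum(client_sizes)
    coeffs = [n / total for n in client_sizes]
    # Average each parameter tensor across clients, weighted by data size.
    return [
        sum(c * w[i] for c, w in zip(coeffs, client_weights))
        for i in range(len(client_weights[0]))
    ]
```

A client holding three times as much data thus contributes three times as much to each averaged tensor.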

Client side pre-processing

Image pre-processing is devised using the MSBF to enhance the quality of the images.

Enhanced privacy preservation

Privacy preservation is devised using the homomorphic encryption method.

Data acquisition

The input data is acquired from the publicly available Diabetic Retinopathy 2015 Data Colored Resized dataset28.

Pre-processing using multi-scale Gaussian bilateral filtering

The multi-scale Gaussian bilateral filter (MSBF) is utilized to enhance image quality while preserving edge features. The MSBF considers both spatial and intensity information to selectively smooth regions without losing edge details29. Applying the filter at multiple scales allows it to capture both fine and coarse image features, ensuring enhanced visual interpretability. The filtered outcome of the bilateral Gaussian filter is:

$$B_{fil} \left( d \right) = \frac{1}{{V_{d} }}\sum\limits_{h \in \phi } {M_{l} \left( {\left\| {d - h} \right\|} \right) \cdot M_{u} \left( {\left| {B\left( d \right) - B\left( h \right)} \right|} \right) \cdot B\left( h \right)}$$
(1)

where, \(B\left( d \right)\) notates the intensity at pixel \(d\) in the original image, \(B_{fil} \left( d \right)\) signifies the output intensity at pixel \(d\) after filtering, and \(h \in \phi\) interprets the neighboring pixels within a spatial window. \(M_{l} \left( {\left\| {d - h} \right\|} \right)\) signifies the spatial Gaussian kernel, \(M_{u} \left( {\left| {B\left( d \right) - B\left( h \right)} \right|} \right)\) signifies the range Gaussian kernel and \(V_{d}\) defines the normalization factor to ensure weights sum to 1.

$$V_{d} = \sum\limits_{h \in \phi } {M_{l} \left( {\left\| {d - h} \right\|} \right) \cdot M_{u} \left( {\left| {B\left( d \right) - B\left( h \right)} \right|} \right)}$$
(2)

Spatial Gaussian kernel

The spatial Gaussian kernel measures how close the neighboring pixels are, focusing the filter on local neighborhoods.

$$M_{l} \left( {\left\| {d - h} \right\|} \right) = \exp \left( { - \,\frac{{\left\| {d - h} \right\|^{2} }}{{2\chi_{l}^{2} }}} \right)$$
(3)

where \(\chi_{l}\) controls the extent of the spatial influence.

Range Gaussian kernel

The range Gaussian kernel measures how similar the pixel values are, which helps preserve edges.

$$M_{u} \left( {\left| {B\left( d \right) - B\left( h \right)} \right|} \right) = \exp \left( { - \,\frac{{\left( {B\left( d \right) - B\left( h \right)} \right)^{2} }}{{2\chi_{u}^{2} }}} \right)$$
(4)

where \(\chi_{u}\) controls the sensitivity to intensity differences.

Multi-scale filtering

Multi-scale filtering processes multiple versions of the image, each with a bilateral filter using different spatial and range parameters. It is formulated as:

$$B_{fil}^{q} \left( d \right) = \frac{1}{{V_{d}^{q} }}\sum\limits_{{h \in \phi^{q} }} {M_{l}^{q} \left( {\left\| {d - h} \right\|} \right) \cdot M_{u}^{q} \left( {\left| {B^{q} \left( d \right) - B^{q} \left( h \right)} \right|} \right) \cdot B^{q} \left( h \right)}$$
(5)

where \(B^{q}\) signifies the image at scale \(q\), \(\phi^{q}\) notates the neighborhood window at scale \(q\), and \(\chi_{l}^{q}\) and \(\chi_{u}^{q}\) specify the scale-specific spatial and range parameters. After filtering at all scales, the enhanced image \(B_{e}\) is obtained by combining the multi-scale results:

$$B_{e} = \sum\limits_{q = 1}^{Q} {\lambda_{q} \cdot B_{fil}^{q} }$$
(6)

where \(\lambda_{q}\) signifies the weight for each scale, with \(\sum\limits_{q} {\lambda_{q} } = 1\) for normalization. The MSBF pre-processing enhances lesion contrast while keeping the retinal vessel edges intact, and it reduces illumination artifacts. The pre-processed image is utilized for training the local model of FedDRNet.
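Eqs. (1)–(6) can be sketched in NumPy as below. This is a naive, unoptimized sketch; the window `radius`, the per-scale \(\chi\) values, and the scale weights \(\lambda_{q}\) are illustrative assumptions, not the paper’s settings.

```python
import numpy as np

def bilateral_filter(img, sigma_l, sigma_u, radius=2):
    """Naive bilateral filter per Eqs. (1)-(4): spatial kernel M_l,
    range kernel M_u, normalised by V_d at every pixel."""
    H, W = img.shape
    out = np.zeros((H, W))
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    M_l = np.exp(-(ys**2 + xs**2) / (2 * sigma_l**2))  # spatial Gaussian, Eq. (3)
    pad = np.pad(img, radius, mode="edge").astype(float)
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            M_u = np.exp(-((patch - img[i, j])**2) / (2 * sigma_u**2))  # Eq. (4)
            w = M_l * M_u
            out[i, j] = (w * patch).sum() / w.sum()    # Eq. (1), V_d = w.sum()
    return out

def msbf(img, scales=((1.0, 10.0), (3.0, 30.0)), weights=(0.5, 0.5)):
    """Multi-scale combination of Eq. (6): B_e = sum_q lambda_q * B_fil^q."""
    assert abs(sum(weights) - 1.0) < 1e-9              # lambda_q must sum to 1
    return sum(l * bilateral_filter(img, sl, su)
               for l, (sl, su) in zip(weights, scales))
```

A useful sanity check is that a constant image passes through unchanged, since every range weight is then equal.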

Local model training using ECSRNet

The efficient cross stage recurrent network (ECSRNet) based federated learning architecture is designed to balance performance and resource efficiency by combining ShuffleNet, cross stage partial networks (CSPNet), and a gated recurrent unit (GRU). Each local client (hospital) trains the local model using retinal fundus images that are processed through a lightweight ShuffleNet backbone to extract low-cost spatial features; these are refined through CSPNet, which splits the feature flow to reduce memory consumption while preserving enhanced feature representations. The spatial features are then passed sequentially into a GRU to capture the progression and temporal patterns of the disease30. The model output provides DR disease features that are transmitted to the server without sharing the raw patient data. The structure of ECSRNet is presented in (Fig. 2).

Fig. 2

Structure of ECSRNet based training.

The CSPNet is utilized in model training due to its capability to enhance both training stability and feature representation. Splitting the input feature map into two parts, processing only one through ShuffleNet modules, and merging at the final stage helps preserve gradient flow, which is essential for deep networks. The design minimizes redundancy and reduces memory usage while maintaining enhanced feature representation. For the proposed diabetic retinopathy detection, CSPNet ensures that key features such as blood vessel morphology and lesion boundaries are retained during down-sampling and deep processing, contributing to more accurate and robust detection across the varied-quality fundus images of different clients31. CSPNet splits the features into two parts and passes them through two paths, a main path and a short path, defined as:

$$k_{m1} ,k_{m2} = split\left( {k_{m}^{r} } \right)$$
(7)

Only one part \(k_{m1}\) is processed by the ShuffleNet blocks, while the other passes through the short path. The ShuffleNet block is a lightweight feature extractor designed with grouped convolutions and channel shuffling, which reduces the number of operations (FLOPs) and memory usage compared to traditional CNN based models. In federated learning, low computational cost is significant for resource-constrained environments, while the spatial patterns of DR images with small lesions are preserved32. Also, the ShuffleNet architecture supports parallel computation, making the model ideal for deployment across heterogeneous devices. The input, denoted as \(k_{m1}\), is fed into the ShuffleNet module and its outcome is denoted as:

$$k_{m1}^{\prime } = ShuffleNet\left( {k_{m1} } \right)$$
(8)

where \(k_{m1}^{\prime } \in R^{D\prime \times A\prime \times E\prime }\) is the intermediate feature map. Then, at the final stage, the features from the main path and the short path are concatenated as:

$$k_{m}^{con} = concat\left( {k_{m1}^{\prime } ,k_{m2} } \right)$$
(9)
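Eqs. (7)–(9), together with the channel-shuffle operation inside the ShuffleNet block, can be sketched on a feature map in NumPy as below; the group count, tensor shapes, and the use of a plain channel shuffle as a stand-in for a full ShuffleNet block are illustrative assumptions.

```python
import numpy as np

def channel_shuffle(x, groups):
    """ShuffleNet channel shuffle: regroup channels after grouped convolution.
    x: feature map of shape (C, H, W), with C divisible by `groups`."""
    C, H, W = x.shape
    return x.reshape(groups, C // groups, H, W).transpose(1, 0, 2, 3).reshape(C, H, W)

def csp_stage(k_m, shuffle_blocks):
    """CSP split / transform / merge of Eqs. (7)-(9)."""
    half = k_m.shape[0] // 2
    k_m1, k_m2 = k_m[:half], k_m[half:]            # Eq. (7): channel-wise split
    for block in shuffle_blocks:                   # Eq. (8): main path only
        k_m1 = block(k_m1)
    return np.concatenate([k_m1, k_m2], axis=0)    # Eq. (9): merge both paths
```

The short-path half reaches the concatenation untouched, which is what preserves the gradient flow described above.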

Then, a flattening based transformation maps the 2D spatial feature maps into 1D vectors suitable for feeding into GRU based recurrent layers. It allows the model to connect spatial patterns across time, capturing sequences of fundus features. In DR detection, small changes in a patient’s lesion appearance or distribution evolve gradually, and flattening allows the model to capture these shifts. The flattening is defined as:

$$u_{m} = Flatten\left( {k_{m}^{con} } \right)$$
(10)

The GRU is designed for modeling temporal dependencies to capture how a patient’s condition changes over time. It uses update and reset gates to control memory flow, allowing it to retain long-term disease patterns while discarding irrelevant noise. This is important for DR, where disease progression is not always linear33. The GRU helps in modeling finer characteristics of micro-vascular damage and in stabilizing predictions by aggregating features. Its lightweight design compared to LSTM aligns with the efficiency goals of federated learning, making it feasible to deploy in low-power environments. The input obtained by the GRU is the outcome of the flatten layer and is represented as \(u_{m}\).

$$g_{m}^{d} = \gamma \left( {V_{g} u_{m} + I_{g} l_{d}^{m - 1} } \right)$$
(11)
$$q_{m}^{d} = \gamma \left( {V_{q} u_{m} + I_{q} l_{d}^{m - 1} } \right)$$
(12)
$$\hat{l}_{m}^{d} = \tanh \left( {V_{l} u_{m} + I_{l} \left( {q_{m}^{d} \,\Theta \,l_{d}^{m - 1} } \right)} \right)$$
(13)
$$l_{m}^{d} = \left( {1 - g_{m}^{d} } \right)\Theta \,l_{d}^{m - 1} + g_{m}^{d} \,\Theta \,\hat{l}_{m}^{d}$$
(14)

Here, the input-to-hidden weight matrices are signified as \(V_{g}\), \(V_{q}\) and \(V_{l}\), and the hidden-to-hidden weight matrices as \(I_{g}\), \(I_{q}\) and \(I_{l}\), respectively. The hidden state at the current time step is \(l_{m}^{d}\), the candidate hidden state is \(\hat{l}_{m}^{d}\), and the hidden state from the previous time step is \(l_{d}^{m - 1}\). The update gate vector is signified as \(g_{m}^{d}\), the reset gate vector as \(q_{m}^{d}\), and \(\Theta\) denotes element-wise multiplication. The acquired weights are further secured by performing homomorphic encryption.
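A single GRU step of this form can be sketched in NumPy as follows, with \(\gamma\) as the logistic sigmoid and element-wise products for the gating; the function and argument names are illustrative, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(u, l_prev, Vg, Ig, Vq, Iq, Vl, Il):
    """One GRU step with update and reset gates.
    u      : flattened feature vector from the CSP/ShuffleNet stage
    l_prev : previous hidden state
    V_*    : input-to-hidden weights,  I_* : hidden-to-hidden weights
    """
    g = sigmoid(Vg @ u + Ig @ l_prev)               # update gate
    q = sigmoid(Vq @ u + Iq @ l_prev)               # reset gate
    l_cand = np.tanh(Vl @ u + Il @ (q * l_prev))    # candidate hidden state
    return (1 - g) * l_prev + g * l_cand            # gated interpolation
```

With all weights zero, both gates evaluate to 0.5 and the candidate to 0, so the step simply halves the previous hidden state, a quick way to sanity-check the gating arithmetic.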

Homomorphic encryption

Homomorphic encryption allows computations on encrypted data. Thus, homomorphic encryption is employed in the proposed FL method to further enhance security34. The steps considered in the Paillier-style homomorphic encryption are:

  • Two large prime numbers are chosen, symbolized as \(g\) and \(y\).

  • Generation of private and public keys.

  • The public key is generated as \(h = g \cdot y\).

  • For performing data decryption, the key generation is devised as \(\gamma = LCM\left( {g - 1,y - 1} \right)\), where LCM is the least common multiple.

  • For generating the private key, a random number is identified as \(r \in Z_{{h^{2} }}^{ * }\), wherein \(GCD\left( {Q\left( {r^{\gamma } \,\bmod \,h^{2} } \right),h} \right) = 1\).

  • The function \(Q\left( s \right) = \frac{s - 1}{h}\) is estimated.

  • Then, the private key is formulated as \(\eta = Q\left( {r^{\gamma } \,\bmod \,h^{2} } \right)^{ - 1} \,\bmod \,h\).

  • The encryption of the weight vector is employed as \(p = r^{a} \cdot q^{h} \,\bmod \,h^{2}\),

where \(p\) notates the ciphertext, \(a\) symbolizes the plaintext message, and \(r\) is the generator (part of the public key). \(q\) notates a random number (fresh for each encryption) to ensure non-determinism, \(h\) is the public modulus, and \(h^{2}\) ensures the ciphertext space is large enough for security35.

  • The original message \(a\) is extracted using the private key components \(\gamma\) and \(\eta\) and the decryption function \(Q\). The decryption is formulated as \(a = Q\left( {p^{\gamma } \,\bmod \,h^{2} } \right) \cdot \eta \,\bmod \,h\).

Thus, homomorphic encryption of the local model outcomes in federated learning for DR detection provides strong privacy protection of patient data. Computations and model aggregation are performed directly on the encrypted local model updates without decrypting them36. The inclusion of homomorphic encryption prevents privacy attacks such as gradient inversion and ensures that only authorized clients can decrypt the aggregated global model. Hence, enhanced security is acquired through the inclusion of homomorphic encryption in the proposed FL model.
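The Paillier-style scheme can be illustrated with the textbook toy sketch below, using deliberately small primes (insecure, for demonstration only) and the common generator choice \(r = h + 1\); a real deployment would use large primes and a vetted library. The additive homomorphism, where multiplying ciphertexts adds plaintexts, is what allows a server to aggregate encrypted weight updates.

```python
import math
import random

def keygen(g=293, y=433):
    """Toy Paillier key generation with small fixed primes (demo only)."""
    h = g * y                                    # public modulus h = g * y
    lam = math.lcm(g - 1, y - 1)                 # gamma = LCM(g-1, y-1)
    r = h + 1                                    # standard generator choice
    Q = lambda s: (s - 1) // h                   # Q(s) = (s - 1) / h
    eta = pow(Q(pow(r, lam, h * h)), -1, h)      # private key component eta
    return (h, r), (lam, eta)

def encrypt(pub, a):
    """p = r^a * q^h mod h^2, with fresh random q coprime to h."""
    h, r = pub
    q = random.randrange(1, h)
    while math.gcd(q, h) != 1:
        q = random.randrange(1, h)
    return pow(r, a, h * h) * pow(q, h, h * h) % (h * h)

def decrypt(pub, priv, p):
    """a = Q(p^gamma mod h^2) * eta mod h."""
    h, _ = pub
    lam, eta = priv
    return ((pow(p, lam, h * h) - 1) // h) * eta % h
```

Multiplying two ciphertexts modulo \(h^{2}\) and decrypting yields the sum of the two plaintexts, which is the property the aggregation layer exploits.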

Clustering based user selection

In federated learning (FL), clients (hospitals) hold data in non-independent and identically distributed (non-IID) form, meaning the local data distributions vary significantly. This heterogeneity poses a challenge: random client selection leads to inefficient training and unstable global model performance due to biased or unrepresentative updates. To address this issue, improved K-means clustering (IKMC) is utilized in the client selection process, wherein clients are grouped based on the similarity of their data distributions or model updates37. By forming clusters and selecting representative or diverse clients from each group, IKMC enhances the quality of participation, leading to more stable model convergence. It also assists in improving the training efficiency and generalization capability of the model. Initially, the adjacency matrix is generated by representing all users as nodes in a graph, with edges weighted based on similarity.

$$S_{l,h} = CS\left( {p_{l} ,p_{h} } \right)$$
(15)

where the adjacency matrix is notated as \(S_{l,h}\), \(p_{l}\) is the feature vector of client \(l\), and \(CS\left( {p_{l} ,p_{h} } \right)\) signifies the similarity, estimated using cosine similarity. The adjacency matrix encodes the relationship strength between clients based on their data. Then, the degree matrix \(K\) and Laplacian \(Q\) are constructed to encode the structure of the data graph and are used to find cluster boundaries through eigenvalues.

$$K_{ll} = \sum\limits_{h} {S_{l,h} }$$
(16)

The Laplacian matrix is estimated as:

$$Q = K - S$$
(17)

The normalized outcome of the Laplacian matrix is estimated as:

$$Q_{norm} = I - K^{ - 1/2} SK^{ - 1/2}$$
(18)

The first \(d\) eigenvectors, those with the smallest eigenvalues of the normalized Laplacian, are computed:

$$Q_{norm} v_{h} = \beta_{h} v_{h} \;for\,\,h = 1, \ldots ,d$$
(19)

where \(v_{h}\) represents the \(h{\text{th}}\) eigenvector and \(\beta_{h}\) the corresponding eigenvalue. All the eigenvectors are stacked to form a new matrix:

$$V = \left[ {v_{1} ,v_{2} ,...v_{d} } \right]$$
(20)

The eigen decomposition converts the data from a non-linear space to a linearly separable space, which simplifies clustering. Then, K-means clustering is applied to the rows of the matrix to group similar users.

Here, grouping clients with similar data enables structured selection instead of random selection, minimizing the number of communication rounds.
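The pipeline of Eqs. (15)–(20) followed by K-means can be sketched as below; the client descriptor matrix \(P\), the two-dimensional embedding, and the two-cluster K-means with farthest-point initialization are illustrative assumptions for the demonstration.

```python
import numpy as np

def ikmc_clusters(P, iters=20):
    """Spectral grouping of clients per Eqs. (15)-(20), then 2-way K-means.
    P: (n_clients, n_features) array of client descriptors (illustrative)."""
    Pn = P / np.linalg.norm(P, axis=1, keepdims=True)
    S = Pn @ Pn.T                                  # Eq. (15): cosine similarity
    np.fill_diagonal(S, 0.0)
    S = np.clip(S, 0.0, None)                      # keep edge weights non-negative
    deg = S.sum(axis=1)                            # Eq. (16): degree matrix diagonal
    Kinv = np.diag(1.0 / np.sqrt(deg))
    Q_norm = np.eye(len(S)) - Kinv @ S @ Kinv      # Eq. (18): normalized Laplacian
    vals, vecs = np.linalg.eigh(Q_norm)            # Eq. (19): eigen-decomposition
    V = vecs[:, np.argsort(vals)[:2]]              # Eq. (20): two smallest eigenvectors
    # Plain 2-means on the spectral embedding, farthest-point initialization
    far = np.argmax(((V - V[0]) ** 2).sum(axis=1))
    centers = np.stack([V[0], V[far]])
    for _ in range(iters):
        labels = np.argmin(((V[:, None] - centers) ** 2).sum(axis=-1), axis=1)
        for c in (0, 1):
            if (labels == c).any():
                centers[c] = V[labels == c].mean(axis=0)
    return labels
```

On two well-separated groups of client descriptors, the spectral embedding places each group on one side of the Fiedler vector, so the subsequent K-means recovers the grouping.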

Result and discussion

The proposed FedDRNet model is implemented in the Python programming language and is compared with existing methods such as FLViT21, FedDL22, FedCNN23, and data weighted fed25 to demonstrate its superiority. The proposed model is assessed using the Diabetic Retinopathy 2015 Data Colored Resized dataset28. The implementation parameters of the proposed model are presented in (Table 1). These parameters also quantify and control the amount of information leakage during the model training and aggregation process.

Table 1 Simulation parameters.

Dataset description

The Diabetic Retinopathy 2015 Data Colored Resized dataset28 is utilized for evaluating the proposed FedDRNet model. The dataset comprises 35,126 retina scan images with five classes: Proliferate_DR, Severe, Moderate, Mild, and No_DR. All images in the dataset are resized to 224 × 224 pixels. Also, the Messidor dataset38 is utilized for evaluating the proposed model; it contains around 1748 high-resolution fundus images from 874 examinations.

Experimental outcome

The experimental outcome of the proposed FedDRNet model along with the pre-processing and the input image is presented in (Fig. 3).

Fig. 3

Experimental outcome.

Accuracy analysis, presented in Fig. 4, is the ratio of correctly predicted instances to the total number of predictions made on the DR images. The ShuffleNet based CSPNet provides lightweight but high-representation feature extraction, improving learning capability without over-fitting. Besides, GRU based temporal modeling helps in learning sequential patterns and finer variations across image sequences. The FedDRNet model ensures more diverse dataset exposure without centralizing data, enabling enhanced generalization. Besides, IKMC ensures that the selected clients represent various distributions, which reduces model bias. The detailed analysis of the FedDRNet model based on accuracy is presented in (Table 2).

Fig. 4

Accuracy of DR detection methods.

Table 2 Accuracy analysis.

Precision analysis measures how many of the predicted positive cases were actually positive; it is presented in Fig. 5 with its detailed analysis in (Table 3). The proposed FedDRNet model with ECSRNet improves the learning of fine-grained features by controlling gradient flow and reducing information loss in shallow layers. Besides, ECSRNet allows context-aware learning using the GRU, which minimizes false positives. The FedDRNet model with diverse clients and IKMC based client selection improves the model’s ability to distinguish true disease markers from similar-looking non-disease features, which improves the precision of the proposed model.

Fig. 5

Precision analysis.

Table 3 Precision analysis.

Recall measures how many actual positive cases were correctly predicted for DR detection; the outcome of the analysis is presented in Fig. 6 and its detailed analysis in (Table 4). The ECSRNet model’s efficient feature learning, with ShuffleNet based CSPNet and GRU, captures finer pathological cues of DR such as microaneurysms or hemorrhages. The FedDRNet model reduces the chance of missing rare or complex cases through the enhanced generalization capability acquired via IKMC client selection. IKMC enhances recall by ensuring that under-represented distributions are also included in the training updates.

Fig. 6
figure 6

Recall based analysis.

Table 4 Recall based analysis.

F1-Score analysis, presented in Fig. 7 with its detailed analysis in Table 5, measures the balance between precision and recall. The ECSRNet architecture makes better trade-offs between false positives and false negatives. Besides, clustering-based IKMC client selection avoids over-fitting to dominant client data, enhancing model fairness and maintaining a strong balance between precision and recall. The FedDRNet model with homomorphic encryption promotes secure and consistent client participation, enabling a stable model with balanced predictive capability.
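The relationship between the three metrics discussed above can be sketched from the standard TP/FP/FN counts; the counts below are hypothetical.

```python
# Precision, recall, and their harmonic mean (F1) from toy TP/FP/FN counts.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)  # harmonic mean balances the two

tp, fp, fn = 90, 10, 10  # hypothetical counts, not the paper's results
print(precision(tp, fp))     # → 0.9
print(recall(tp, fn))        # → 0.9
print(f1_score(tp, fp, fn))  # → 0.9
```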

Fig. 7
figure 7

F-Score based analysis.

Table 5 F-score based analysis.

Specificity, presented in Fig. 8 with its detailed analysis in Table 6, measures how well the model identifies negative (healthy) cases. The ECSRNet, with the lightweight yet deep convolutional layers of ShuffleNet, supports fine feature discrimination and reduces misclassification of healthy images as DR. The model avoids false alarms by learning inter-class boundary features more robustly. The FedDRNet model increases exposure to healthy patient samples from varied demographics, which enhances specificity through broader training exposure.

Fig. 8
figure 8

Specificity based analysis.

Table 6 Specificity based analysis.

The accuracy-loss analysis of the FedDRNet model is presented in Fig. 9. The use of ECSRNet in the FedDRNet model enhances efficiency through lightweight components that reduce communication and computation costs. Besides, IKMC ensures that only the most representative clients are selected, which reduces update noise and convergence delay. The GRU's capacity to retain temporal information with fewer parameters minimizes local model complexity and allows fewer rounds of communication. Thus, the proposed FedDRNet model achieves enhanced accuracy with minimal loss.
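The claim that a GRU retains temporal information with fewer parameters can be illustrated with a back-of-envelope count, using the textbook formulation (3 gate groups for a GRU vs. 4 for an LSTM); the layer sizes here are hypothetical, not FedDRNet's.

```python
# Parameter counts for a single recurrent layer: each gate group holds an input
# weight matrix, a recurrent weight matrix, and a bias vector.
def gru_params(input_dim, hidden_dim):
    return 3 * (input_dim * hidden_dim + hidden_dim * hidden_dim + hidden_dim)

def lstm_params(input_dim, hidden_dim):
    return 4 * (input_dim * hidden_dim + hidden_dim * hidden_dim + hidden_dim)

x, h = 256, 128  # illustrative feature and hidden sizes
print(gru_params(x, h))   # → 147840
print(lstm_params(x, h))  # → 197120 (a GRU saves one gate group, ~25%)
```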

Fig. 9
figure 9

(a) accuracy and (b) loss analysis.

The confusion matrix showing the correct classifications and misclassifications by the proposed FedDRNet model is presented in Fig. 10. The enhanced outcome of the FedDRNet model is accomplished through accurate modeling of both disease-specific and non-disease features.
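A binary confusion matrix directly yields the metrics reported above; the counts below are made-up values, not those of Fig. 10.

```python
# Deriving the reported metrics from a binary confusion matrix (toy numbers).
def confusion_metrics(tp, fn, fp, tn):
    return {
        "accuracy":    (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),   # recall on DR cases
        "specificity": tn / (tn + fp),   # recall on healthy cases
    }

m = confusion_metrics(tp=95, fn=5, fp=4, tn=96)
print(m["accuracy"])     # → 0.955
print(m["sensitivity"])  # → 0.95
print(m["specificity"])  # → 0.96
```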

Fig. 10
figure 10

Confusion matrix.

Analysis for various noise levels

The capability of the proposed FedDRNet model under various noise levels (high, moderate, and no noise) is presented in Fig. 11. The proposed method performed well at all noise levels compared to the existing methods due to the integrated design of ECSRNet-based feature extraction, IKMC-based client selection, and MSBF-based filtering.
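A noise-robustness evaluation of this kind typically perturbs the input pixels with additive Gaussian noise before scoring the model. A minimal sketch, with assumed sigma values for the three levels:

```python
# Perturb a grayscale image (list of rows) with additive Gaussian noise N(0, sigma),
# clipping back to the valid [0, 255] pixel range. Sigma values are assumptions.
import random

def add_gaussian_noise(image, sigma, seed=0):
    rng = random.Random(seed)  # fixed seed for a reproducible perturbation
    return [[min(255, max(0, px + rng.gauss(0, sigma))) for px in row]
            for row in image]

image = [[100, 120], [140, 160]]  # tiny toy "image"
for label, sigma in [("no noise", 0), ("moderate", 10), ("high", 30)]:
    print(label, add_gaussian_noise(image, sigma))
```

Each model would then be evaluated on the clean and perturbed copies to produce curves like those in Fig. 11.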

Fig. 11
figure 11

Accuracy analysis of the proposed model with various noise levels: (a) High noise, (b) moderate noise and (c) no noise.

The accuracy analysis for various data sizes, based on training percentage, is presented in Fig. 12, which demonstrates that the FedDRNet model achieved high detection accuracy with reduced training data and outperformed existing models. The improvement is attributed to the incorporation of ECSRNet with ShuffleNet, CSPNet, and GRU, which enables the model to extract rich spatial–temporal features with minimal computational burden. Besides, the multi-scale Gaussian bilateral filtering further enhances input image quality and enables the model to learn more significant features irrespective of the training percentage.
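The paper's multi-scale Gaussian bilateral filter is not specified here; the sketch below is a simplified single-scale, 1-D bilateral filter showing the core idea, namely averaging with combined spatial and intensity weights so noise is smoothed while edges survive. All parameters are illustrative.

```python
# Single-scale 1-D bilateral filter: each sample becomes a weighted average of its
# neighbors, where the weight decays with both spatial distance and intensity
# difference. A multi-scale variant would repeat this for several sigma_s values.
import math

def bilateral_1d(signal, sigma_s=1.0, sigma_r=20.0, radius=2):
    out = []
    for i, center in enumerate(signal):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2)) \
              * math.exp(-((center - signal[j]) ** 2) / (2 * sigma_r ** 2))
            num += w * signal[j]
            den += w
        out.append(num / den)
    return out

# A step edge with one noisy sample: the spike is smoothed, the edge stays sharp.
print(bilateral_1d([10, 10, 14, 10, 200, 200, 200]))
```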

Fig. 12
figure 12

Analysis based on data size.

The memory computation analysis based on Floating Point Operations (FLOPS), illustrated in Fig. 13, demonstrates that the proposed FedDRNet attained lower computational complexity than existing federated learning models. The use of ShuffleNet and CSPNet reduces redundant convolution operations while maintaining strong feature representation. The reduction of FLOPS in the proposed model minimizes memory usage and improves efficiency on hardware with limited capabilities.
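A rough FLOPs estimate for a convolution layer shows why ShuffleNet-style factorized designs are cheaper than plain convolutions; the layer shapes below are hypothetical, not FedDRNet's actual architecture.

```python
# Multiply-add count for a k x k convolution producing an h_out x w_out x c_out map.
def conv2d_flops(h_out, w_out, c_in, c_out, k):
    return 2 * h_out * w_out * c_out * (k * k * c_in)

# Plain 3x3 conv vs. a depthwise-separable block (depthwise 3x3 + pointwise 1x1),
# as used in ShuffleNet-style lightweight designs.
plain = conv2d_flops(56, 56, 64, 64, 3)
depthwise = conv2d_flops(56, 56, 1, 64, 3) + conv2d_flops(56, 56, 64, 64, 1)
print(plain)             # → 231211008
print(depthwise)         # → 29302784
print(plain / depthwise) # roughly 8x fewer FLOPs for the factorized block
```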

Fig. 13
figure 13

Analysis based on FLOPS.

The latency analysis shown in Fig. 14 demonstrates that FedDRNet achieved lower inference delay due to its lightweight design. Besides, homomorphic encryption is employed only for secure weight sharing, which minimizes cryptographic computation during inference. Similarly, the analysis based on training time demonstrates that FedDRNet achieved faster completion due to efficient feature learning and reduced model complexity. The design of the ECSRNet architecture ensures that fewer epochs are required to reach optimal accuracy. Besides, IKMC reduces communication delays by selecting clients with similar data distributions.
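Inference latency of the kind plotted in Fig. 14 is typically measured by timing repeated forward passes and reporting a robust summary; a minimal sketch, with a placeholder workload standing in for the model:

```python
# Time repeated calls to fn and report the median latency in milliseconds; the
# median is less sensitive to warm-up outliers than the mean.
import statistics
import time

def measure_latency_ms(fn, runs=50, warmup=5):
    for _ in range(warmup):          # discard warm-up runs (caches, lazy init)
        fn()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples)

dummy_model = lambda: sum(i * i for i in range(10_000))  # stand-in for inference
print(f"{measure_latency_ms(dummy_model):.3f} ms")
```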

Fig. 14
figure 14

Analysis based on training time and latency.

The convergence-round analysis for varying numbers of clients, portrayed in Fig. 15a, demonstrates that FedDRNet converges faster than existing models as the client count increases. Here, IKMC ensures that the updates from each communication round are highly relevant to the global model, which reduces the number of iterations needed for stable learning. Thus, the convergence rounds for the proposed method are minimal compared to the existing methods. Similarly, the analysis of computation cost for varying numbers of clients, presented in Fig. 15b, demonstrates that FedDRNet reduces the per-client processing overhead. The lightweight architecture of the proposed model minimizes the required computational resources, and the clustering-based client selection avoids unnecessary computation for clients with low-quality or irrelevant data. Thus, as the number of clients increases, the proportional increase in computation cost is much lower than for existing models.
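The internals of IKMC are specific to the paper; the generic idea of clustering-based client selection can be sketched as clustering clients on a summary statistic of their data distribution and picking one representative per cluster, so each round's updates span diverse clients. Everything below (the statistic, the init, the client values) is an illustrative assumption.

```python
# 1-D k-means over a per-client statistic (here, a hypothetical DR-positive rate),
# with a deterministic init (min, median, max) so the toy example is reproducible.
def kmeans_1d(values, iters=20):
    s = sorted(values)
    centers = [s[0], s[len(s) // 2], s[-1]]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

client_pos_rate = [0.05, 0.07, 0.06, 0.31, 0.29, 0.55, 0.52, 0.58]
centers, clusters = kmeans_1d(client_pos_rate)
# Select the client closest to each cluster center as that round's representative.
selected = [min(c, key=lambda v: abs(v - ctr))
            for ctr, c in zip(centers, clusters) if c]
print(sorted(selected))  # one representative per distribution group
```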

Fig. 15
figure 15

Analysis based on (a) convergence speed and (b) computation cost.

The ablation study of the proposed model is presented in Fig. 16. The accuracy-based analysis demonstrates the role of each module of the proposed FedDRNet model in detecting DR. The results demonstrate that IKMC effectively mitigates the adverse impact of non-IID data, achieving higher overall accuracy, faster convergence, and more balanced performance across clients compared to baseline methods. This indicates that the consolidation strategy preserves critical knowledge from diverse data sources and reduces the bias introduced by skewed distributions, confirming the robustness of IKMC in heterogeneous environments.

Fig. 16
figure 16

Ablation study.

The analysis of the homomorphic encryption algorithm for various numbers of users, demonstrating the scalability of the model, is presented in Fig. 17.
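The specific encryption scheme used by FedDRNet is not detailed in this section; the toy Paillier-style example below (tiny primes, for illustration only) shows the additive homomorphic property that makes secure weight aggregation possible: multiplying ciphertexts decrypts to the sum of the plaintexts, so a server can aggregate client updates without ever decrypting them. Real deployments use vetted libraries with 2048-bit moduli, never parameters like these.

```python
# Toy Paillier cryptosystem: Enc(m) = (n+1)^m * r^n mod n^2 is additively homomorphic.
import math

p, q = 11, 13                      # toy primes -- insecure, illustration only
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
g = n + 1

def encrypt(m, r):                 # r must be coprime to n
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    L = (pow(c, lam, n2) - 1) // n
    mu = pow(lam, -1, n)           # with g = n + 1, mu = lam^{-1} mod n
    return (L * mu) % n

c1, c2 = encrypt(5, r=7), encrypt(9, r=23)
print(decrypt((c1 * c2) % n2))  # → 14, i.e. 5 + 9 recovered from ciphertexts alone
```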

Fig. 17
figure 17

Analysis of homomorphic encryption algorithm.

The analysis of the proposed method on data acquired from the Kaggle and Messidor datasets is presented in Fig. 18. The analysis demonstrates the better performance of the proposed model on both datasets across all assessment measures.

Fig. 18
figure 18

Analysis with two different datasets.

Statistical analysis based on the Friedman test for the federated learning based DR detection models is presented in Table 7. The Friedman test checks for statistically significant differences in performance across the DR detection models. The test statistic measures how far the observed rank sums deviate from what would be expected under the null hypothesis, and p < 0.05 denotes rejection of the null hypothesis, indicating a statistically significant difference in performance among the models. Besides, the proposed model consistently achieves the highest performance ranks across all rounds.
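The rank-sum-based statistic described above can be sketched as follows; the score matrix is made up for illustration (higher score = better, rank 1 = best), and ties are not handled in this minimal version.

```python
# Friedman test statistic for k models ranked over n evaluation rounds:
# chi2_F = 12 / (n k (k+1)) * sum(R_j^2) - 3 n (k+1), R_j = rank sum of model j.
def friedman_statistic(scores):
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])  # best model gets rank 1
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

scores = [  # hypothetical accuracies of 3 models over 4 rounds
    [0.91, 0.93, 0.98],
    [0.90, 0.94, 0.97],
    [0.92, 0.93, 0.99],
    [0.89, 0.95, 0.98],
]
print(friedman_statistic(scores))  # → 8.0, vs. chi-square(k-1=2, 0.05) ~ 5.99
```

Since 8.0 exceeds the chi-square critical value at 2 degrees of freedom, the null hypothesis of equal performance would be rejected for this toy data.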

Table 7 Friedman test.

The comparison of the proposed FedDRNet with the baseline DR methods from the existing literature is presented in Table 8.

Table 8 Baseline method comparisons.

Here, the analysis demonstrates the superiority of the proposed FedDRNet model in detecting the DR disease.

Comparative discussion

The comparative discussion based on the best outcomes is presented in Table 9. The proposed FedDRNet achieved an accuracy of 98.6%, which is 5.07%, 3.75%, 1.83%, and 1.12% higher than the FLViT, FedDL, FedCNN, and DataWeightedFed methods, respectively. It achieved a precision of 98.8%, which is 6.48%, 4.76%, 3.24%, and 2.53% higher than these methods; a recall of 98.3%, which is 6.61%, 4.78%, 3.15%, and 2.24% higher; an F1-Score of 98.6%, which is 6.59%, 4.77%, 3.25%, and 2.43% higher; and a specificity of 98.1%, which is 4.99%, 3.16%, 2.04%, and 1.12% higher than FLViT, FedDL, FedCNN, and DataWeightedFed, respectively.

Table 9 Comparative discussion.

The analysis demonstrates the enhanced performance of the proposed FedDRNet model in terms of Accuracy, Precision, Recall, F-Score, and Specificity. The ECSRNet is efficient in achieving high accuracy by extracting both spatial and temporal features. ShuffleNet ensures lightweight and fast feature extraction, and CSPNet assists in enhancing gradient flow and feature reuse across stages for better generalization. Also, the GRU assists in learning sequential attributes, which leads to robust classification of images at early DR stages. The incorporation of homomorphic encryption guarantees secure weight sharing without compromising data integrity, which improves recall by reducing missed cases. Also, the IKMC-based user selection enhances communication efficiency and model convergence by grouping clients with similar data distributions, which reduces model inconsistency and boosts the F-Score that balances precision and recall. Thus, the enhanced outcome is achieved by the proposed model.

Conclusion

The proposed FedDRNet model, a privacy-preserving DR detection framework, addresses key challenges in DR detection by offering data privacy, computational efficiency, and enhanced accuracy. Using ECSRNet, the FedDRNet model achieves high diagnostic performance while maintaining low resource consumption, making it suitable for deployment in real-world healthcare settings. Besides, homomorphic encryption ensures secure communication during server-side training, protecting sensitive patient information. The use of IKMC for client selection enhances the communication efficiency and convergence of the FedDRNet framework. Still, the FedDRNet framework has certain limitations in handling highly heterogeneous data distributions and incurs increased computational overhead due to encryption. Thus, in future work, the model will be enhanced by exploring adaptive learning strategies to better manage non-IID data conditions and by integrating differential privacy techniques for an additional layer of security.