Abstract
Skin diseases frequently cause mental and physical distress and are a major global health concern. Because early detection is crucial to successful treatment, accurate diagnosis is a challenge for dermatologists. Diagnostic accuracy could be significantly enhanced using methods such as machine learning (ML) and deep learning (DL). However, these models require substantial datasets to make accurate predictions, healthcare providers frequently encounter data shortages, and privacy regulations restrict data sharing. This work proposes a privacy-preserving federated transfer learning framework for diagnosing skin diseases that incorporates four key strategies to enhance effectiveness. First, transfer learning is used to train a model with a dense neural network (DNN) for skin disease detection. Second, feature extraction is performed using pre-trained architectures, and a DNN is used for classification. Third, federated learning (FL) replaces transfer learning to train the model across distributed nodes, with the DNN used for disease detection. Fourth, FL is combined with transfer learning to build a cohesive ecosystem in which data privacy is maintained. Model performance was validated on both IID and non-IID databases, with the proposed feature extraction with federated learning model achieving cross-validation accuracies of 99.528% and 99.689% for the IID and non-IID databases, respectively. The results indicate that the feature extraction with FL model can produce efficient, lightweight models that are well suited to resource-constrained devices, while ensemble learning enhances edge device performance, offering a powerful and privacy-preserving solution for skin disease diagnosis in modern healthcare.
Introduction
Globally, millions of people of all ages and demographics suffer from skin problems. Skin ailments range from eczema, psoriasis, and acne to melanoma and other skin malignancies1. Chronic illnesses such as psoriasis can cause physical discomfort, emotional suffering, and social isolation2. Non-fatal skin diseases account for a large portion of global healthcare costs. A scarcity of dermatologists in many places delays diagnoses and worsens patient outcomes3. Skin illnesses can indicate underlying health problems, so early and precise diagnosis is crucial to preserving patient health and possibly detecting additional systemic diseases4. Dermatologists diagnose skin illnesses by directly examining lesions, pigmentation, and texture changes5,6. Artificial intelligence (AI) based techniques that analyze large datasets of skin images and find disease patterns are also improving diagnostic accuracy7. Despite technological advances, the required equipment and technical expertise remain scarce, especially in low-resource areas8. In dermatology, advanced digital tools now make virtual and real-time diagnosis of skin conditions possible9,10. Patients benefit from quick assessments, and teledermatology consultations improve access to dermatological care10. Continuous observation allows personalized treatment adjustments, improving patient outcomes and adherence11. Additionally, AI models can analyze patient data to detect early skin abnormalities and potentially identify skin cancers or other serious conditions11,12. However, as these digital healthcare ecosystems expand, concerns about data security and privacy become increasingly significant, particularly in dermatology, where sensitive medical data are transmitted and stored12.
Medical imaging and diagnosis capture and share sensitive health data across platforms, making data privacy a serious problem13. Medical images used in dermatology contain visual data about skin problems as well as information that could reveal patients' identities if privacy protections are insufficient. Centralized storage systems, which hold patient data from numerous sources, are particularly vulnerable to attackers, threatening patient privacy and confidence in digital healthcare systems14. Federated learning (FL) allows data to be used where it resides on local devices, keeping it secure while enabling shared model improvements without transferring patient data15. To prevent data leaks during training, FL requires strong encryption and secure aggregation. These requirements make it harder to balance data utility and privacy, since models need enough data to be clinically useful without violating patient privacy16. FL and transfer learning have become popular in medical applications because they address data privacy, limited resources, and model adaptability17. FL makes it possible to train machine learning (ML) and deep learning (DL) models on dispersed datasets, such as those held on separate medical servers, without centralized collection18. Transfer learning allows models pre-trained on large, publicly available datasets to be tailored to specific medical applications with less task-specific data19. It also lets models adapt to diverse healthcare domains, such as dermatology and radiology. Combined with FL, transfer learning can improve medical diagnostic accuracy by drawing on information from many data sources, even in resource-limited medical environments20. These methods promise to improve model performance while protecting privacy and managing data scarcity, enabling ethical and practical AI use in healthcare.
FL models with decentralized data address the privacy issues of the traditional ML/DL techniques discussed above: each local network model is trained on its own local data, so sensitive information is not shared over the server network. The rest of this paper is organized as follows. Section "Related work" reviews the literature on skin disease diagnosis using ML/DL techniques. Section "Methodology skin disease diagnosis" presents the proposed models for diagnosing skin diseases using transfer learning, pre-trained feature extraction models, federated feature extraction, and federated transfer learning. Sections "Results analysis" and "Discussion" describe the experimental setup and compare the results of the skin disease detection models. Section "Conclusion" concludes the paper and discusses future scope.
Related work
By handling visual complexity and improving model generalization through image augmentation, a convolutional neural network (CNN) trained on a diverse dataset can more accurately capture the variability of skin conditions21. The model's accuracy of 86% and recall of 81% across seven disease classes show that it can recognize the features of skin disorders. The FL framework22 aggregates predictions without sharing sensitive data. An FL architecture with differential privacy enables cooperative model training in a decentralized manner, without transferring confidential patient data to central servers23. An implementation on Amazon's AWS cloud demonstrated ease of use and scalability24, improving diagnostics in mobile health technology. A hybrid model combining a CNN with an optimization module25 is used to improve gesture identification. FL pre-trains the hybrid model without revealing sensitive sEMG data, and transfer learning then fine-tunes the model for each subject based on their features. According to the experimental results, this approach improves recognition accuracy by 12.01% over a conventional FL model and by 28.52% over local training, overcoming data shortage while prioritizing privacy. FL is used to train a global model, with encrypted parameters shared via a permissioned blockchain to address privacy and trust issues26. This scheme outperforms baseline models in segmentation by 19.08% in Hausdorff distance for whole tumors and by 1.99% in Dice similarity coefficient for enhancing tumors. Local devices run simulations on their own datasets without transferring sensitive health data, addressing privacy concerns27. Radar-based heartbeat and activity monitoring is implemented using federated multi-task transfer learning28. FedRadar outperforms local training models in heartbeat rate estimation and activity recognition on real radar datasets by 2.8% and 2.5%, respectively. FL with decentralized data storage improves the detection rate29.
A data balancing strategy improves classifier performance and achieves 95% accuracy by correcting the dataset's class imbalance. FRESH is a smart healthcare architecture that combines FL with ring signatures to safeguard against privacy attacks30. Its modified batch verification exploits the additivity of linear operations on elliptic curves to ease the server's processing load.
Review summary
The literature review (Table 1) highlights the problems of using DL-based FL for skin disease diagnosis21,22,23,24,25,26,27,28,29,30. The inherent non-IID distribution and class imbalance of skin disease datasets are significant issues. Patients from different demographic groups, geographical locations, and healthcare facilities show varying disease frequencies and image features, which leads to biased models that generalize poorly to other populations. Threats to security and privacy are another significant obstacle; in a medical context, protecting the security and confidentiality of patient information is essential. The FL system31, which uses a dataset of over 10,000 photos with decentralized data, initially demonstrates an overall accuracy of around 79% in classifying skin disorders. Four categories of skin diseases are classified using a CNN32 whose parameters are optimized through hyperparameter tuning. Even though FL is decentralized, sensitive patient data such as images of skin lesions remain susceptible to reconstruction or inference attacks during model updates. The heterogeneous nature of medical imaging data, which can unintentionally expose distinguishing characteristics, increases this risk21,23.
Skin disease diagnosis involves analyzing computationally intensive, high-resolution dermoscopic images. IoT devices with limited processing and storage capacity find such data challenging to handle, so models that are both effective and lightweight are needed24,25. The communication cost of FL frameworks exacerbates this issue, particularly when large model updates must be delivered in real time from resource-constrained devices. Ethical and legal restrictions make using FL to diagnose skin conditions even more difficult26,27. Another issue is the lack of model interpretability: doctors frequently want precise justifications for diagnostic judgments before they can trust AI systems, particularly for complex disorders such as psoriasis or melanoma. Diagnostic accuracy can also be compromised by malicious clients who introduce erroneous data or interfere with model updates21,22,23,24,25,26,27,28,29. Data processing techniques, robust model design, ethical adherence, and enhanced security measures are required to overcome these challenges and ensure FL's efficacy in identifying skin conditions30,31,32. In this work, an FL system is used for skin disease diagnosis with an emphasis on resource utilization and data confidentiality, eliminating the need to transfer confidential skin photos to a centralized server. This work offers four distinct models for skin disease diagnosis using a DNN classifier: (a) transfer learning, (b) feature extraction, (c) feature extraction with federated learning, and (d) federated transfer learning.
Methodology skin disease diagnosis
This section presents a resource-efficient FL framework for the recognition and classification of skin diseases. IoT-enabled devices at different locations collect skin disease images from patients and store them locally. The overall structure of data collection for skin disease diagnosis using FL is shown in Fig. 1. Using distributed data to enhance the accuracy of ML/DL models facilitates more effective diagnosis of skin conditions. Figure 2 illustrates the conceptual framework for skin disease diagnosis using four distinct strategies: federated transfer learning, feature extraction with FL, feature extraction with transfer learning, and transfer learning alone. In this framework, images of skin conditions are collected from patients across different locations and stored locally to maintain data confidentiality. Once data collection is complete, pre-processing methods such as resizing, grayscale conversion, and sharpening are applied to reduce noise and enhance image quality. Following pre-processing, the dataset is analyzed using four methods.
1. The first method, transfer learning, employs a DNN to fine-tune a pre-trained model for skin disease classification.
2. The second method combines feature extraction and transfer learning: pre-trained models such as DenseNet, VGG19, Xception, and UNet extract features, which are then used for DNN-based classification.
3. The third method integrates FL and feature extraction, enabling distributed clients to collaboratively train models on both IID and non-IID datasets while ensuring strong performance and privacy.
4. The fourth method, federated transfer learning, uses FL in conjunction with transfer learning to build a global model from dispersed data while preserving patient privacy.
The proposed framework offers a secure, scalable solution to modern healthcare challenges by leveraging ML/DL methodologies in a decentralized setting.
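The pre-processing steps named above (resizing, grayscale conversion, and sharpening) can be sketched with NumPy alone. This is a minimal illustration, not the authors' pipeline: the nearest-neighbour resize, the ITU-R BT.601 luma weights, the 3 × 3 sharpening kernel, the 64 × 64 target size, and the function names are all assumptions, since the text does not specify them.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 RGB array to grayscale with BT.601 luma weights (assumed)."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3].astype(np.float64) @ weights

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D array (the resize method is an assumption)."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows[:, None], cols[None, :]]

def sharpen(img):
    """Apply a 3 x 3 sharpening kernel with edge padding; clip to the 0..255 range."""
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float64)
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * padded[di:di + img.shape[0], dj:dj + img.shape[1]]
    return np.clip(out, 0, 255)

def preprocess(rgb, size=(64, 64)):
    """Grayscale -> resize -> sharpen, then scale pixel values to [0, 1]."""
    gray = to_grayscale(rgb)
    small = resize_nearest(gray, *size)
    return sharpen(small) / 255.0

# Usage on a synthetic RGB image (stand-in for a locally stored skin image)
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(450, 600, 3), dtype=np.uint8)
x = preprocess(image)
```

The sharpening kernel sums to 1, so flat regions keep their intensity while edges are amplified.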
FL with IID and non-IID datasets
Federated learning (FL)33 addresses data privacy and security while tackling the difficulties of training models over a network of distributed devices. In FL, each device trains a model on its local data; the parameters or gradients of these locally trained models are then aggregated to generate a global model. By keeping model training as close to the data sources as possible, FL aims to safeguard data privacy. FL is therefore a good choice for applications where privacy is important, particularly when working with large volumes of data, geographically distributed data, or devices with limited resources or unreliable network connectivity. Despite its challenges, FL has attracted considerable attention and research in a range of real-world applications, particularly in the security domain34. Data-privacy-conscious industries such as healthcare and finance increasingly employ FL to overcome the limitations of centralized data storage. In the medical field, FL enables cooperative model training to identify illnesses without disclosing private patient information to outside servers35. FL can be applied to IID datasets, which have a uniform and balanced distribution of data among devices36, and to non-IID datasets, which have an uneven and differing distribution of data between devices37. Non-IID databases often reflect real-world scenarios with inconsistent data from several sources. FL enables local devices to keep their individual and diverse data while contributing to a global model, even when the data are not distributed evenly.
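How the IID and non-IID client datasets are formed is not specified in the text. A common way to simulate them, shown here as an assumption rather than the authors' procedure, is to deal shuffled indices out evenly (IID) and to skew each client's label mix with a Dirichlet draw per class (non-IID); a smaller concentration `alpha` gives more uneven clients. The function names are hypothetical.

```python
import numpy as np

def iid_partition(labels, n_clients, seed=0):
    """Shuffle all sample indices and split them evenly, so every client
    sees roughly the global label distribution."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    return [part.tolist() for part in np.array_split(idx, n_clients)]

def non_iid_partition(labels, n_clients, alpha=0.5, seed=0):
    """Label-skewed split: for each class, draw client shares from Dirichlet(alpha)
    and cut that class's shuffled indices accordingly."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    clients = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        shares = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)   # split points
        for k, part in enumerate(np.split(idx, cuts)):
            clients[k].extend(part.tolist())
    return clients

# Usage with toy labels for seven classes and N = 2 clients
labels = np.repeat(np.arange(7), 100)
iid = iid_partition(labels, n_clients=2)
non_iid = non_iid_partition(labels, n_clients=2, alpha=0.3)
for k, part in enumerate(non_iid):
    print(k, np.bincount(labels[part], minlength=7))   # per-client class counts
```

Both functions return disjoint index lists that together cover the whole dataset; only the per-client label mix differs.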
Model training using transfer learning
In deep learning, transfer learning is the process of applying knowledge acquired by previously trained models to new, related tasks. The key idea is to transfer what a model has learned on a large dataset to a different but related task that has fewer labeled instances. Transfer learning is motivated by the substantial computational cost of training the many parameters of DL models. It has become increasingly popular in this field for good reason, and it is easy to incorporate into real-world applications. In its simplest form, a trained network is retrained by updating only the parameters of the final classification layer using training data from the new task. This study identifies skin diseases using transfer learning models, including VGG16, Xception, EfficientNetB3, and MobileNetV238. Because models are initialized with learned parameters, transfer learning speeds up training and reduces computational and resource costs. CNNs are distinguished by their hierarchical representations and use convolutional, pooling, dropout, and fully connected layers to extract features from images. A pre-trained model has already learned useful features and patterns across a range of data, serving as a knowledge base. When the model is applied to a new task, only the top layers are updated, while the lower layers retain their learned information. Initially, only the newly added top layers are trained and all other layers are kept frozen; when training continues on the new dataset, some of the remaining layers are often unfrozen and fine-tuned.
Fine-tuning these upper layers lets the model derive higher-level features pertinent to the new data distribution, which can enhance its performance. When fine-tuning a transfer learning model, it is therefore important to choose the layers to update carefully and to strike a balance between relying on prior knowledge and learning from the new data.
For multi-source transfer learning, the source distribution is defined as \(Y = \{ Y_{1} ,Y_{2} , \ldots ,Y_{K} \}\), where \(Y_{k}\) represents the distribution of the k-th source domain, and the empirical source distribution is \(\hat{Y} = \{ \hat{Y}_{1} ,\hat{Y}_{2} , \ldots ,\hat{Y}_{K} \}\). Let I be the set of hypothesis functions that map P to Q, and let \(\ell(\cdot,\cdot): Q \times Q \to \mathbb{R}_{+}\) denote the loss function. The Q-discrepancy distance between two distributions \(Y_{1}\) and \(Y_{2}\) is defined as
\[ \mathrm{disc}_{Q}(Y_{1}, Y_{2}) = \sup_{i \in I} \left| \mathcal{L}_{Y_{1}}(i, F_{Y_{1}}) - \mathcal{L}_{Y_{2}}(i, F_{Y_{2}}) \right|, \]
where \(\mathcal{L}_{Y_{j}}(i, F_{Y_{j}}) := \mathbb{E}_{p \sim Y_{j}}[\ell(i(p), F_{Y_{j}}(p))]\), and \(F_{Y_{1}}\) and \(F_{Y_{2}}\) denote the labeling functions of the distributions \(Y_{1}\) and \(Y_{2}\), respectively. Given a class \(\mathcal{F}\) of real-valued functions f and a set of training samples \(T = (t_{1}, \ldots, t_{a})\), the empirical Rademacher complexity of \(\mathcal{F}\) is
\[ \hat{\mathfrak{R}}_{T}(\mathcal{F}) = \mathbb{E}_{\varepsilon}\left[ \sup_{f \in \mathcal{F}} \frac{1}{a} \sum_{h=1}^{a} \varepsilon_{h} f(t_{h}) \right], \]
where \(\varepsilon = (\varepsilon_{1}, \ldots, \varepsilon_{a})\) and the \(\varepsilon_{h}\) are Rademacher random variables with \(\Pr(\varepsilon_{h} = -1) = \Pr(\varepsilon_{h} = 1) = 0.5\). Let I be a set of hypothesis functions i(⋅) that map the inputs of the first s time steps, \(\{ P_{1}, P_{2}, \ldots, P_{s} \} \in \mathbb{R}^{c_{n} \times s}\), to the output of time step s, \(q_{s} \in \mathbb{R}^{c_{q}}\). Using the set I and the distribution X, a new hypothesis function set \(l_{I}\) is defined as
\[ l_{I} = \{ p \mapsto \ell(i(p), F_{X}(p)) : i \in I \}, \]
where the inputs \(p \in \mathbb{R}^{\rho}\) of the first t time steps are mapped to [0, 1] by the loss function \(\ell(i(p), F_{X}(p)) \in l_{I}\), an \(l_{R}\)-Lipschitz function associated with the RNN hypothesis. Given a dataset of m samples \(\hat{X} = \{ (p_{h}, q_{h}) \}_{h=1}^{m}\) drawn from the domain X, the following inequality holds with probability at least 1 − δ over X for every i ∈ I:
Given a dataset of K partitions, with \(a_{g}\) samples (h = 1, …, \(a_{g}\)) drawn from each source domain \(Y_{g}\), g = 1, …, K, the following inequality holds with probability at least 1 − δ over \(Y = \{ Y_{1}, Y_{2}, \ldots, Y_{K} \}\) for all i ∈ I:
Using δ/K in place of δ, the following inequality holds for each \(Y_{g}\), g = 1, …, K, with probability at least 1 − δ/K.
Consider source samples \(\hat{Q} = \{ (p_{h}^{g}, q_{h}^{g}) \}_{h=1}^{a_{g}}\) from the source domain \(Y_{g}\), g = 1, …, K, and data samples \(\hat{X} = \{ (p_{h}, q_{h}) \}_{h=1}^{a}\) drawn from the target domain X. The Q-discrepancy distance was computed using the triangle inequality and its definition. With probability at least 1 − δ over X, the main term in the next line can be further bounded as follows, according to the target function.
Additionally, \(\mathrm{disc}_{Q}(Y, \hat{Y})\) is bounded as follows.
Let I be a set of hypothesis functions i(⋅) that map the RNN inputs to the output of the s-th step. Given a dataset of K subsets with samples \(\hat{Q} = \{ (p_{h}^{g}, q_{h}^{g}) \}_{h=1}^{a_{g}}\) from the source domain \(Y_{g}\), g = 1, …, K, and data samples \(\hat{X} = \{ (p_{h}, q_{h}) \}_{h=1}^{a}\) drawn from the target domain X, the following inequality holds with probability at least 1 − δ:
The triangle inequality and the definition of the discrepancy distance yield the following inequality.
The Q-discrepancy distance and the properties of the empirical source distribution \(\hat{Y} = \{ \hat{Y}_{1}, \hat{Y}_{2}, \ldots, \hat{Y}_{K} \}\) are defined as follows.
Finally, the following inequality can be obtained by combining the preceding results.
In this bound, the first term is the empirical error of the function i evaluated on the empirical multi-source domain \(\hat{Y}\), and the second term is the Q-discrepancy distance between \(\hat{X}\) and \(\hat{Y}\). The third and fourth terms are the Rademacher complexity terms of the function set I on the empirical source domain \(\hat{Y}\) and the empirical target domain \(\hat{X}\), respectively. The final two terms are probability terms that depend on the confidence level δ and the number of data samples.
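The two data-dependent quantities in this bound can be estimated numerically when the hypothesis class is finite. The sketch below is an illustration under that simplifying assumption (the class I in the text is not finite): `empirical_rademacher` approximates the expectation over the Rademacher variables ε by Monte Carlo sampling, and `empirical_discrepancy` measures the discrepancy between two samples as the largest gap in average loss over the class, with losses assumed to lie in [0, 1]. Both function names are hypothetical.

```python
import numpy as np

def empirical_rademacher(values, n_draws=2000, seed=0):
    """Monte Carlo estimate of E_eps[ sup_f (1/a) * sum_h eps_h * f(t_h) ].

    values: shape (n_hypotheses, a); row j holds f_j(t_1), ..., f_j(t_a)
    for a finite hypothesis class evaluated on the sample T.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    a = values.shape[1]
    eps = rng.choice([-1.0, 1.0], size=(n_draws, a))   # Rademacher variables
    correlations = eps @ values.T / a                   # (n_draws, n_hypotheses)
    return correlations.max(axis=1).mean()              # sup over class, mean over draws

def empirical_discrepancy(losses_src, losses_tgt):
    """max over a finite class of |mean source loss - mean target loss|.

    losses_src[j, h]: loss of hypothesis j on source sample h (with source labels);
    losses_tgt[j, h]: the same on target samples.
    """
    gap = np.asarray(losses_src).mean(axis=1) - np.asarray(losses_tgt).mean(axis=1)
    return np.abs(gap).max()

# Usage: three threshold classifiers evaluated on 40 one-dimensional points
rng = np.random.default_rng(1)
t = rng.uniform(-1.0, 1.0, size=40)
thresholds = np.array([np.where(t > c, 1.0, -1.0) for c in (-0.5, 0.0, 0.5)])
r_hat = empirical_rademacher(thresholds)
```

A richer class can correlate better with random signs, so its estimated complexity grows; this is the sense in which the Rademacher terms penalize expressive hypothesis sets.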
Feature extraction using pre-trained architectures
Feature extraction is a key component of DL models that enables effective use of the capabilities of a pre-trained neural network. Among the layers in these networks, convolutional and pooling layers are specifically built to extract the essential characteristics needed for tasks such as object identification and localization. The learning rate, the number of layers, and the number of neurons in each layer can also be adjusted to improve the system. These methods provide significant savings in time and computing resources. Pre-trained models that have been trained on large datasets are effective feature extractors, and selecting an appropriate feature extractor can improve system performance. The pre-trained models used here are DenseNet, VGG19, Xception, and UNet39, from which features are extracted at the second-to-last (penultimate) layer. The resulting features are then used to classify skin diseases in FL with IID and non-IID databases. These models excel at extracting meaningful patterns from high-dimensional image data, such as lesion shape, colour, texture, and edge details, which are critical for diagnosing skin conditions, and sharing the derived features allows remote devices to collaborate on model training.
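Extracting features from the second-to-last layer amounts to running the frozen network up to, but not including, its original classification head. The NumPy sketch below illustrates this with a small hypothetical stand-in for a pre-trained backbone (random weights, dense layers only); with real models in Keras, a related effect is usually obtained by loading an application model such as `tf.keras.applications.VGG19` with `include_top=False`, which drops the classification head. The function name `penultimate_features` is an assumption.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def penultimate_features(x, layers):
    """Forward pass through every layer except the last. The final (W, b) pair is
    the original classification head and is skipped. Weights are only read,
    never updated, so the backbone stays frozen."""
    h = x
    for W, b in layers[:-1]:
        h = relu(h @ W + b)
    return h

# Hypothetical stand-in for a pre-trained backbone: 12 -> 32 -> 16 -> 1000 (old head)
rng = np.random.default_rng(0)
sizes = [12, 32, 16, 1000]
pretrained = [(rng.normal(size=(m, n)) / np.sqrt(m), np.zeros(n))
              for m, n in zip(sizes[:-1], sizes[1:])]
images = rng.normal(size=(5, 12))                   # five flattened toy inputs
features = penultimate_features(images, pretrained)  # shape (5, 16)
```

The resulting feature vectors are what a separate DNN classifier (or a federated one) would be trained on.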
Classification using dense neural network (DNN)
A dense neural network (DNN) is highly effective at performing complex classification tasks and learning intricate data representations40. A DNN can learn hierarchical features from input data because it has several fully connected layers, in which each neuron in one layer is connected to every neuron in the next layer. In this context, DNNs are particularly advantageous: as the input features propagate deeper into the network, higher layers extract more abstract and disease-specific patterns, enabling accurate differentiation between skin conditions. To model intricate relationships in the data, each layer of a DNN applies a weighted sum followed by a non-linear activation function. The model is trained by supervised learning on labeled datasets, optimizing the weights via backpropagation and gradient descent to minimize the classification error. The architecture's ability to learn deep, abstract features makes it well suited to skin disease diagnosis, where subtle variations in texture, color, and lesion shape can significantly affect classification accuracy. By leveraging the dense connectivity of DNNs, the system achieves robust performance in automated dermatological analysis. The architecture consists of three main types of layer: the input layer, the hidden layers, and the output layer. The input layer is the model's first point of contact and receives the raw data, which, for skin disease classification, is typically a feature vector derived from skin images. The hidden layers form the core computational engine of the DNN; each applies a weighted sum followed by a non-linear ReLU activation function. The first hidden layer detects low-level features such as edges, spots, and gradients, which serve as fundamental building blocks in image recognition.
Subsequent intermediate layers learn to combine these low-level cues into more complex structures, such as textures, shapes, and boundary patterns that are often characteristic of specific skin conditions. The model's depth and width significantly influence its ability to generalize across diverse cases, although deeper networks may require larger datasets and robust regularization to mitigate overfitting. The output layer produces the classification result, typically using a softmax activation function to generate a probability for each class. The input and output feature maps of a given layer can be characterized as \(P \in \mathbb{R}^{I \times Z \times H}\) and \(Q \in \mathbb{R}^{I \times Z \times O}\), where I and Z represent the height and width, and H and O the numbers of input and output channels, respectively. The convolutional filters are represented as \(D \in \mathbb{R}^{I \times O}\). In group convolution, the feature maps P and Q and the filters D are separated into G distinct groups \(P_{g}\), \(Q_{g}\), and \(D_{g}\), g = 1, …, G, and each output group is computed only from the corresponding input group:
\[ Q_{g} = P_{g} \otimes D_{g}, \quad g = 1, \ldots, G, \]
where ⊗ denotes 2D convolution.
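Group convolution can be checked numerically: it is equivalent to an ordinary convolution whose weight tensor is block-diagonal across the G groups, which is why it needs only 1/G of the parameters. The NumPy sketch below (hypothetical helper names; it computes cross-correlation, which is what deep learning libraries implement as "convolution") shows this, and also builds the depthwise-plus-pointwise convolution discussed next as a group convolution with G equal to the number of input channels, followed by a 1 × 1 convolution.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(x, w):
    """'Valid' 2-D cross-correlation.
    x: (C_in, H, W); w: (C_out, C_in, k, k) -> (C_out, H - k + 1, W - k + 1)."""
    k = w.shape[-1]
    windows = sliding_window_view(x, (k, k), axis=(1, 2))   # (C_in, H', W', k, k)
    return np.einsum("chwij,ocij->ohw", windows, w)

def group_conv2d(x, w_groups):
    """Split the C_in input channels into G = len(w_groups) equal groups, convolve
    group g only with its own filters w_groups[g], and concatenate the outputs."""
    x_groups = np.split(x, len(w_groups), axis=0)
    return np.concatenate([conv2d(xg, wg) for xg, wg in zip(x_groups, w_groups)], axis=0)

def depthwise_separable(x, w_depth, w_point):
    """Depthwise step = group convolution with G = C_in (one k x k filter per channel),
    then a pointwise 1 x 1 convolution that mixes channels.
    w_depth: (C_in, k, k); w_point: (C_out, C_in)."""
    depthwise = group_conv2d(x, [w[None, None] for w in w_depth])
    return conv2d(depthwise, w_point[:, :, None, None])

# Usage: C_in = 4 channels on a 6 x 6 map, G = 2 groups of 3 x 3 filters
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 6))
w_groups = [rng.normal(size=(2, 2, 3, 3)) for _ in range(2)]
y = group_conv2d(x, w_groups)          # shape (4, 4, 4)
```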
The depthwise convolution used in the DNN module extracts localized features while preserving the spatial scale of the data. The subsequent pointwise convolution further improves the model's ability to learn multifaceted representations by combining features from different channels, allowing richer information encoding. Depthwise and pointwise convolution are described as follows.
Here Z denotes the convolution kernel, q denotes the input feature map, h and g are the dimensions of the input feature map, K and L are the dimensions of the output feature map, and m denotes the number of channels. Triplet attention (TA) improves the model's capability to recognize and discriminate different characteristics. It has three branches, each of which analyzes the input tensor \(\chi \in \mathbb{R}^{C \times I \times Z}\) in a different way, improving the model's ability to capture complex shapes. In each branch, the input tensor is rotated and then passed through W-pool and convolution operations, which extract cross-dimensional correlations, for example between the height and channel dimensions. The W-pool function is given by the following relation.
By capturing key interactions between features at different scales and locations, TA improves the model's ability to identify subtle patterns essential for accurate classification. The final refined feature map is generated by averaging the refined tensors produced by the three branches.
where σ represents the sigmoid activation function, and \(\psi_{1}\), \(\psi_{2}\), and \(\psi_{3}\) denote the standard two-dimensional convolutional layers, defined by kernel size K, in the three branches of triplet attention.
where \(\omega_{1}\), \(\omega_{2}\), and \(\omega_{3}\) represent the attention weights computed in the three branches, together with \(q_{1}\) and \(q_{2}\); these weights ensure that TA effectively captures spatial and channel dependencies. The working process of skin disease diagnosis using DNN is summarized in Algorithm 1.
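The triplet attention steps above can be sketched in NumPy as follows. This is a hedged illustration following the general design of the published triplet attention module, not the authors' code: it uses axis permutations in place of 90° rotations, implements the pooling step (called W-pool here) as a stack of max-pooling and mean-pooling over the first axis, assumes an odd kernel size, omits any normalisation layers, and uses hypothetical function names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def w_pool(x):
    """Stack max- and mean-pooling over the first axis: (D, A, B) -> (2, A, B)."""
    return np.stack([x.max(axis=0), x.mean(axis=0)])

def conv_same(x, w):
    """Zero-padded 'same' cross-correlation of a (2, A, B) map with a (2, k, k)
    kernel (k odd), producing one (A, B) map."""
    p = w.shape[-1] // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    win = sliding_window_view(xp, w.shape[1:], axis=(1, 2))   # (2, A, B, k, k)
    return np.einsum("cabij,cij->ab", win, w)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def branch(x, w):
    """Attention over the last two axes of x; the weights broadcast over axis 0."""
    return x * sigmoid(conv_same(w_pool(x), w))[None]

def triplet_attention(x, w1, w2, w3):
    """x: (C, H, W). Each branch permutes axes so a different pair of dimensions
    interacts, then the three refined tensors are averaged."""
    y1 = branch(x.transpose(2, 0, 1), w1).transpose(1, 2, 0)  # channel-height (pool over W)
    y2 = branch(x.transpose(1, 0, 2), w2).transpose(1, 0, 2)  # channel-width (pool over H)
    y3 = branch(x, w3)                                         # height-width (pool over C)
    return (y1 + y2 + y3) / 3.0

# Usage with random 7 x 7 kernels (kernel size is an assumption)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5, 6))
kernels = [rng.normal(size=(2, 7, 7)) for _ in range(3)]
refined = triplet_attention(x, *kernels)
```

Because every sigmoid weight lies in (0, 1), each element of the refined map is a rescaled copy of the input element, never larger in magnitude.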

Algorithm 1: Skin disease diagnosis using DNN
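A minimal NumPy sketch of the DNN classifier described in this section is given below, in the spirit of Algorithm 1 but not a reproduction of it: fully connected layers with ReLU, a softmax output, cross-entropy loss, and backpropagation with plain full-batch gradient descent. The layer sizes, learning rate, function names, and synthetic 2-D data are assumptions chosen so the example runs quickly; in the study, the inputs would be feature vectors extracted from skin images.

```python
import numpy as np

def init_dnn(sizes, seed=0):
    """He-initialised (W, b) pairs for a fully connected net, e.g. sizes=[d, 64, 32, 7]."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)          # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(params, x):
    """Return every layer's activation: ReLU in hidden layers, softmax at the output."""
    acts = [x]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(softmax(z) if i == len(params) - 1 else np.maximum(z, 0.0))
    return acts

def train_step(params, x, y, lr=0.1):
    """One full-batch gradient-descent step on mean cross-entropy; returns (params, loss)."""
    acts = forward(params, x)
    n = len(y)
    probs = acts[-1]
    loss = -np.log(probs[np.arange(n), y] + 1e-12).mean()
    delta = probs.copy()
    delta[np.arange(n), y] -= 1.0
    delta /= n                                     # gradient w.r.t. the output logits
    updated = []
    for i in range(len(params) - 1, -1, -1):       # backpropagate from the last layer
        W, b = params[i]
        gW, gb = acts[i].T @ delta, delta.sum(axis=0)
        if i > 0:
            delta = (delta @ W.T) * (acts[i] > 0)  # chain rule through the ReLU
        updated.append((W - lr * gW, b - lr * gb))
    return updated[::-1], loss

# Usage: three well-separated 2-D clusters stand in for extracted image features
rng = np.random.default_rng(1)
centers = np.array([[1.5, 0.0], [-1.5, 0.0], [0.0, 1.5]])
y = rng.integers(0, 3, size=300)
x = centers[y] + 0.5 * rng.normal(size=(300, 2))
params = init_dnn([2, 16, 8, 3])
params, first_loss = train_step(params, x, y)
for _ in range(1000):
    params, last_loss = train_step(params, x, y)
accuracy = (forward(params, x)[-1].argmax(axis=1) == y).mean()
```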
Results analysis
This section presents the results and comparative analysis of the models used to identify skin diseases. Accuracy, precision, recall, and loss are used to measure how effectively the models detect the specified skin diseases. The proposed FL model was implemented in Python on the Google Colab platform, with model training and testing conducted on a Colab cloud GPU server. Given the size of the HAM10000 dataset and the iterative communication between local and global models in FL, model training requires substantial computational time, which a CPU cannot handle efficiently. For system-level validation, experiments were also executed on a local system comprising an NVIDIA GTX 1650 graphics card with 4 GB of dedicated memory, 16 GB of RAM, and an Intel Core i5 processor. To reduce training time, the model is generally built and executed on a GPU. An existing FL example available on Kaggle.com was adapted and modified to design the FL framework used in this study. At the FL server, the FedAvg method averages all of the local networks to aggregate them into a global network. The dataset used is HAM10000 ("Human Against Machine with 10,000 training images"41), a publicly accessible resource housed in the ISIC repository. All models were trained using hyperparameters optimized through empirical tuning and grid search experiments: the learning rate, batch size, and number of epochs were systematically varied for each model to achieve optimal performance on the validation dataset. During tuning, the number of epochs was varied from 0 to 150, and the best-performing configuration was selected based on accuracy and convergence behavior. For most models, a learning rate of 0.001, a batch size of 32, and 100 epochs provided the most effective balance between training time and model performance.
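The FedAvg aggregation mentioned above can be written in a few lines. In this minimal sketch, each client's parameters are weighted by its number of local samples, as in the original FedAvg formulation of McMahan et al.; whether the study used sample-count weights or a plain mean is not stated, and with equal client sizes the two coincide. The function name `fedavg` and the list-of-arrays parameter format are assumptions.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Weighted average of client parameter lists (FedAvg aggregation).

    client_params: one entry per client; each entry is a list of NumPy arrays
                   (weights and biases) with identical shapes across clients.
    client_sizes:  number of local training samples per client, used as weights.
    """
    total = float(sum(client_sizes))
    n_tensors = len(client_params[0])
    return [
        sum(params[t] * (size / total) for params, size in zip(client_params, client_sizes))
        for t in range(n_tensors)
    ]

# Usage: two clients holding 1 and 3 samples respectively
a = [np.zeros((2, 2)), np.array([1.0])]
b = [np.ones((2, 2)), np.array([5.0])]
global_params = fedavg([a, b], [1, 3])   # 0.25 * a + 0.75 * b, tensor by tensor
```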
The dataset includes 11,253 dermatoscopic images showing seven different types of skin disease (Fig. 3): vascular lesions 412, benign keratosis-like lesions 1058, basal cell carcinoma 358, actinic keratosis 6858, melanocytic nevi 635, melanoma 847, and dermatofibroma 1085 (Table 2). The training and testing sets were randomly selected from the dataset: ten percent of the dataset is used for testing and ninety percent for training. A validation process was also included to prevent overfitting during training. FL was evaluated on both an IID database, in which data are distributed uniformly and identically among devices, and a non-IID database, in which the data distribution is uneven and differs between devices, as shown in Fig. 4 for two clients (N = 2). Non-IID databases often depict real-world situations with conflicting information from several sources. Each client trains its own local network on its own local dataset. Once local training has finished, the server receives all of the local networks and combines them into a global network, which is then distributed back to the clients. The clients then train their local networks again on their local datasets, using the global network as a fresh starting point. This cycle is repeated 100 times. The model presented in this work assumes that communication between the clients and the server is error-free; in practice, transferring local networks to the global network can be costly and unreliable, which increases the likelihood of errors.
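The training cycle described above (local training, upload, aggregation, broadcast, repeated for 100 rounds) is sketched below under deliberately simplified assumptions: each client's "network" is a softmax-regression model trained by plain gradient descent, the data are synthetic 2-D clusters rather than HAM10000 images, and the non-IID split simply sorts the training set by label before halving it. The helpers `local_train` and `run_federated` are hypothetical, and communication is assumed error-free, as in the text.

```python
import numpy as np

def local_train(W, x, y, lr=0.1, epochs=5):
    """Stand-in local model: a few full-batch gradient steps of softmax regression."""
    n = len(y)
    for _ in range(epochs):
        z = x @ W
        z -= z.max(axis=1, keepdims=True)
        p = np.exp(z)
        p /= p.sum(axis=1, keepdims=True)
        p[np.arange(n), y] -= 1.0                 # softmax minus one-hot = dLoss/dz
        W = W - lr * (x.T @ p) / n                # new array; the caller's W is untouched
    return W

def run_federated(clients, n_features, n_classes, rounds=100):
    """clients: list of (x, y). Each round the server broadcasts W_global, every client
    trains locally from it, and the server replaces W_global with the FedAvg result."""
    W_global = np.zeros((n_features, n_classes))
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    for _ in range(rounds):
        local = [local_train(W_global, x, y) for x, y in clients]
        W_global = sum(w * s for w, s in zip(local, sizes)) / sizes.sum()
    return W_global

# Usage: 90/10 train/test split, then a label-sorted (non-IID) split over N = 2 clients
rng = np.random.default_rng(0)
centers = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 3.0]])
y = rng.integers(0, 3, size=600)
x = np.hstack([centers[y] + rng.normal(size=(600, 2)), np.ones((600, 1))])  # bias column
perm = rng.permutation(len(y))
test_idx, train_idx = perm[:60], perm[60:]
order = train_idx[np.argsort(y[train_idx], kind="stable")]
clients = [(x[part], y[part]) for part in np.array_split(order, 2)]
W = run_federated(clients, n_features=3, n_classes=3)
test_accuracy = ((x[test_idx] @ W).argmax(axis=1) == y[test_idx]).mean()
```

Neither client sees all three classes, yet the averaged model classifies all of them, which is the point of aggregating across non-IID clients.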
Results analysis of transfer learning models on skin disease diagnosis
This section provides a detailed analysis of transfer learning models for skin disease diagnosis. Figure 4 shows the training and testing accuracy of VGG16, Xception, EfficientNetB3, and MobileNetV2; MobileNetV2 performs best, with a training accuracy of 90.352% and a testing accuracy of 98.374%. Xception follows closely, achieving 87.798% in training and 97.985% in testing, while EfficientNetB3 reaches 89.857% in training and 97.968% in testing. VGG16, despite some fluctuations, achieves a strong testing accuracy of 95.858%, but converges more slowly. MobileNetV2 outperforms the others, offering the best accuracy and generalization for skin disease classification. Fig. 5 shows the loss of the transfer learning models during training and testing over 10 epochs. MobileNetV2 shows the best performance, with training loss reduced from 0.258 to 0.175 and testing loss from 0.199 to 0.116, reflecting its strong learning and generalization capabilities. EfficientNetB3 follows closely, with consistent reductions in training and testing loss to 0.235 and 0.076, respectively. Xception shows moderate progress, ending with a testing loss of 0.077, while VGG16 improves more slowly, with a final testing loss of 0.966. MobileNetV2 and EfficientNetB3 are the most efficient models, with MobileNetV2 achieving the lowest losses, making it suitable for the skin disease diagnosis task.
Table 3 provides a comparative evaluation of transfer learning models with DNN classification for skin disease detection. MobileNetV2 achieves the highest testing accuracy at 98.064%, a 3.45% increase over EfficientNetB3 and a 3.64% improvement over Xception. To reduce overfitting, regularization techniques were applied, including dropout (rate 0.5), early stopping (patience = 10), and batch normalization. Table 4 presents the class-wise accuracy of the transfer learning models with DNN (VGG16 + DNN, Xception + DNN, EfficientNetB3 + DNN, and MobileNetV2 + DNN) across the 10 folds of K-fold cross-validation for skin disease diagnosis. Table 5 compares resource metrics for the transfer learning models used in skin disease detection: GPU memory usage, GPU process usage, CPU process usage, and virtual memory consumption.
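The early-stopping rule with a patience of 10 epochs can be sketched in plain Python as follows. This assumes the monitored quantity is the validation loss, which the text does not state explicitly, and the class and function names are illustrative. In Keras, comparable behaviour is available through `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)`, but the paper does not say which framework or settings were used.

```python
class EarlyStopping:
    """Signal a stop once the validation loss has not improved by more than
    `min_delta` for `patience` consecutive epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.epochs_without_improvement = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch  # the weights from this epoch would be kept
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def train_with_early_stopping(val_losses, patience=10):
    """Replay per-epoch validation losses; return (stop_epoch, best_epoch).
    stop_epoch is None if training ran to the end without triggering a stop."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses):
        if stopper.step(epoch, loss):
            return epoch, stopper.best_epoch
    return None, stopper.best_epoch
```

For example, a loss curve that bottoms out at epoch 2 and then plateaus for ten epochs stops at epoch 12, and the weights from epoch 2 would be restored.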
Results analysis of feature extraction models on skin disease diagnosis
Figure 6 compares the training and testing accuracy of DenseNet, VGG19, Xception, and UNet. UNet leads with the highest training accuracy of 84.537% and testing accuracy of 90.894%, indicating strong generalization. VGG19 improves steadily, reaching 82.845% in training and 89.202% in testing, while DenseNet trails with a peak training accuracy of 82.365% and testing accuracy of 88.722%. Figure 7 shows that UNet also achieves the lowest loss in both phases, reducing training loss to 0.406 and testing loss to 0.321 by epoch 50. Table 6 presents the results of the feature extraction models with a DNN classifier for skin disease detection and classification, highlighting notable differences in both accuracy and loss. UNet achieves the lowest testing loss at 0.112, followed by Xception (0.124), DenseNet (0.138), and VGG19 (0.158); UNet thus reduces testing loss by 0.026 compared with DenseNet, 0.046 compared with VGG19, and 0.012 compared with Xception. For training loss, UNet again records the lowest value (0.087), followed by Xception (0.098), DenseNet (0.125), and VGG19 (0.145), a reduction of 0.038 over DenseNet, 0.058 over VGG19, and 0.011 over Xception.
Table 7 shows the class-wise accuracy of the feature extraction models with DNN (DenseNet + DNN, VGG19 + DNN, Xception + DNN, and UNet + DNN) across the 10 folds of K-fold cross-validation for skin disease diagnosis. UNet + DNN achieves the highest average accuracy, ranging from 90.31% to 90.37% across all classes (MV, MEL, BKL, BCC, AK, VL, and DF), indicating superior performance. Table 8 presents the resource utilization metrics (GPU memory, GPU process, CPU process, and virtual memory usage) for the feature extraction models integrated with DNN classification for skin disease detection and classification. UNet shows higher GPU memory, GPU process, CPU process, and virtual memory usage than the other models, making it more computationally demanding and better suited to settings with ample processing capacity.
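Class-wise accuracy of the kind reported in Tables 4 and 7 is commonly computed as the fraction of each class's samples that are classified correctly, and then averaged over the cross-validation folds. The authors do not give their exact definition, so the sketch below is an assumption; the function names are illustrative and the class labels follow the abbreviations used in the text.

```python
from collections import Counter, defaultdict
from statistics import mean

def class_wise_accuracy(y_true, y_pred):
    """Per-class accuracy: fraction of samples of each true class that were
    predicted correctly (equivalent to per-class recall)."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / totals[c] for c in sorted(totals)}

def average_over_folds(per_fold):
    """Average per-class accuracies over the K folds (one dict per fold)."""
    collected = defaultdict(list)
    for fold in per_fold:
        for c, acc in fold.items():
            collected[c].append(acc)
    return {c: mean(v) for c, v in sorted(collected.items())}
```

Because the classes are strongly imbalanced, per-class figures like these reveal weak minority-class performance that a single overall accuracy can hide.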
Results analysis of federated transfer learning models on skin disease diagnosis
The analysis of the federated transfer learning model for skin disease diagnosis shows notable improvements in performance metrics for both Client 1 and Client 2 on the IID dataset. As highlighted in Section "Results analysis of transfer learning models on skin disease diagnosis", MobileNetV2 delivers the most effective results among the four transfer learning models evaluated, with an accuracy of 98.064%, making it the most suitable model for this experiment. As shown in Figs. 8 and 9, for Client 1, accuracy improves from 25.568% to 99.698%, while precision, recall, and F-measure rise from 17.82%, 19.856%, and 18.783%, respectively, to 99.897%. For the Non-IID dataset, Figs. 10 and 11 show the training outcomes over 25 epochs. These outcomes confirm that the models learn and optimize effectively, even under the challenges posed by non-IID data distributions.
Table 9 compares the performance of federated transfer learning for skin disease detection on the IID and Non-IID datasets. During training, models on the Non-IID dataset show marginal improvements over those on IID data: training accuracy rises from 96.428% to 96.573%, precision from 96.004% to 96.235%, recall from 96.108% to 96.389%, and F-measure from 96.048% to 96.298%. The training loss also decreases from 0.775 (IID) to 0.632 (Non-IID), indicating better optimization on Non-IID data. In testing, the Non-IID dataset again outperforms the IID dataset. Table 10 reports the class-wise accuracy of the MobileNetV2 + FL + DNN model across tenfold cross-validation for both the IID and Non-IID datasets, showing highly consistent and stable performance in skin disease diagnosis. As shown in Fig. 12, the narrow clustering of average accuracy values between 95.9% and 96.5%, together with steady accuracy trends across all folds, confirms the robustness and reliability of the MobileNetV2 + FL + DNN model on both the IID and Non-IID datasets. Table 11 summarizes the resource utilization metrics (GPU memory, GPU process, CPU process, and virtual memory usage) of the MobileNetV2 + FL + DNN model across the two clients on both datasets. The results confirm that resource utilization remains efficient and fairly stable between the IID and Non-IID scenarios for this federated learning configuration.
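Tenfold cross-validation partitions the data into ten folds and evaluates each fold once as the held-out set while training on the other nine; the spread of the per-fold accuracies is what the narrow 95.9% to 96.5% band describes. The sketch below is a plain, unstratified K-fold split with illustrative names; whether the authors stratified their folds is not stated. Given the imbalanced class counts, a stratified variant (for example scikit-learn's `StratifiedKFold`) would keep class proportions similar across folds.

```python
import random
from statistics import mean, pstdev

def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds. Indices are
    shuffled once; fold sizes differ by at most one sample."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    for i, test in enumerate(folds):
        # Every index outside fold i goes into the training set.
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

def summarize_folds(fold_accuracies):
    """Mean, spread and range of per-fold accuracies; a small spread indicates
    stable performance across folds."""
    return {
        "mean": mean(fold_accuracies),
        "std": pstdev(fold_accuracies),
        "min": min(fold_accuracies),
        "max": max(fold_accuracies),
    }
```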
Results analysis of UNet + FL + DNN for skin disease diagnosis
The UNet-based feature extraction model achieves a maximum training accuracy of 90.338% and testing accuracy of 83.854%. Figures 13 and 14 show the training results of the UNet + FL model on the IID dataset for Client 1 and Client 2. Both clients perform exceptionally well, with accuracy exceeding 99% by the final epoch. Figures 15 and 16 show the training results of the federated feature extraction models on the Non-IID dataset for Client 1 and Client 2 over 25 epochs. Table 12 compares the performance of the UNet + FL + DNN model for skin disease diagnosis on the IID and Non-IID datasets. During training, the model shows minor variations, with accuracy dropping from 99.514% for IID to 99.414% for Non-IID data, while the training loss is lower for Non-IID data (0.587) than for IID data (0.623). In the testing phase, the model performs better on Non-IID data: accuracy rises by 0.161 percentage points to 99.689%, precision by 0.193 points to 99.506%, recall by 0.144 points to 99.441%, and F-measure by 0.171 points to 99.473%.
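Assuming the F-measure is the standard F1 score, it is the harmonic mean of precision and recall. The reported Non-IID testing F-measure of 99.473% is consistent with that reading: the harmonic mean of the reported precision (99.506%) and recall (99.441%) rounds to the same value. The helper below is an illustrative sketch; note that an F-measure macro-averaged over the seven classes would not, in general, equal the harmonic mean of macro-averaged precision and recall.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta = 1 gives the F-measure (F1), the harmonic mean of
    precision and recall. Inputs may be fractions or percentages."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

if __name__ == "__main__":
    # Testing-phase values reported for UNet + FL + DNN on Non-IID data.
    print(round(f_measure(99.506, 99.441), 3))  # reported F-measure: 99.473
```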
Table 13 reports the class-wise accuracy of the proposed UNet + FL + DNN model for skin disease diagnosis, evaluated using tenfold cross-validation on both the IID and non-IID datasets. Figure 17 confirms that the UNet + FL + DNN model maintains consistent and superior accuracy trends under both IID and non-IID distributions, demonstrating the robustness of the FL framework for reliable skin disease diagnosis. Table 14 reports the resource utilization metrics of the UNet + FL + DNN model under IID and Non-IID data distributions for skin disease diagnosis. Non-IID data causes a mild increase in GPU memory, GPU process, and CPU process usage for both clients, with Client 1 generally showing slightly larger increases than Client 2 in most metrics. Virtual memory usage remains fairly consistent, indicating that Non-IID data only marginally raises computational demand and that system resource utilization remains balanced and efficient in the FL setting.
Discussion
Table 15 presents a comparative analysis of accuracy, loss, and resource utilization for four model strategies in skin disease diagnosis: Strategy (a), MobileNetV2; Strategy (b), UNet; Strategy (c), MobileNetV2 + FL + DNN; and Strategy (d), UNet + FL + DNN. In terms of accuracy, integrating FL with DNN classification significantly improved performance. Under IID conditions, MobileNetV2 + FL + DNN achieved 99.063% accuracy, an increase of 8.23% and 18.03% over MobileNetV2 and UNet, respectively. Inference was also faster with the FL models, with UNet + FL + DNN taking 28 ms (IID) and 29 ms (Non-IID), considerably quicker than MobileNetV2 (42 ms) and UNet (57 ms). Although the federated models were slightly larger (24.6 MB for Strategy (c) and 27.1 MB for Strategy (d)), this increase is acceptable given their superior accuracy and efficiency. The resource consumption analysis further underlines the advantage of the FL-based models. UNet + FL + DNN used the least GPU memory (24.548% IID and 24.315% Non-IID), compared with MobileNetV2 (76.2%) and UNet (69.5%), a reduction of 67.8% and 64.9%, respectively. GPU process utilization also decreased in the FL models, with UNet + FL + DNN consuming only 27.975% (IID) and 27.803% (Non-IID), and MobileNetV2 + FL + DNN slightly more. CPU process usage followed the same trend: UNet + FL + DNN required the least, at 3.983% (IID) and 3.738% (Non-IID), a drop relative to MobileNetV2 and UNet. Virtual memory usage was similarly lower in the federated setups, with UNet + FL + DNN and MobileNetV2 + FL + DNN consuming less than their standalone counterparts. The ANOVA results (Table 15) confirmed statistically significant differences (p < 0.001) in accuracy, loss, training time, and GPU-related metrics, indicating that the choice of strategy has a meaningful impact on performance. To determine where these differences lie, Tukey's HSD post-hoc test was applied.
The post-hoc results revealed that Strategy (d) (UNet + FL + DNN) consistently outperformed Strategies (a), (b), and (c), with statistically significantly higher accuracy and lower loss. Similarly, both FL-integrated strategies (c and d) showed reduced resource usage (GPU/CPU/memory) compared with their non-federated counterparts (a and b), with strong statistical significance.
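The one-way ANOVA used above compares the variance between strategies with the variance within each strategy's repeated measurements. The sketch below computes only the F statistic and its degrees of freedom in plain Python; which per-run measurements (for example the ten cross-validation folds) the authors fed into the test is not stated, so the input groups are placeholders and the function name is illustrative.

```python
from statistics import mean

def one_way_anova(groups):
    """One-way ANOVA F statistic for k groups of measurements (e.g. per-run
    accuracies of strategies (a)-(d)). Returns (F, df_between, df_within)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: how far each strategy's mean is from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: scatter of runs around their own strategy's mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

Turning F into a p-value requires the F distribution, which the standard library lacks; in SciPy, `scipy.stats.f_oneway` returns both the statistic and the p-value, and `scipy.stats.tukey_hsd` performs the pairwise post-hoc comparisons.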
In real-world clinical settings, computational efficiency plays a crucial role in determining whether AI models can be deployed, especially in resource-constrained environments such as small clinics or mobile diagnostic units. The comparative analysis (Table 15) shows that the standalone models, MobileNetV2 and UNet, require more GPU memory and longer training times, which may not be feasible for on-site training or rapid inference. UNet + FL + DNN, in particular, requires only 24.548% GPU memory and 945 s of training time, while offering the fastest inference time of 28 ms. FL-based models also offer enhanced data privacy, aligning with regulatory frameworks such as HIPAA and GDPR, which is critical for clinical use. The slightly larger model size (27.1 MB for UNet + FL + DNN) is still manageable on modern edge devices and embedded systems, making these models practical for deployment in decentralized clinical infrastructures without compromising diagnostic accuracy.
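The resource comparisons from Table 15, such as the 67.8% GPU-memory reduction of UNet + FL + DNN relative to MobileNetV2, are relative reductions rather than percentage-point differences: going from 76.2% to 24.548% GPU memory is a drop of 51.652 percentage points, which is 67.8% of the baseline. The small helpers below, with illustrative names, make the distinction explicit. Reading the "2.35% lower loss" stated in the conclusion (0.415 versus 0.425) the same way gives a consistent result.

```python
def relative_reduction(baseline, proposed):
    """Percentage decrease of `proposed` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - proposed) / baseline * 100.0

def point_difference(baseline, proposed):
    """Absolute difference; in percentage points when inputs are percentages."""
    return baseline - proposed

if __name__ == "__main__":
    # GPU memory: MobileNetV2 (76.2%) vs UNet + FL + DNN under IID (24.548%).
    print(round(point_difference(76.2, 24.548), 3))    # 51.652 percentage points
    print(round(relative_reduction(76.2, 24.548), 1))  # 67.8 (% of the baseline)
```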
Conclusion
A privacy-preserving FL framework was proposed for skin disease diagnosis and evaluated through four strategies: strategy (a) employed MobileNetV2 with transfer learning and DNN classification; strategy (b) used UNet for feature extraction followed by DNN classification; strategy (c) integrated FL with MobileNetV2 and DNN; and strategy (d) combined UNet-based feature extraction with FL and DNN classification to keep data decentralized while improving diagnostic accuracy. Both IID and Non-IID datasets were used for a comprehensive assessment. Strategy (d) achieved the highest diagnostic accuracy of 99.689% (Non-IID), surpassing MobileNetV2 by 8.16% and UNet by 15.835%. It also recorded the lowest loss of 0.415 (IID), a 17.32% reduction compared with MobileNetV2 and a 32.21% decrease relative to UNet. Strategy (c) reached 98.918% accuracy (IID) with a loss of 0.425, improving substantially over both baseline models though marginally below strategy (d). In terms of resource consumption, strategy (d) required 24.548% GPU memory (IID) and 27.975% GPU process, significantly lower than MobileNetV2 and UNet. Strategy (c) followed closely with 29.708% GPU memory and 31.838% GPU process. Similar trends were observed for CPU process and virtual memory usage, where the federated models consumed fewer resources while achieving higher accuracy and lower loss. Compared with strategy (c), strategy (d) achieved 0.771% higher accuracy and 2.35% lower loss, which confirms that incorporating UNet feature extraction before federated optimization yields better classification than direct transfer learning fine-tuning in a federated environment. The proposed strategy (d) model thus combines high diagnostic accuracy with strong data privacy safeguards.
It ensures reliable, scalable, and privacy-preserving skin disease detection across both IID and Non-IID data distributions. These capabilities position the model as a robust AI-driven dermatological solution, highly suitable for real-world telemedicine and remote healthcare applications.
Data availability
The data supporting the findings of this study are available from the corresponding author upon reasonable request. We confirm that all necessary steps were taken to ensure the privacy and confidentiality of the data used in this research. The HAM10000 dataset is publicly available and does not contain any personally identifiable information. Additionally, the proposed federated learning approach inherently protects data privacy by keeping the raw data localized to each client device. No identifying information or individual patient details were included in the manuscript. The dataset is available at: https://www.kaggle.com/datasets/vrindaat/ham10000-dataset
References
Papadopoulos, L. & Walker, C. Understanding skin problems: acne, eczema, psoriasis and related conditions (John Wiley & Sons, 2003).
Ou, M., Xue, Y., Qin, Y. & Zhang, X. Experience and caring needs of patients with psoriasis: A qualitative meta-synthesis. J. Clin. Nurs. (2024).
Nearchou, F. & Flinn, C. The impact of COVID-19 on children and adolescents with chronic illness. In The COVID-19 Aftermath: Volume I: Ongoing Challenges 385–399 (2024).
Li Pomi, F. et al. Artificial intelligence: a snapshot of its application in chronic inflammatory and autoimmune skin diseases. Life 14(4), 516 (2024).
Vayadande, K. Innovative approaches for skin disease identification in machine learning: A comprehensive study. Oral Oncol. Reports 100365 (2024).
Brown, M. et al. Topically applied therapies for the treatment of skin disease: past, present, and future. Pharmacol. Rev. 76(5), 689–790 (2024).
Singh, J., Sandhu, J. K. & Kumar, Y. An analysis of detection and diagnosis of different classes of skin diseases using artificial intelligence-based learning approaches with hyper parameters. Arch. Computat. Method. Eng. 31(2), 1051–1078 (2024).
Dang, T. L. P., Sadreddin, A. & Ahuja, S. Readily available technologies in low-resource communities: a review and synthesis. Inf. Technol. Dev. 30(1), 132–172 (2024).
El-Saleh, A. A., Sheikh, A. M., Albreem, M. A. & Honnurvali, M. S. The Internet of Medical Things (IoMT): opportunities and challenges. Wireless Networks 1–18 (2024).
Rosário, A. T. & Rosário, I. T. Telemedicine platforms and telemedicine systems in patient satisfaction. In Improving Security, Privacy, and Connectivity Among Telemedicine Platforms 119–151 (IGI Global, 2024).
Sitaraman, S. R., Alagarsundaram, P., Kumar, V. & Kurniadi, D. Accurate skin disease detection with K-nearest neighbors and CAM in IoMT-enabled diagnostic solutions. Chin. Tradi. Med. J. 7(3), 5–17 (2024).
Ahmed, S. F. et al. Insights into internet of medical things (IoMT): Data fusion, security issues and potential solutions. Inform. Fusion 102, 102060 (2024).
Khatiwada, P., Yang, B., Lin, J. C. & Blobel, B. Patient-generated health data (PGHD): Understanding, requirements, challenges, and existing techniques for data security and privacy. J. Personal. Med. 14(3), 282 (2024).
Kissi, J. et al. Healthcare professionals’ perception on emergence of security threat using digital health technologies in healthcare delivery. Digit. Health 10, 20552076241260384 (2024).
Alsamhi, S. H. et al. Federated learning meets blockchain in decentralized data-sharing: Healthcare use case. IEEE Internet of Things J. (2024).
Williamson, S. M. & Prybutok, V. Balancing privacy and progress: a review of privacy challenges, systemic oversight, and patient perceptions in AI-driven healthcare. Appl. Sci. 14(2), 675 (2024).
Albalawi, E. et al. Integrated approach of federated learning with transfer learning for classification and diagnosis of brain tumor. BMC Med. Imag. 24(1), 110 (2024).
Schreiber, R., Koppel, R. & Kaplan, B. What do we mean by sharing of patient Data? DaSH: A data sharing hierarchy of privacy and ethical challenges. Appl. Clin. Inform. 15(05), 833–841 (2024).
Vinudevi, G., Vijayaragavan, S. P. & Karthik, B. Transfer learning approaches for colorectal tumour detection on adapting pre-trained models to diverse medical imaging datasets. In Optimizing Intelligent Systems for Cross-Industry Application 411–432 (IGI Global, 2024).
Choudhry, I. A. et al. Privacy-preserving AI for early diagnosis of thoracic diseases using IoTs: A federated learning approach with multi-headed self-attention for facilitating cross-institutional study. Internet of Things 27, 101296 (2024).
Divya, N. D. & Sharma, G. Convolutional neural network (CNN) and federated learning-based privacy preserving approach for skin disease classification. The J. Supercomput. 80(16), 24559–24577 (2024).
Kareem, A. A privacy-preserving approach to effectively utilise distributed data for medical disease detection (2024).
Barnawi, A., Chhikara, P., Tekchandani, R., Kumar, N. & Alzahrani, B. A differentially privacy assisted federated learning scheme to preserve data privacy for IoMT applications. IEEE Trans. Network Service Manag. (2024).
Aminifar, A., Shokri, M. & Aminifar, A. Privacy-preserving edge federated learning for intelligent mobile-health systems. Preprint at arXiv:2405.05611 (2024).
Zhang, Z., Ming, Y. & Wang, Y. A federated transfer learning approach for surface electromyographic hand gesture recognition with emphasis on privacy preservation. Eng. Appl. Artif. Intell. 136, 108952 (2024).
Kumar, R. et al. Privacy-preserving blockchain-based federated learning for brain tumor segmentation. Comput. Biol. Med. 108646 (2024).
Alahmadi, A. et al. A privacy-preserved IoMT-based mental stress detection framework with federated learning. J. Supercomput. 80(8), 10255–10274 (2024).
Jiang, X., Zhang, J. & Zhang, L. Fedradar: Federated multi-task transfer learning for radar-based internet of medical things. IEEE Trans. Netw. Serv. Manage. 20(2), 1459–1469 (2023).
Nam, B. J. Skin disease classification using privacy-preserving federated learning. Int. J. High School Res. 5(1) (2023).
Wang, W. et al. A privacy preserving framework for federated learning in smart healthcare systems. Inform. Process. Manag. 60(1), 103167 (2023).
Nam, B. J. Skin disease classification using privacy-preserving federated learning. Int. J. High School Res. 5(1) (2023).
Hossen, M. N. et al. Federated machine learning for detection of skin diseases and enhancement of internet of medical things (IoMT) security. IEEE J. Biomed. Health Inform. 27(2), 835–841 (2022).
Gupta, M., Kumar, M. & Gupta, Y. A blockchain-empowered federated learning-based framework for data privacy in lung disease detection system. Comput. Hum. Behav. 158, 108302 (2024).
Lei, B. et al. Hybrid federated learning with brain-region attention network for multi-center Alzheimer’s disease detection. Pattern Recogn. 153, 110423 (2024).
Zhou, L., Wang, M. & Zhou, N. Distributed federated learning-based deep learning model for privacy MRI brain tumor detection. Preprint at arXiv:2404.10026 (2024).
Mitrovska, A., Safari, P., Ritter, K., Shariati, B. & Fischer, J. K. Secure federated learning for Alzheimer’s disease detection. Front. Aging Neurosci. 16, 1324032 (2024).
Vats, S., Kukreja, V. & Mehta, S. Tea leaf disease detection: Federated learning CNN used for accurate severity analysis. In 2024 IEEE International Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI) Vol. 2, 1–6 (IEEE, 2024).
Chhikara, J., Goel, N. & Rathee, N. A critical analysis of transfer learning models for computer vision tasks. In AIP Conference Proceedings Vol. 3209, No. 1 (AIP Publishing, 2024).
Hussain, A. & Aslam, A. Ensemble-based approach using inception V2, VGG-16, and Xception convolutional neural networks for surface cracks detection. J. Appl. Res. Technol. 22(4), 586–598 (2024).
Vayadande, K. Innovative approaches for skin disease identification in machine learning: A comprehensive study. Oral Oncol. Reports 100365 (2024).
Adebiyi, A. et al. Accurate skin lesion classification using multimodal learning on the HAM10000 dataset. Preprint at medRxiv, 2024–05 (2024).
Acknowledgement
The work of Chaman Verma is supported by the Department of Media and Educational Technology, Faculty of Informatics, Eötvös Loránd University, Budapest, Hungary.
Funding
Open access funding provided by Eötvös Loránd University.
Author information
Contributions
Author contributions Statement: Shikha Sharma: Conceptualization, methodology, and initial draft preparation. Ruchi Mittal: Project supervision, methodology review, and manuscript refinement. Nitin Goyal: Data preprocessing, feature extraction, and technical analysis. S. B. Goyal: Federated learning design and model evaluation. Chaman Verma: Model implementation, performance analysis, and results interpretation.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
The authors affirm that this study was conducted in compliance with all relevant ethical guidelines and standards.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sharma, S., Mittal, R., Goyal, N. et al. Skin disease diagnostics through federated transfer learning on heterogeneous data. Sci Rep 16, 1991 (2026). https://doi.org/10.1038/s41598-025-31730-7