Abstract
Early prediction of disability progression in multiple sclerosis (MS) remains challenging despite its critical importance for therapeutic decision-making. We present the first systematic evaluation of personalized federated learning (PFL) for 2-year MS disability progression prediction, leveraging multi-center real-world data from over 26,000 patients. While conventional federated learning (FL) enables privacy-aware collaborative modeling, it remains vulnerable to institutional data heterogeneity. PFL overcomes this challenge by adapting shared models to local data distributions without compromising privacy. We evaluated two personalization strategies: a novel AdaptiveDualBranchNet architecture with selective parameter sharing, and personalized fine-tuning of global models, benchmarked against centralized and client-specific approaches. Whereas baseline FL underperformed, personalization significantly improved performance, with personalized FedProx and FedAVG achieving ROC–AUC scores of 0.8398 ± 0.0019 and 0.8384 ± 0.0014, respectively. These findings establish personalization as critical for scalable, privacy-aware clinical prediction models and highlight its potential to inform earlier intervention strategies in MS and beyond.
Introduction
Multiple Sclerosis (MS) is a complex neurological disorder affecting millions of people worldwide1. In the absence of a cure, current treatment strategies focus on controlling disease progression and preventing relapses2. However, the heterogeneity of MS complicates disease management, as each patient experiences unique disease progressions and varying responses to treatment3. The primary challenge lies in capturing this heterogeneity to enable personalized, data-driven treatment strategies4,5,6.
A promising approach for personalizing care involves leveraging the increasing availability of Real-World Data (RWD) through the application of Machine Learning (ML)7. Previous studies have shown that ML can significantly improve our understanding of MS progression, uncover new biomarkers, and predict individual treatment responses8,9,10,11,12,13.
Despite recent advances, developing advanced ML models for MS remains constrained by limited access to large-scale, high-quality datasets, which often require data centralization14. Although MS impacts an estimated 2.8 million individuals globally1, the clinical data needed for precision modeling remain fragmented and siloed across healthcare institutions. Aggregating such data is complicated by legitimate but complex regulatory constraints, data ownership concerns, and inconsistent data quality standards15,16,17,18. Consequently, these factors present significant obstacles to conventional centralized model training, motivating the need for alternative approaches.
However, this centralization challenge is not unique to MS and has been observed in other fields as well, motivating the development of Federated Learning (FL)19,20. FL is a decentralized learning paradigm that enables training ML models while preserving data localization21,22. This decentralized approach is strongly aligned with data privacy and protection standards, offering a solution to the dilemmas inherent in data centralization23,24,25,26.
Within healthcare, FL has shown success in a broad spectrum of applications, ranging from predicting hospitalization for cardiac events21,27, to enhancing whole-brain magnetic resonance imaging segmentation28, and even advancing drug discovery29. The potential of FL in MS is evident, although only a few studies have investigated this synergy, primarily focusing on imaging data30,31,32.
Nevertheless, conventional FL methods rely on a single global model shared across all clients, which often performs poorly when local data distributions differ significantly. This challenge is particularly pronounced in MS, where clinical presentation, disease progression, and treatment response vary markedly across patients and institutions. In such heterogeneous settings, conflicting client updates can hinder convergence during training, while the absence of client-specific adaptation limits the model’s relevance to local contexts. These shortcomings not only impair overall performance but may also reduce the incentive for participation among underrepresented clients33. Personalized Federated Learning (PFL) has emerged to address these gaps33, enabling models to incorporate local data characteristics and thereby improving both predictive accuracy and robustness in diverse clinical environments such as MS.
Building on this motivation, our study evaluates the practical applicability of FL and PFL for analyzing routine clinical RWD in MS. We assessed multiple strategies for predicting disability progression and examined their feasibility in real-world healthcare environments. In doing so, we aimed to provide both empirical evidence and actionable insights to guide the effective deployment of FL-based solutions in clinical practice.
Specifically, we investigated five main data analysis paradigms: (1) centralized modeling, where all data are pooled into a single dataset; (2) baseline FL, which trains a joint model on decentralized data; (3) and (4) two PFL approaches that enable personalization; and (5) local modeling, where each client trains its model independently.
The first PFL approach introduces a novel ML architecture specifically designed for FL, called AdaptiveDualBranchNet. This method modifies the learning process by maintaining individual models with varying depths for each client and federating only partial model parameters. The second approach involves fine-tuning, where each client personalizes its own model after the FL setup using local data. The frameworks for baseline FL, as well as the adaptive and fine-tuned PFL approaches, are illustrated in Fig. 1, highlighting their respective architectures and workflows.
a Baseline Federated Learning (FL): Depicts the classic iterative process in which multiple clients (e.g., clinical sites) collaboratively train a single global model. Each client receives the current global model (Step 1) and trains it locally on its private dataset (Step 2). The locally updated parameters are then uploaded to a central server (Step 3) and aggregated (Step 4), refining the global model without exchanging any raw patient data. The updated global model is then disseminated to all clients (Step 5) for continued local training based on aggregated knowledge. b Personalized Federated Learning (PFL) with Adaptive Partial Parameter Exchanges (AdaptiveDualBranchNet): Illustrates a dual-branch architecture where each client’s model is split into a shared core (federated across all clients) and local extension layers (trained solely with private data). During each federated round (Steps 1, 3, 5), only the shared core parameters are exchanged and aggregated at the central server, preserving common knowledge. The local extension layers remain entirely onsite (Step 2), allowing each client to further personalize its model based on unique data distributions or sample sizes. c PFL via Fine-Tuning: Shows how a pre-trained global FL model (Step 1) is shared with each client. Each client fine-tunes this model on its local dataset (Step 2), creating a personalized version (Step 3) that reflects client-specific characteristics. This approach retains the benefits of cross-site collaboration while allowing for tailored predictions. Collectively, these paradigms form part of a broader analysis that also includes centralized (pooled data) and local (client-specific) training baselines. This holistic evaluation framework helps elucidate the strengths and trade-offs of each approach in leveraging real-world data for predicting disability progression in multiple sclerosis.
Our experiments leveraged the MSBase registry, the largest global database of MS patients, and simulated a realistic data partitioning scenario to reflect the heterogeneity observed in real-world clinical settings34,35. This comprehensive experimental design enabled us to identify the conditions under which FL can be effectively applied to MS research.
Contributions: (1) Systematic evaluation of FL and PFL in MS research: we present the first comprehensive assessment of federated and PFL approaches for modeling disability progression in MS using routine clinical RWD. (2) Identification of conditions for effective FL deployment: Through extensive benchmarking, we identify key factors that influence the success of FL in MS, offering actionable guidance for researchers and healthcare practitioners. (3) Development of AdaptiveDualBranchNet: we introduce AdaptiveDualBranchNet, a novel FL architecture that enables partial model sharing and demonstrates improved performance compared to existing FL baselines.
Results
The federated experiments incorporated different strategies, including the FedAVG36, FedProx37, and FedOpt (FedYogi, FedAdam, FedAdagrad) algorithms38. For the binary classification task of this study, we carried out extensive hyperparameter tuning using grid search for each FL strategy39 to find the best set of parameters. To strengthen the reliability of our results, we repeated each experiment 10 times. The Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves for these results are shown in Fig. 2, with detailed metrics provided in Table 1. Experiment runtimes, reported as an additional metric, are presented in Table 2.
Evaluating ROC–AUC and AUC–PR Metrics Across Different Strategies. (Left) Receiver Operating Characteristic: The centralized model achieves the highest performance with an ROC–AUC = 0.8092 ± 0.0012, demonstrating the advantage of having a pooled dataset. Among FL strategies, FedAdam and FedYogi perform best, with ROC–AUC values of 0.7920 ± 0.0031 and 0.7910 ± 0.0028, respectively. The other FL methods, including FedAvg and FedProx, show slightly lower performance, underscoring the challenges of a global federated model in heterogeneous data settings. (Right) Precision-Recall Curve: Again, the centralized model leads with an AUC–PR = 0.4605 ± 0.0043. Among FL methods, FedAdam achieves the highest AUC–PR of 0.4488 ± 0.0061, followed by FedYogi (0.4420 ± 0.0078) and FedProx (0.4081 ± 0.0058). The drop in performance compared to the centralized approach reflects the difficulty of capturing minority class predictions in federated settings. These results emphasize the performance gap between centralized and federated learning strategies, particularly in heterogeneous and imbalanced data scenarios.
Centralized superiority in overall performance, bridged by personalized federated learning
Among these results, the centralized model consistently outperformed all federated models, achieving the highest ROC–AUC (0.8092 ± 0.0012) and AUC–PR (0.4605 ± 0.0043) scores. This highlights the benefits of having access to centralized data, which enables more effective model training.
Within the federated models, FedAdam and FedYogi demonstrated the best performance, with ROC–AUC and AUC–PR scores of 0.7920 ± 0.0031 and 0.4488 ± 0.0061 for FedAdam, and 0.7910 ± 0.0028 and 0.4420 ± 0.0078 for FedYogi, respectively. However, these gains came at the cost of higher computational demands, as FedAdam required the longest training time at 236 min, about 23% more than FedProx.
FedAVG provided a balanced alternative, with a ROC–AUC of 0.7840 ± 0.0019 and an AUC–PR of 0.4030 ± 0.0059, completing training in ~193 min. FedProx, while offering similar performance to FedAVG, reduced training time by around 11%, completing in 172 min, making it a practical option for faster execution.
FedAdagrad, although slightly lower in performance (ROC–AUC: 0.7762 ± 0.0021, AUC–PR: 0.3913 ± 0.0061), showed a comparable training duration of 190 min, balancing efficiency and accuracy.
Nevertheless, baseline FL remained limited in overall performance. As shown in Fig. 4, challenges such as non-IID data distributions, varying dataset sizes, and class imbalance hindered the model’s ability to generalize. To overcome these limitations, we evaluated two personalization strategies aimed at improving overall performance and compared their results in the following analysis.
To begin with, AdaptiveDualBranchNet (referred to as “Adaptive”) demonstrated clear improvements in model performance compared to the baseline FL paradigm. As shown in Table 1, the adaptive method consistently increased both ROC–AUC and AUC–PR across all FL strategies. However, as Table 2 indicates, these gains came at the cost of increased computational time.
FedProx, a federated strategy designed to tackle system heterogeneity, i.e., variations in client data distributions and computational resources, demonstrated a 7.2% improvement in ROC–AUC and a 31% increase in AUC–PR over the baseline federated model. This suggests that FedProx’s adaptive personalization enhances its ability to capture diverse data patterns effectively. However, this increased flexibility came with a 27% longer training time, highlighting a trade-off between improved accuracy and computational efficiency. FedAVG, a simpler strategy that averages neural network parameters across clients, showed a similar improvement in AUC–PR (28%) while requiring only a 7% increase in computational time. This indicates that Adaptive FedAVG benefits from personalization with minimal added computational burden, making it a strong choice in resource-constrained scenarios where personalization is desired. The adaptive approach also boosted the performance of FedAdagrad, FedYogi, and FedAdam, all of which utilize adaptive optimization techniques. FedAdagrad demonstrated a 7.7% improvement in ROC–AUC and a 31% increase in AUC–PR, though it came with a 21% increase in training time. FedYogi improved AUC–PR by 6.4%, but required a 28% increase in time. On the other hand, FedAdam achieved a 16% increase in AUC–PR with only a 4% increase in training time, indicating its efficiency in handling personalization through optimized learning rate and momentum adjustments, although its initial training time was relatively high.
Following adaptive personalization, fine-tuning also improved model performance by adapting the global model to each client’s local data distribution, allowing it to better capture individual patient patterns. Fine-tuned models consistently demonstrated improvements in both ROC–AUC and AUC–PR metrics, as summarized in Table 1. For instance, both FedProx and FedAVG achieved the highest performance following fine-tuning. The FedProx model reached a ROC–AUC of 0.8375, reflecting a 6.91% improvement over its baseline of 0.7834, while FedAVG closely followed with a ROC–AUC of 0.8370. For AUC–PR, FedProx saw an increase from 0.4081 to 0.5221, representing an ~28% gain.
Among the strategies tested, FedAdagrad benefited the most from fine-tuning, with a 7.5% increase in ROC–AUC and a 28.9% rise in AUC–PR. This suggests that FedAdagrad was particularly responsive to fine-tuning, allowing it to better adapt to the data distributions specific to individual clients.
Figure 3 also confirms that personalized models generally outperform federated models: the differences in both ROC–AUC and AUC–PR scores are consistently positive across all methods, and the fact that their confidence intervals rarely cross zero indicates that these advantages are statistically meaningful. When comparing adaptive and fine-tuned models directly, the differences are minor and hover around zero, suggesting that the two approaches perform similarly overall, although adaptive models hold a slight edge for methods such as FedAdam and FedYogi. Note that the x-axis scale varies across the plots.
This figure shows average ROC–AUC and AUC–PR score differences across five strategies (FedAVG, FedProx, FedAdam, FedYogi, and FedAdagrad) for each pairwise comparison of model types. Error bars represent 95% confidence intervals from multiple runs, with a dashed vertical line at zero indicating no difference between models.
Federated models struggle against centralized model at the client level
To perform a detailed client-level comparison across countries, we evaluated five key paradigms: federated, fine-tuned, adaptive, centralized, and local. For consistency, FedProx was selected as the representative federated model because of its balance of performance and computational efficiency. While FedAdam showed marginally better performance, FedProx offered a more favorable trade-off between accuracy and resource usage. Nonetheless, comprehensive results for all federated strategies are provided in Supplementary Tables 1–4.
The results presented in Table 3 indicated that the centralized model outperformed all baseline federated approaches, achieving a mean weighted average ROC–AUC of 0.8092 and an AUC–PR of 0.4605. This represents an improvement of ~3.3% in ROC–AUC and 12.8% in AUC–PR over the federated model. These results highlight the advantage of centralized training, as it fully leverages the entire dataset, leading to better performance compared to federated models.
Local models capture specific client insights but lack generalization
To provide a more comprehensive perspective, we also calculated the weighted averages for the local models. The local model achieved a ROC–AUC of 0.7983 and an AUC–PR of 0.3874, placing its ROC–AUC performance between that of the centralized and federated models, though its AUC–PR lagged behind the federated models. This suggests that while local models benefit from training on specific client data, they may lack the broader insights captured by centralized models and the generalized patterns learned by federated models.
Personalization enhances client-level performance in federated learning
Bringing together the results from PFL, baseline FL, centralized, and local models, we analyzed the performance scores presented in Table 3. Notably, we observed that while the federated model initially lagged behind the centralized model, the PFL approach allowed it to close the performance gap and eventually surpass the centralized model. This suggests that personalization can significantly boost the effectiveness of FL by adapting to client-specific data distribution.
From a broader perspective, the adaptive and fine-tuned paradigms are the top performers, with adaptive models leading in 11 countries and fine-tuned models in 6. This demonstrates the clear advantage of PFL approaches, which consistently achieve the highest ROC–AUC scores across different countries. The centralized paradigm ranks next, leading in five countries, showing its occasional competitiveness. The local model outperformed others in three countries, while federated paradigms did not achieve the highest performance in any country. This suggests that, without the personalized adjustments or aggregation benefits seen in centralized models, federated approaches struggle to match the predictive accuracy of the other paradigms. A similar pattern was observed for AUC–PR.
Federated vs. local models: impact of dataset size
To assess whether countries experienced greater benefits from federated or local models (excluding the centralized and PFL paradigms), our analysis of the ROC–AUC metric revealed that 19 cases favored local models, while six cases showed better results with the federated approach. Focusing on dataset size, particularly for intermediate-sized countries (ranging from BE to AR), we observed significant performance differences in favor of local models, with an average advantage of 15.98% in ROC–AUC.
In contrast, these differences were much less pronounced in countries with larger datasets, such as those from CZ to AU, where federated models outperformed local models in five of the top six cases.
Further analysis using Spearman correlation confirmed these observations, revealing a moderate negative correlation (ρ = −0.503, p = 0.010) between dataset size and the performance gap (ROC–AUC difference) between federated and local models. This pattern suggests that as dataset size increases, federated models can generalize more effectively, achieving performance comparable to or even surpassing that of local models. In countries with smaller datasets, the federated model typically underperformed relative to local models, with notable differences in metric scores. It appears that local models, when trained on smaller, more specific datasets, are able to capture unique dataset characteristics that the federated model, due to its aggregated, generalized approach, may fail to recognize. Similarly, the performance gap between federated and centralized models showed a strong negative correlation (ρ = −0.761, p < 0.001), underscoring the ability of federated models to approach centralized model performance when trained on larger datasets.
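As a concrete illustration, such a correlation can be computed directly from per-country dataset sizes and performance gaps. The sketch below uses hypothetical values only (the gap is taken as local minus federated ROC–AUC, so a negative ρ means the local advantage shrinks as datasets grow); it is not the study's data.

```python
# Sketch of the correlation analysis above; `sizes` and `gap` are
# illustrative placeholders, not values from the study.
import numpy as np
from scipy.stats import spearmanr

sizes = np.array([1_100, 4_200, 9_000, 18_000, 31_000, 52_000])  # per-country samples
gap = np.array([0.15, 0.09, 0.11, 0.00, -0.01, -0.02])           # local minus federated ROC-AUC

rho, p_value = spearmanr(sizes, gap)
print(f"Spearman rho = {rho:.3f}, p = {p_value:.3f}")
```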
Performance trends in the largest data-contributing countries
Focusing on the top six countries (CZ, IT, TR, ES, CA, AU), which hold 82% of the data as highlighted in Fig. 4d, our goal was to identify the most effective paradigms relative to one another. The findings from this analysis, summarized in Table 4, illustrate a notable trend: across these six countries, the adaptive paradigm outperforms other approaches, closely followed by the fine-tuned model. These two paradigms frequently achieve superior ROC–AUC scores, surpassing the local, centralized, and federated models. A detailed breakdown highlighting the best-performing models for each country is provided in Supplementary Table 5.
a Log-scaled distribution of country-specific dataset sizes \({D}_{{C}_{i}}\), sorted in descending order, highlighting disparities in data contributions. b Class imbalance across countries, showing underrepresentation of Class 1 (MS worsening confirmed) relative to Class 0. c Histogram of dataset sizes using 5K bin intervals, emphasizing skewed availability across participating centers. d Pie chart illustrating the proportional contribution of each country to the overall dataset. Together, these analyses demonstrate the significant variability in both data quantity and label distributions across clients, underscoring the challenges faced by federated learning models operating in real-world clinical settings.
Discussion
Advancing ML models for complex conditions such as MS requires access to large and diverse datasets. However, centralizing sensitive patient data from multiple institutions presents significant regulatory, logistical, and ethical challenges14,15,16,17,18. FL offers a privacy-aware alternative by enabling collaborative model training across decentralized datasets19,20. Yet, baseline FL methods, such as FedAVG, which learn a single global model, often underperform in the presence of substantial statistical heterogeneity (non-IID data) common in multi-institutional clinical datasets23,36. These limitations motivate the development of PFL approaches, which aim to adapt models to the unique characteristics of each client while preserving the collaborative benefits of FL33.
To systematically investigate this challenge, we compared multiple FL paradigms against centralized and local baselines for predicting MS disability progression. The results revealed a clear performance hierarchy (Tables 1, 3): PFL strategies, including Adaptive and fine-tuning methods, achieved the highest discrimination performance, surpassing both centralized and local models. Although centralized models performed well and ranked third overall, their feasibility is often constrained by privacy regulations and logistical barriers that complicate large-scale data aggregation. In many practical healthcare scenarios, assembling centralized datasets remains challenging. These findings underscore the relevance of federated approaches, which enable collaborative model development without compromising data sovereignty. Importantly, our results show that with appropriate personalization strategies, FL can become a practical and privacy-respecting alternative to centralized training.
Beyond discrimination performance, clinical applicability also requires strong calibration and reliable risk estimation. While top-performing PFL models, such as Adaptive FedProx (ROC-AUC 0.8398), achieved discrimination comparable to published benchmarks for similar MS prediction tasks10,13, discrimination alone does not guarantee clinical utility. For meaningful clinical use, particularly at the individual patient level, predictions must accurately reflect the true risk. Current model performance may support applications such as cohort-level monitoring or population health analyses, but further improvements in predictive certainty and calibration are needed before safe deployment in high-stakes clinical decision-making. In particular, comprehensive reporting of calibration metrics, including Brier scores, expected calibration error, and calibration diagrams, would provide a more complete assessment of model reliability40. Ensuring that predictive outputs are trustworthy is essential to avoid patient harm and misinformed management41.
Ensuring clinical applicability not only requires discrimination and calibration but also demands flexible modeling approaches that can adapt to local data characteristics without sacrificing global knowledge transfer. Addressing this need, our proposed AdaptiveDualBranchNet architecture contributes to architectural personalization strategies by enhancing model flexibility. It introduces dynamic depth in the personalized branch, allowing complexity to scale with local data volume. Unlike prior approaches such as FedPer42, which rely on fixed partitions between shared and private layers, AdaptiveDualBranchNet retains the full global model and integrates client-specific layers in parallel. This design preserves the expressive capacity and transferability of the shared global representation while enabling client-specific adaptation. Benchmarking against representative PFL baselines, detailed in Supplementary Note Section 1.4, further supports the effectiveness of this approach.
Building upon these improvements, future work could explore dynamic or learning-based adaptations to enhance personalization flexibility. Rather than relying on heuristically defined thresholds for scaling model depth, allowing models to autonomously adjust their complexity in response to richer client-specific data signals may further improve generalization and robustness. Beyond architectural adaptivity, optimization dynamics also represent an important axis for personalization. In this study, a uniform learning rate was applied across all parameters of the models, with dynamic adjustment governed by a ReduceLROnPlateau scheduler43. Preliminary investigations into more granular strategies, such as distinct learning rates for global and personalized branches, differential regularization, and gradient clipping, suggested additional opportunities for performance gains. Although not included in the final experiments to maintain a controlled baseline, these approaches highlight promising future directions. Moreover, client-specific hyperparameter adaptation, modulating local learning rates, batch sizes, or regularization strengths based on client characteristics44,45,46, may further improve fairness, stability, and adaptability across heterogeneous federated settings. Together, dynamic architectural scaling and personalized optimization strategies could enable more resilient and equitable predictive modeling in decentralized healthcare applications.
Complementary to adaptive architectures, post-hoc fine-tuning provided an additional strategy for personalization. Fine-tuning global models on local data distributions improved model performance but showed dependency on data sufficiency. Clients with sparse data faced greater risks of overfitting and unstable evaluation due to small or imbalanced test sets. Stratified analysis (Supplementary Fig. 2) revealed that fine-tuning yielded the most consistent improvements for clients with intermediate data volumes, particularly those initially underserved by the global model. Detailed stratified results are provided in Supplementary Note Section 1.5. These findings emphasize the need for personalization methods that remain effective even in low-data settings, such as adapter-based fine-tuning47,48 or selective layer updating combined with lightweight regularization49.
Several limitations of this study should be acknowledged. First, the study relied on simulation due to data governance constraints. Although the simulation framework was designed to accurately model algorithmic behavior and data heterogeneity, it cannot fully capture real-world factors such as network variability, system heterogeneity, or participant dropout50,51. Consequently, certain absolute metrics, such as the ~15% increase in training time observed for the Adaptive model, may not generalize directly. Although computational overhead was manageable in simulation, real-world deployments will be necessary to properly assess practical resource demands and communication efficiency. Nevertheless, the relative performance hierarchies observed across paradigms offer valuable hypotheses for future validation under real-world constraints.
Further limitations include the use of retrospective RWD7,52, which may introduce missingness or bias, and the country-based data partitioning schema. While pragmatic, country-level partitioning may not fully capture natural clinical or institutional boundaries. Exploring alternative partitioning strategies, such as clinic-level, regionally grouped, or quantity-skewed clients, could provide further insights into model robustness under varied federated topologies. In addition, the use of weighted averaging during evaluation may introduce bias favoring larger clients, as discussed in more detail in Supplementary Note 1.1. Future work could explore complementary evaluation strategies, such as server-side global testing, to provide a more balanced assessment of model generalizability.
Translating FL models into clinical practice will require rigorous external validation using independent cohorts across diverse real-world settings, patient populations, and infrastructures. Such efforts could be facilitated by initiatives like the European Health Data Space53,54 and should align with established reporting guidelines such as TRIPOD55 to ensure methodological transparency and reproducibility.
Sustainable and scalable deployment of FL in healthcare will also require supportive ecosystem development, including trusted intermediaries, standardized governance protocols, and mechanisms for equitable benefit-sharing. Initiatives such as the Global Data Sharing Initiative in MS14,56 and projects like MELLODDY57 illustrate promising models for federated collaboration across institutions and industries.
Future technical enhancements should explore integrating multimodal data sources (e.g., MRI, Evoked Potentials58), adopting advanced privacy-preserving techniques (e.g., differential privacy, secure multi-party computation59,60) while balancing trade-offs, and developing refined personalization strategies, particularly for low-resource clients.
Ultimately, unlocking the clinical potential of FL will depend not only on technical advances but also on embedding FL within a broader healthcare ecosystem that supports data harmonization, clinician engagement, regulatory alignment on fairness, interpretability61, and privacy. Demonstrating real-world clinical utility through prospective impact studies will be essential to validate technical performance and build trust in FL-enabled decision support as a safe, fair, and effective tool for patient care62.
Methods
Cohort definition and episode extraction
Data of individuals diagnosed with MS were systematically collected and combined from 146 distinct centers, as documented in the MSBase registry up to September 202034,35. The data were collected during routine clinical care at tertiary MS centers. The preliminary extraction of data from MSBase was governed by the following inclusion criteria: a minimum follow-up period of 12 months, a minimum age of 18 years, and a diagnosis of either relapsing remitting MS, secondary progressive MS, primary progressive MS, or clinically isolated syndrome. The resulting dataset encompassed 44,886 patients. To uphold the integrity of the data, several quality assurance measures were employed. These entailed the elimination of duplicate or inconsistent visits recorded on the same day, removal of visits dated before 1970, and exclusion of patients exhibiting clinically isolated syndrome at their last documented visit.
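For illustration, the inclusion and quality-assurance steps above could be expressed as a short pandas filter; the table layout and all column names below are assumptions, not the study's actual schema.

```python
# Illustrative pandas sketch of the stated inclusion and quality criteria.
# Column names (`follow_up_months`, `ms_course`, etc.) are hypothetical.
import pandas as pd

def apply_inclusion_criteria(patients: pd.DataFrame, visits: pd.DataFrame):
    eligible = patients[
        (patients["follow_up_months"] >= 12)
        & (patients["age_at_entry"] >= 18)
        & (patients["ms_course"].isin(["RRMS", "SPMS", "PPMS", "CIS"]))
    ]
    # Quality assurance: drop duplicate same-day visits and visits dated before 1970.
    visits = visits.drop_duplicates(subset=["patient_id", "visit_date"])
    visits = visits[visits["visit_date"] >= pd.Timestamp("1970-01-01")]
    # Exclude patients still classified as CIS at their last recorded visit.
    last_course = visits.sort_values("visit_date").groupby("patient_id")["ms_course"].last()
    cis_at_last = last_course[last_course == "CIS"].index
    eligible = eligible[~eligible["patient_id"].isin(cis_at_last)]
    return eligible, visits
```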
Each patient’s clinical trajectory was segmented into multiple, potentially overlapping, episodes, using the exact methodology for definition and extraction established and validated in prior work by De Brouwer et al.13. For clarity and completeness within this manuscript, we specify the definition used: Each episode represents a distinct instance for predicting disability progression and comprises three core components:
1. A Baseline EDSS Measurement: A single Expanded Disability Status Scale (EDSS) score recorded at time t = 0.
2. An Observation Window: This includes the complete available clinical history for the patient prior to the baseline measurement (t ≤ 0), encompassing all recorded EDSS scores, Kurtzke Functional Systems (KFS) scores, relapse information, treatment history, and other relevant covariates from the MSBase registry. The duration of this observation window is therefore variable, depending on the length of the patient’s recorded history up to t = 0.
3. A Disability Progression Label: A binary outcome indicating whether confirmed disability progression occurred within the two-year period following the baseline EDSS measurement (0 < t ≤ 2 years). Confirmed disability progression required demonstrating a sustained increase in EDSS, based on thresholds defined by Kalincik et al.63, confirmed over a period of at least six months, and excluding any EDSS measurements taken within one month of a recorded relapse.
An episode was considered valid for inclusion only if: (i) the observation window contained at least three EDSS measurements within the 3.25 years immediately preceding the baseline (t = 0), ensuring sufficient recent data density; and (ii) adequate follow-up data existed after t = 0 to ascertain the confirmed disability progression status within the 2-year prediction window. Critically, although episodes from the same patient may share common historical data, each valid episode, defined by its unique baseline time point and subsequent 2-year outcome period, was treated as an independent instance for model training and evaluation.
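To make the labeling rule concrete, the sketch below computes a 2-year confirmed-progression label under stated assumptions: visits are chronologically sorted with times in years relative to baseline, and the step thresholds (1.5 from EDSS 0, 1.0 from EDSS up to 5.5, 0.5 above) paraphrase the convention attributed to Kalincik et al.63; the cited definition remains authoritative.

```python
# Hedged sketch of the 2-year confirmed-progression label; the thresholds
# below are an assumption, not the study's verbatim rule.
def progression_threshold(baseline_edss: float) -> float:
    if baseline_edss == 0.0:
        return 1.5
    return 1.0 if baseline_edss <= 5.5 else 0.5

def confirmed_progression(baseline_edss, visits, relapses):
    """visits: chronologically sorted [(t_years, edss)]; relapses: [t_years]."""
    def near_relapse(t):
        return any(abs(t - r) <= 1.0 / 12 for r in relapses)

    # Keep scores inside the 2-year window, excluding those within one
    # month of a recorded relapse.
    clean = [(t, e) for t, e in visits if 0 < t <= 2 and not near_relapse(t)]
    step = progression_threshold(baseline_edss)
    for i, (t_i, e_i) in enumerate(clean):
        if e_i - baseline_edss < step:
            continue
        # Confirmed if EDSS stays above the threshold at every clean visit
        # up to (and including) one recorded at least six months later.
        for j, (t_j, _) in enumerate(clean[i + 1:]):
            if t_j - t_i >= 0.5:
                window = clean[i + 1 : i + 2 + j]
                if all(e - baseline_edss >= step for _, e in window):
                    return True
                break
    return False
```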
This curation and episode extraction process yielded a final dataset D comprising ∣D∣ = 283,115 valid episodes derived from 26,246 unique patients. This dataset forms the basis for the binary classification task aimed at predicting disease disability progression within a two-year horizon. For a more comprehensive description of data variables, their definitions, and their preprocessing, we refer to the publication by De Brouwer et al.13.
As the objective of this study is to evaluate the effectiveness of FL, it was essential to partition the centralized global dataset D to set up the FL experiments. The global dataset D (the preprocessed dataset from ref. 13) included a key feature indicating the geographical origin of the data. Using this feature, the dataset D was divided into 32 disjoint subsets \({D}_{{C}_{i}}\), each corresponding to a different country, as defined by: \(D=\mathop{\bigcup }\nolimits_{i = 0}^{31}{D}_{{C}_{i}},\quad | {D}_{{C}_{i}}| \ge 5,\quad \forall i\in \{0,1,\ldots ,31\}\). Within each subset, the data were split into 60% training, 20% validation, and 20% test. Normalization was performed independently for each partition using statistics (mean and standard deviation) derived from its respective training set.
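A minimal sketch of this partitioning, splitting, and per-partition normalization is given below; the DataFrame layout and the `country` column name are assumptions about the preprocessed dataset.

```python
# Country-wise partitioning with 60/20/20 splits; each split is normalized
# with statistics from its own partition's training set only.
from sklearn.model_selection import train_test_split

def partition_and_split(df, feature_cols, seed=0):
    clients = {}
    for country, d_ci in df.groupby("country"):
        train, rest = train_test_split(d_ci, train_size=0.6, random_state=seed)
        val, test = train_test_split(rest, test_size=0.5, random_state=seed)
        mu, sigma = train[feature_cols].mean(), train[feature_cols].std()

        def norm(part):
            part = part.copy()
            part[feature_cols] = (part[feature_cols] - mu) / sigma
            return part

        clients[country] = (norm(train), norm(val), norm(test))
    return clients
```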
Upon detailed examination of this partitioning scheme, significant variations were evident in dataset sizes across the resulting country partitions, as depicted in Fig. 4a. Figure 4c highlights that ~75% of countries included fewer than 5000 samples. This partitioning was particularly revealing when considered alongside the pie chart analysis in Fig. 4d, which showed that six countries (CZ, IT, TR, ES, CA, AU) accounted for 82% of the total cohort size. This comparison indicated that while the majority of the dataset was concentrated in a few countries, many countries did not hold a significant share of the total data.
Further analysis revealed substantial class imbalance across countries, captured in Fig. 4b and Table 5. Both illustrate pronounced variability in the proportions of Class 0 (“MS worsening not confirmed”) and Class 1 (“MS worsening confirmed”), with several countries exhibiting complete absence of one class. Although Class 0 predominated overall, both the magnitude of label imbalance and the variation in dataset sizes differed markedly across clients. This compounded heterogeneity in outcome distributions and sample availability reflects a fundamental deviation from the classical assumption of identically and independently distributed (IID) data, presenting additional challenges for federated model development in real-world clinical settings.
Predicting disability progression
This analysis sets the stage for addressing a key clinical question in MS research: the progression of disability. This dimension of MS research is critical, as underscored in the literature9, due to its substantial impact on people with MS (PwMS). The precise prediction and thorough monitoring of disability progression are instrumental for clinicians in formulating effective treatment strategies, personalizing patient care, and, ultimately, enhancing patient outcomes64,65. Our study contributes to this by investigating methodologies that not only aim to augment patient care but also seek to expand the medical community’s comprehension of MS. This is achieved by harnessing insights from RWD, stepping towards the conversion of these insights into tangible real-world evidence66.
Building on the foundational work of De Brouwer et al.13, this study adopted the FL approach for predicting confirmed disability progression over a two-year period with a 6-month confirmation window, utilizing RWD. This research leveraged the decentralized and privacy-preserving attributes of FL, marking a significant shift from conventional centralized data analyses. The investigation stood at the convergence of clinical need and technological innovation, with the potential to optimize the utilization of RWD in MS research. In the following section, we outline the experimental setup used in this study, which includes federated, adaptive, fine-tuned federated, local, and centralized models.
Federated model
After partitioning the dataset by country, we trained the FL models using each country’s dataset. The experiment simulated a server-client architecture, with the server coordinating the learning process and the clients participating in distributed training. The server initiated the training process by setting up the model and distributing the initial model parameters to all available clients. Following this, each client started local training on its respective dataset.
During each training cycle, or federation round, each client trains its local copy of the global model received from the server for E epochs. After these E epochs, the clients send their updated models back to the server, along with relevant metrics and the sizes of their test sets. The server then executes the federated strategy to update the global model.
This process is iterative, with the server distributing the updated global model to the clients in each subsequent federation round. The cycle continues until a predetermined number of federation rounds F is completed.
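For concreteness, a minimal sketch of one simulated client in Flower (the framework named below) follows; `train_local`, `test_local`, and `make_client` are hypothetical helpers, not functions from the study's codebase.

```python
# Hedged sketch of the simulated server-client cycle using Flower.
import flwr as fl
import torch

def set_parameters(model, parameters):
    # Load the aggregated global weights into the local PyTorch model.
    for p, new in zip(model.parameters(), parameters):
        p.data = torch.as_tensor(new, dtype=p.dtype)

class CountryClient(fl.client.NumPyClient):
    def __init__(self, model, train_loader, test_loader):
        self.model, self.train_loader, self.test_loader = model, train_loader, test_loader

    def get_parameters(self, config):
        return [p.detach().cpu().numpy() for p in self.model.parameters()]

    def fit(self, parameters, config):
        set_parameters(self.model, parameters)       # receive the global model
        train_local(self.model, self.train_loader)   # E local epochs (hypothetical helper)
        return self.get_parameters(config), len(self.train_loader.dataset), {}

    def evaluate(self, parameters, config):
        set_parameters(self.model, parameters)
        loss, auc = test_local(self.model, self.test_loader)  # hypothetical helper
        return float(loss), len(self.test_loader.dataset), {"roc_auc": auc}

# Server side: F = 350 rounds over K = 32 clients, here with plain FedAvg.
fl.simulation.start_simulation(
    client_fn=make_client,                           # hypothetical client factory
    num_clients=32,
    config=fl.server.ServerConfig(num_rounds=350),
    strategy=fl.server.strategy.FedAvg(),
)
```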
In our experiments, we selected a Multi-Layer Perceptron (MLP) with 42 input features as the baseline model to facilitate a comprehensive analysis. While De Brouwer et al.13 explored various architectural frameworks in a centralized setting, we chose the MLP for its reliable performance and lack of significant differences compared to other models in our analyses.
The training parameters were set with a batch size of B = 512, a local client learning rate ηk = 1e-4, and a maximum number of epochs E set to either 10 or 20, depending on the specific experiment. Weight decay was applied with λ = 5e-5, and early stopping was employed with a patience parameter P = 5, indicating that training would halt if validation loss did not improve after five epochs. Regarding the model parameters, both the baseline and AdaptiveDualBranchNet (Core layers) models had h = 512 hidden units, a dropout rate of δ = 0.1, and l = 5 layers. The AdaptiveDualBranchNet model was further enhanced with the ability to dynamically add up to \({l}_{k}^{\,\text{ext}\,}=5\) extra layers, each comprising hext = 64 hidden units. For the FL setup, we conducted F = 350 federation rounds across K = 32 clients, with all clients participating in both training and evaluation processes. The entire experimental process was repeated R = 10 times for robustness.
In terms of the specific federated optimization strategies, FedProx was configured with a proximal term of μ = 1e-3. For the FedYogi optimization, parameters included η = 1e-2, ηk = 9.5e-2, τ = 1e-8, β1 = 0.6, and β2 = 0.999. The FedAdam strategy shared the server learning rate of η = 1e-2 and local learning rate ηk = 9.5e-2, with a regularization value τ = 1e-8, and momentum parameters β1 = 0.6 and β2 = 0.999. Lastly, the FedAdagrad model used η = 1e-2, ηk = 1e-2, and τ = 1e-8.
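Expressed with Flower's strategy constructors, these configurations would look roughly as follows; this is a sketch in which argument names follow Flower 1.x, and the required initial parameters are built from a hypothetical `initial_weights` list of NumPy arrays.

```python
# Sketch of the stated strategy hyperparameters using Flower 1.x constructors.
import flwr as fl

init = fl.common.ndarrays_to_parameters(initial_weights)  # hypothetical global weights

fedprox = fl.server.strategy.FedProx(proximal_mu=1e-3)
fedyogi = fl.server.strategy.FedYogi(
    initial_parameters=init, eta=1e-2, eta_l=9.5e-2, tau=1e-8, beta_1=0.6, beta_2=0.999)
fedadam = fl.server.strategy.FedAdam(
    initial_parameters=init, eta=1e-2, eta_l=9.5e-2, tau=1e-8, beta_1=0.6, beta_2=0.999)
fedadagrad = fl.server.strategy.FedAdagrad(
    initial_parameters=init, eta=1e-2, eta_l=1e-2, tau=1e-8)
```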
For training, we used Python 3.9.19 with PyTorch 1.13.1, Flower 1.8.0, Scikit-learn 1.5.0, and Pandas 2.2.2. The complete source code of this study is openly accessible at https://github.com/ashkan-pirmani/FL-MS-RWD. The resources and services used in this work were provided by the VSC (Flemish Supercomputer Center), funded by the Research Foundation, Flanders (FWO) and the Flemish Government, using an Intel Xeon Platinum 8468 (Sapphire Rapids) CPU cluster.
Personalized federated learning
In FL, applying a uniform model architecture across heterogeneous clients poses significant challenges33. In our setting, clients varied substantially in dataset sizes and class distributions, deviating from the classical assumption of IID data. Although it is common practice to deploy a fixed architecture, for example an MLP with static depth and width, we observed that such one-size-fits-all designs introduce important inefficiencies: larger models often overfit on data-sparse partitions, whereas smaller models fail to fully leverage data-rich clients. This discrepancy underscores the need for adaptive modeling strategies that can adjust dynamically to local data characteristics.
Allowing each client to have a distinct architecture would conceptually address this issue but would render aggregation across clients infeasible due to mismatched model structures. To overcome this, we propose AdaptiveDualBranchNet, a model that dynamically modulates its complexity while maintaining architectural compatibility for aggregation. The network features a dual-branch design: a Core branch, comprising five fixed hidden layers with 512 neurons each, and a flexible Extension branch whose depth is determined by the local data volume. The Extension branch can add up to five additional hidden layers, each with 64 neurons, following a logarithmic scaling heuristic. Clients with larger datasets (e.g., more than 25,000 samples) utilize all extension layers, while clients with smaller datasets (e.g., fewer than 2,000 samples) omit the Extension branch entirely to reduce overfitting risk. Intermediate clients proportionally incorporate one to four extension layers based on their dataset size. Outputs from the Core and Extension branches are merged before the final prediction layer, ensuring a shared representational space across all clients. During federated training, only the parameters of non-extension layers are communicated and aggregated globally, preserving consistency while enabling localized adaptation. The design parameters and scaling thresholds were empirically optimized using development data to balance predictive accuracy, computational efficiency, and generalization across heterogeneous client populations. Pseudocode for the AdaptiveDualBranchNet algorithm is provided in Algorithm 1, and a schematic comparison with a standard MLP is shown in Fig. 5.
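A condensed PyTorch sketch of this architecture is shown below. The exact form of the logarithmic scaling heuristic is not specified in the text, so `calculate_extension` interpolates between the stated 2,000- and 25,000-sample thresholds as an assumption.

```python
# Hedged PyTorch sketch of AdaptiveDualBranchNet; the scaling heuristic is
# an assumed interpolation between the thresholds stated in the text.
import math
import torch
import torch.nn as nn

def calculate_extension(n_samples, low=2_000, high=25_000, max_layers=5):
    if n_samples < low:
        return 0                      # small clients: no Extension branch
    if n_samples >= high:
        return max_layers             # large clients: all extension layers
    frac = (math.log(n_samples) - math.log(low)) / (math.log(high) - math.log(low))
    return max(1, min(max_layers - 1, round(frac * max_layers)))

class AdaptiveDualBranchNet(nn.Module):
    def __init__(self, in_features=42, n_samples=10_000):
        super().__init__()
        core, width = [], in_features
        for _ in range(5):            # shared, federated Core branch
            core += [nn.Linear(width, 512), nn.ReLU(), nn.Dropout(0.1)]
            width = 512
        self.core = nn.Sequential(*core)

        n_ext = calculate_extension(n_samples)
        ext, width = [], in_features
        for _ in range(n_ext):        # client-specific Extension branch
            ext += [nn.Linear(width, 64), nn.ReLU(), nn.Dropout(0.1)]
            width = 64
        self.extension = nn.Sequential(*ext) if n_ext else None

        self.combine = nn.Linear(512 + (64 if n_ext else 0), 64)
        self.output = nn.Linear(64, 1)

    def forward(self, x):
        h = self.core(x)
        if self.extension is not None:
            h = torch.cat([h, self.extension(x)], dim=-1)   # merge branches
        return self.output(torch.relu(self.combine(h)))
```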
a The Baseline network features a standard feedforward architecture. It begins with an Input Layer, which feeds into a series of Hidden Layers. Each hidden layer comprises neurons arranged in a fully connected structure, with arrows indicating the flow of information from one layer to the next. The connections show that each neuron in one layer is connected to every neuron in the subsequent layer, enabling the network to capture complex relationships between inputs. The network’s final layer is the Output Layer, which aggregates the learned features from the hidden layers to produce the output Y. The straightforward structure of this network is designed for general-purpose learning tasks without additional branching or specialized layers. b The AdaptiveDualBranchNet architecture extends the Baseline by introducing a dual-branch structure comprising Core Layers and Extension Layers. The Core Layers, highlighted in yellow, retain the fully connected structure of the Baseline’s Hidden Layers and are shared across all clients, being trained in an FL setup to capture fundamental and generalizable features from the data. In contrast, the Extension Layers, shown in orange, are client-specific and designed to learn personalized representations. These layers receive input from the same Input Layer as the Core Layers but follow a distinct structural design tailored to capture additional, domain- or client-specific variations in the data. Unlike the Core Layers, which are updated through FL aggregation, the Extension Layers remain locally trained, enabling each client to adapt the model to its unique distribution while benefiting from the shared knowledge encoded in the Core Layers. At the final stage, both branches feed into a set of processing nodes (depicted as c-units in red), which consolidate the learned representations before reaching the Output Layer Y. This separation between federated (global) and local (personalized) training allows the AdaptiveDualBranchNet to balance generalization and personalization, making it particularly effective in heterogeneous data environments where both shared knowledge and client-specific adaptations are necessary.
Algorithm 1
AdaptiveDualBranchNet: A Step-by-step Pseudocode Representation
Initialization:
for each client k in K do:
Obtain the dataset size nk
Compute the number of extension layers \({l}_{k}^{\,\text{ext}\,}\) using the function \({l}_{k}^{\,\text{ext}}=\text{calculate}\_\text{extension}\,({n}_{k})\)
Initialize the local model parameters \({\Theta }_{k,f}=\{{\Theta }_{k,f}^{\,\text{input}},{\Theta }_{k,f}^{{\rm{core}}},{\Theta }_{k,f}^{{\rm{ext}}},{\Theta }_{k,f}^{{\rm{combine}}},{\Theta }_{k,f}^{\text{output}\,}\}\)
where:
\({\Theta }_{k,f}^{\,\text{input}\,}\) are the parameters of the input layer,
\({\Theta }_{k,f}^{\,\text{core}\,}\) are the parameters of core layers,
\({\Theta }_{k,f}^{\,\text{ext}\,}\) are the parameters of the \({l}_{k}^{\,\text{ext}\,}\) extension layers,
\({\Theta }_{k,f}^{\,\text{combine}\,}\) are the parameters of the combining layer, and
\({\Theta }_{k,f}^{\,\text{output}\,}\) are the parameters of the output layer.
end for
for federation round f = 1 to F do:
Local Training (Round f):
for each client k in K do:
Train the local model Θk,f on local data for E epochs or until an early stopping criterion is met
Update local model parameters to \(\Theta {{\prime} }_{k,f}=\{\Theta {{\prime} }_{k,f}^{\,\text{input}},\Theta {{\prime} }_{k,f}^{{\rm{core}}},\Theta {{\prime} }_{k,f}^{{\rm{ext}}},\Theta {{\prime} }_{k,f}^{{\rm{combine}}},\Theta {{\prime} }_{k,f}^{\text{output}\,}\}\)
end for
Local Model Upload (Round f):
for each client k do:
Send parameters of all non-extension layers to the central server:
\(\{\Theta {{\prime} }_{k,f}^{\,\text{input}},\Theta {{\prime} }_{k,f}^{{\rm{core}}},\Theta {{\prime} }_{k,f}^{{\rm{combine}}},\Theta {{\prime} }_{k,f}^{\text{output}\,}\}\)
end for
Aggregation (Round f):
Aggregate uploaded non-extension layers’ parameters using the chosen Federated Strategy:
\({\Phi }_{f}=\,\text{FedStrategy}\,(\{\Theta {{\prime} }_{1,f}^{\,\text{input}},\Theta {{\prime} }_{1,f}^{{\rm{core}}},\Theta {{\prime} }_{1,f}^{{\rm{combine}}},\Theta {{\prime} }_{1,f}^{{\rm{output}}},\ldots ,\Theta {{\prime} }_{K,f}^{{\rm{input}}},\Theta {{\prime} }_{K,f}^{{\rm{core}}},\Theta {{\prime} }_{K,f}^{{\rm{combine}}},\Theta {{\prime} }_{K,f}^{\text{output}\,}\})\)
Example of FedAVG Strategy:
\({\Phi }_{f}=\mathop{\sum }\nolimits_{k = 1}^{K}\frac{{n}_{k}}{n}\,\Theta {{\prime} }_{k,f}^{\,\text{shared}\,}\), applied separately to each shared parameter group \(\Theta {{\prime} }_{k,f}^{\,\text{shared}\,}\in \{\Theta {{\prime} }_{k,f}^{\,\text{input}},\Theta {{\prime} }_{k,f}^{{\rm{core}}},\Theta {{\prime} }_{k,f}^{{\rm{combine}}},\Theta {{\prime} }_{k,f}^{\text{output}\,}\}\)
Global Model Distribution (Round f):
Send updated global non-extension model Φf to clients
for each client k do:
Update local model to \({\Theta }_{k,f+1}=\{{\Phi }_{f},\,\Theta {{\prime} }_{k,f}^{\,\text{ext}\,}\}\)
end for
end for
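In code, the partial exchange in Algorithm 1 amounts to filtering a model's state dictionary by layer name before upload and aggregation; the sketch below assumes the parameter names from the module sketch above (where the input layer is folded into the Core branch, so these prefixes cover all non-extension parameters).

```python
# Sketch of the partial parameter exchange: only non-extension layers are
# uploaded and averaged; Extension-branch weights never leave the client.
SHARED_PREFIXES = ("core.", "combine.", "output.")

def shared_state(model):
    return {k: v.detach().cpu().numpy()
            for k, v in model.state_dict().items()
            if k.startswith(SHARED_PREFIXES)}

def fedavg_shared(client_states, client_sizes):
    # Weighted average of each shared tensor across clients (FedAvg example).
    n = sum(client_sizes)
    return {k: sum((n_k / n) * s[k] for s, n_k in zip(client_states, client_sizes))
            for k in client_states[0]}
```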
Our experimental design went beyond training FL models alone: we introduced an additional fine-tuning phase to further optimize the models after the initial FL process and to evaluate its impact on performance. First, a global model was trained using the federated approach. Upon completion, this model was disseminated back to each client, where it was retrained exclusively on that client’s local data. Fine-tuning enhances the model’s sensitivity to the unique attributes of each client, combining the general knowledge acquired during global federated training with the localized patterns in each client’s data. The objective was to strike a balance between the global model’s generalization capabilities and the specificity of local datasets.
For fine-tuning, the local client learning rate was set to ηk = 1e-4 and dynamically adjusted using a scheduler with a patience threshold of five epochs. To allow the model to better explore the data, the batch size was also reduced to B = 128 across all clients, with training extended up to E = 50 epochs. This setup enabled the model to process the data more thoroughly during the fine-tuning phase.
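A minimal sketch of this stage follows, assuming an Adam optimizer and a binary cross-entropy loss (neither is specified in the text); `global_model`, `local_train`, `local_val`, and `evaluate_loss` are placeholders.

```python
# Hedged sketch of post-FL fine-tuning: the global model is copied to a
# client and retrained locally with the settings stated above.
import copy
import torch
from torch.utils.data import DataLoader

model = copy.deepcopy(global_model)                 # start from the global FL model
opt = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=5e-5)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, patience=5)
loader = DataLoader(local_train, batch_size=128, shuffle=True)
loss_fn = torch.nn.BCEWithLogitsLoss()

for epoch in range(50):                             # up to E = 50 local epochs
    model.train()
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x).squeeze(-1), y.float())
        loss.backward()
        opt.step()
    val_loss = evaluate_loss(model, local_val)      # hypothetical helper
    sched.step(val_loss)                            # dynamic LR adjustment
```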
Local model
In the local model setup, each client independently trained a model using only their own partition, without any data federation or pooling. This approach, lacking centralized coordination or parameter sharing, served as a baseline for comparing the efficacy of FL methods from another viewpoint.
Centralized model
In this setting, the global dataset D was employed to train a centralized model. This served as another benchmark, where all data were aggregated and utilized in a conventional, non-federated manner for model training and evaluation.
Evaluation method
In centralized learning, performance is typically gauged using a unified global test set. However, FL introduces the flexibility of performing evaluations either on the server-side or directly on the client-side67. Server-side evaluation necessitates the existence of one global test set located on the server. This approach encounters significant obstacles due to the distributed nature of sensitive RWD across multiple stakeholders and the rigorous demands of data privacy and regulatory standards. These challenges severely limit the feasibility of consolidating a singular global test set on the server-side.
Another challenge with server-side evaluation is ensuring the representativeness of the test set. Particularly with heterogeneous, non-IID settings, there is an increased risk of not accurately capturing the full diversity of the distributed datasets. Such biases could inadvertently skew the analysis, leading to findings that are less reliable or generalizable, thus compromising the study’s validity.
Considering the need to reflect a real-world scenario, our study chose a federated (client-side) evaluation approach. To guarantee a fair and representative assessment, and to avoid reliance on a potentially biased global test set, we used consistent test sets across all experiments (centralized, federated, and fine-tuned), selected from the data partitioned by country. Each model was tested on the unseen test set of each country, and the performance metrics from these tests were aggregated using a weighted average based on each country’s test set size.
To make the client-side evaluation process clear, let K be the total number of clients (in our experiment, K = 32). The size of the dataset for the i-th client is ni, where i ranges from 1 to K. The evaluation metric achieved by the i-th client is represented as Ei. The total size of all clients’ datasets combined is N, calculated as \(N=\mathop{\sum }\nolimits_{i = 1}^{K}{n}_{i}\). The overall evaluation metric E for the global FL model is given by the formula: \(E=\frac{1}{N}\mathop{\sum }\nolimits_{i = 1}^{K}({n}_{i}{E}_{i}),\) which reflects the model’s performance across all clients. This method gives weight to the individual characteristics of each client, thus offering a detailed understanding of the model’s performance in different environments. However, it can introduce bias when some clients have much larger datasets than others, which makes it difficult to directly compare the performance of the FL model with that of a centralized model evaluated on an independent test set.
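As a worked example, this weighted aggregation reduces to a few lines (the metric and size values shown are illustrative only):

```python
# Weighted client-side aggregation: E = (1/N) * sum_i(n_i * E_i).
import numpy as np

def weighted_metric(metrics, sizes):
    metrics, sizes = np.asarray(metrics), np.asarray(sizes)
    return float((sizes * metrics).sum() / sizes.sum())

# e.g., three clients with illustrative ROC-AUC values and test-set sizes:
print(weighted_metric([0.81, 0.78, 0.84], [12_000, 3_000, 800]))
```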
Metrics
During all experimental setups, the evaluation metrics used included the ROC–AUC, AUC–PR, and the total experiment time, measured from the beginning of training to the end of the last federation round. These metrics served as robust indicators for assessing both the performance and computational efficiency of the models under investigation.
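Both discrimination metrics are available in scikit-learn, which the study's pipeline uses for metric computation; the sketch below computes AUC–PR as average precision, a common estimator, though the paper's exact estimator may differ.

```python
# Discrimination metrics for the binary progression task, via scikit-learn.
from sklearn.metrics import roc_auc_score, average_precision_score

def discrimination_metrics(y_true, y_score):
    return {"roc_auc": roc_auc_score(y_true, y_score),
            "auc_pr": average_precision_score(y_true, y_score)}
```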
Ethics declarations
This study was submitted to KU Leuven’s Privacy and Ethics platform (PRET). The project application was scrutinized in view of principles and obligations laid down in the General Data Protection Regulation of the European Parliament and of the Council of 27 April 2016. Based on the information presented and given the researcher’s explicit declaration that the project carried out accordingly, KU Leuven issued a favorable advice and confirmed that the project may be implemented as such. All relevant information concerning the processing of personal data in the framework of this project has been registered in KU Leuven’s records of processing activities. The project application was also reviewed by the Social and Societal Ethics Committee (SMEC) of KU Leuven. The Committee confirmed that the project application meets the expected ethical standards regarding the voluntary involvement of human participants in scientific research. SMEC’s decision regarding this protocol is favorable (PRET approval number: G-2023-6771).
Data availability
The dataset used in this study can be accessed by requesting permission from the MSBase principal investigators involved. MSBase acts as the central contact point, facilitating data-sharing agreements with individual data custodians to ensure compliance with ownership requirements. Requests should be directed to info@msbase.org. Access is controlled to protect patient data and adhere to data ownership policies.
Code availability
In accordance with the FAIR principles in scientific research, all code utilized in this study has been made publicly available. The preprocessing scripts can be accessed at https://gitlab.com/edebrouwer/ms_benchmark, while the training pipeline is available at https://github.com/ashkan-pirmani/FL-MS-RWD. The ML pipeline was developed using PyTorch68, and metrics were computed using Scikit-learn. Figures were generated with Matplotlib, and the FL process was facilitated using Flower69. A complete list of dependencies is provided in the environment file located within the training repository.
References
Walton, C. et al. Rising prevalence of multiple sclerosis worldwide: insights from the atlas of MS, third edition. Mult. Scler. J. 26, 1816–1821 (2020).
McGinley, M. P., Goldschmidt, C. H. & Rae-Grant, A. D. Diagnosis and treatment of multiple sclerosis: a review. JAMA 325, 765–779 (2021).
Reich, D. S., Lucchinetti, C. F. & Calabresi, P. A. Multiple sclerosis. N. Engl. J. Med. 378, 169–180 (2018).
Degenhardt, A., Ramagopalan, S. V., Scalfari, A. & Ebers, G. C. Clinical prognostic factors in multiple sclerosis: a natural history review. Nat. Rev. Neurol. 5, 672–682 (2009).
Pellegrini, F. et al. Predicting disability progression in multiple sclerosis: insights from advanced statistical modeling. Mult. Scler. J. 26, 1828–1836 (2020).
Seker, B. I. O. et al. Prognostic models for predicting clinical disease progression, worsening and activity in people with multiple sclerosis. Cochrane Database Syst. Rev. 2020, CD013606 (2020).
Sherman, R. E. et al. Real-world evidence—what is it and what can it tell us? N. Engl. J. Med. 375, 2293–2297 (2016).
Brown, F. S. et al. Systematic review of prediction models in relapsing remitting multiple sclerosis. PLoS ONE 15, 1–13 (2020).
Havas, J. et al. Predictive medicine in multiple sclerosis: a systematic review. Mult. Scler. Relat. Disord. 40, 101928 (2020).
Seccia, R. et al. Machine learning use for prognostic purposes in multiple sclerosis. Life 11, 122 (2021).
Hartmann, M., Fenton, N. & Dobson, R. Current review and next steps for artificial intelligence in multiple sclerosis risk research. Comput. Biol. Med. 132, 104337 (2021).
Brouwer, E. D. et al. Longitudinal modeling of MS patient trajectories improves predictions of disability progression. Comput. Methods Prog. Biomed. 208, 106180 (2020).
De Brouwer, E. et al. Machine-learning-based prediction of disability progression in multiple sclerosis: an observational, international, multi-center study. PLOS Digit. Health 3, 1–25 (2024).
Pirmani, A. et al. The journey of data within a global data sharing initiative: a federated 3-layer data analysis pipeline to scale up multiple sclerosis research. JMIR Med. Inform. 11, e48030 (2023).
Jensen, P. B., Jensen, L. J. & Brunak, S. Mining electronic health records: towards better research applications and clinical care. Nat. Rev. Genet. 13, 395–405 (2012).
Wu, J., Roy, J. & Stewart, W. F. Prediction modeling using EHR data: challenges, strategies, and a comparison of machine learning approaches. Med. Care 48, S106–S113 (2010).
Weiskopf, N. G. & Weng, C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J. Am. Med. Inform. Assoc. 20, 144–151 (2013).
Wilkinson, M. D. et al. The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).
Antunes, R. S., André da Costa, C., Küderle, A., Yari, I. A. & Eskofier, B. Federated learning for healthcare: systematic review and architecture proposal. ACM Trans. Intell. Syst. Technol. 13, 1–23 (2022).
Xu, J. et al. Federated learning for healthcare informatics. J. Healthc. Inform. Res. 5, 1–19 (2021).
Li, S. et al. Federated and distributed learning applications for electronic health records and structured medical data: a scoping review. J. Am. Med. Inform. Assoc. 30, 2041–2049 (2023).
Rieke, N. et al. The future of digital health with federated learning. NPJ Digit. Med. 3, 119 (2020).
Brisimi, T. S. et al. Federated learning of predictive models from federated electronic health records. Int. J. Med. Inform. 112, 59–67 (2018).
Yin, X., Zhu, Y. & Hu, J. A comprehensive survey of privacy-preserving federated learning: a taxonomy, review, and future directions. ACM Comput. Surv. 54, 1–36 (2021).
Wang, W. et al. A privacy preserving framework for federated learning in smart healthcare systems. Inf. Process. Manag. 60, 103167 (2023).
Truex, S. et al. A hybrid approach to privacy-preserving federated learning. Inform. Spektrum 42, 356–357 (2019).
Donkada, S. et al. Uncovering promises and challenges of federated learning to detect cardiovascular diseases: a scoping literature review (2023).
Yi, L. et al. SU-Net: an efficient encoder-decoder model of federated learning for brain tumor segmentation. In Artificial Neural Networks and Machine Learning - ICANN 2020: 29th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 15–18, 2020, Proceedings, Part I, 761–773. https://doi.org/10.1007/978-3-030-61609-0_60 (Springer-Verlag, Berlin, Heidelberg, 2020).
Oldenhof, M. et al. Industry-scale orchestrated federated learning for drug discovery. Proc. AAAI Conf. Artif. Intell. 37, 15576–15584 (2023).
Liu, D. et al. Multiple sclerosis lesion segmentation: revisiting weighting mechanisms for federated learning. Front. Neurosci. 17, 1167612 (2023).
Denissen, S. et al. Towards multimodal machine learning prediction of individual cognitive evolution in multiple sclerosis. J. Pers. Med. 11, 1349 (2021).
Denissen, S. et al. Transfer learning on structural brain age models to decode cognition in MS: a federated learning approach. medRxiv. https://www.medrxiv.org/content/early/2023/04/26/2023.04.22.23288741 (2023).
Tan, A. Z., Yu, H., Cui, L. & Yang, Q. Towards personalized federated learning. IEEE Trans. Neural Netw. Learn. Syst. 34, 9587–9603 (2023).
Butzkueven, H. et al. MSBase: an international, online registry and platform for collaborative outcomes research in multiple sclerosis. Mult. Scler. J. 12, 769–774 (2006).
Kalincik, T. & Butzkueven, H. The MSBase registry: informing clinical practice. Mult. Scler. J. 25, 1828–1834 (2019).
McMahan, H. B., Moore, E., Ramage, D., Hampson, S. & y Arcas, B. A. Communication-efficient learning of deep networks from decentralized data https://arxiv.org/abs/1602.05629 (2023).
Li, T., Sanjabi, M., Zaheer, M., Talwalkar, A. & Smith, V. On the convergence of federated optimization in heterogeneous networks (2018).
Reddi, S. J. et al. Adaptive federated optimization. In Proc. International Conference on Learning Representations https://openreview.net/forum?id=LkFG3lB13U5 (2021).
Biewald, L. Experiment tracking with weights and biases https://www.wandb.com/. Software available from wandb.com (2020).
Guo, C., Pleiss, G., Sun, Y. & Weinberger, K. Q. On calibration of modern neural networks. In Proc. 34th International Conference on Machine Learning - Volume 70, ICML’17, 1321–1330 (JMLR, 2017).
Davis, S. E., Greevy, R. A., Lasko, T. A., Walsh, C. G. & Matheny, M. E. Detection of calibration drift in clinical prediction models to inform model updating. J. Biomed. Inform. 112, 103611 (2020).
Arivazhagan, M. G., Aggarwal, V., Singh, A. K. & Choudhary, S. Federated learning with personalization layers https://arxiv.org/abs/1912.00818 (2019).
PyTorch. ReduceLROnPlateau—PyTorch 2.6 documentation. https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.ReduceLROnPlateau.html. Accessed 28 Mar 2025 (2019).
Khodak, M. et al. Federated hyperparameter tuning: challenges, baselines, and connections to weight-sharing https://arxiv.org/abs/2106.04502 (2021).
Zawad, S. & Yan, F. Hyperparameter tuning for federated learning—systems and practices. In Nguyen, L. M., Hoang, T. N. & Chen, P.-Y. (eds.) Federated Learning, 219–235. https://www.sciencedirect.com/science/article/pii/B9780443190377000211 (Academic Press, 2024).
Zhang, H. et al. Federated learning hyperparameter tuning from a system perspective. IEEE Internet Things J. 10, 14102–14113 (2023).
Yosinski, J., Clune, J., Bengio, Y. & Lipson, H. How transferable are features in deep neural networks? In Proc. 28th International Conference on Neural Information Processing Systems—Volume 2, NIPS’14, 3320–3328 (MIT Press, 2014).
Houlsby, N. et al. Parameter-efficient transfer learning for NLP. arXiv abs/1902.00751. https://doi.org/10.48550/arXiv.1902.00751 (2019).
Howard, J. & Ruder, S. Universal language model fine-tuning for text classification. In Proc. 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 328–339 (2018).
Kairouz, P. et al. Advances and open problems in federated learning. Found. Trends Mach. Learn. 14, 1–210 (2021).
Li, T., Sahu, A. K., Talwalkar, A. & Smith, V. Federated learning: challenges, methods, and future directions. IEEE Signal Process. Mag. 37, 50–60 (2020).
Pirmani, A., Moreau, Y. & Peeters, L. M. Unlocking the power of real-world data: a framework for sustainable healthcare. Stud. Health Technol. Inform. 316, 1582–1583 (2024).
EU. European Health Data Space regulation (EHDS). https://health.ec.europa.eu/ehealth-digital-health-and-care/european-health-data-space-regulation-ehds_en. Accessed: 26 Mar 2025.
Auffray, C. et al. Making sense of big data in health research: Towards an EU action plan. Genome Med. 8, 71 (2016).
Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMC Med. 13, 1 (2015).
Peeters, L. M. et al. Covid-19 in people with multiple sclerosis: a global data sharing initiative. Mult. Scler. J. 26, 1157–1162 (2020).
Heyndrickx, W. et al. MELLODDY: cross-pharma federated learning at unprecedented scale unlocks benefits in QSAR without compromising proprietary information. J. Chem. Inf. Model. 64, 2331–2344 (2024).
Andorra, M. et al. Predicting disease severity in multiple sclerosis using multimodal data and machine learning. J. Neurol. 271, 1133–1149 (2024).
Dwork, C. Differential privacy. In Bugliesi, M., Preneel, B., Sassone, V. & Wegener, I. (eds.) Automata, Languages and Programming, 1–12 (Springer Berlin Heidelberg, 2006).
Goldreich, O. Secure multi-party computation. Manuscript. Preliminary version 78 (1998).
López-Blanco, R., Alonso, R. S., González-Arrieta, A., Chamoso, P. & Prieto, J. Federated learning of explainable artificial intelligence (FED-XAI): a review. In Ossowski, S. et al. (eds.) Distributed Computing and Artificial Intelligence, 20th International Conference, 318–326 (Springer Nature, 2023).
Choi, A. et al. A novel deep learning algorithm for real-time prediction of clinical deterioration in the emergency department for a multimodal clinical decision support system. Sci. Rep. 14, 30116 (2024).
Kalincik, T. et al. Towards personalized therapy for multiple sclerosis: prediction of individual treatment response. Brain 140, 2426–2443 (2017).
Leray, E. et al. Evidence for a two-stage disability progression in multiple sclerosis. Brain 133, 1900–1913 (2010).
Andersson, P. B., Waubant, E., Gee, L. & Goodkin, D. E. Multiple sclerosis that is progressive from the time of onset: clinical characteristics and progression of disability. Arch. Neurol. 56, 1138–1142 (1999).
Liu, M., Qi, Y., Wang, W. & Sun, X. Toward a better understanding about real-world evidence. Eur. J. Hosp. Pharm. 29, 8–11 (2022).
Pirmani, A., Oldenhof, M., Peeters, L. M., De Brouwer, E. & Moreau, Y. Accessible ecosystem for clinical research (federated learning for everyone): development and usability study. JMIR Form. Res. 8, e55496 (2024).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Proc. 33rd International Conference on Neural Information Processing Systems (Curran Associates Inc., 2019).
Beutel, D. J. et al. Flower: a friendly federated learning research framework https://doi.org/10.48550/arXiv.2007.14390 (2020).
Acknowledgments
A.Pirmani., Y.M. and M.O. are funded by VLAIO PM: Augmenting Therapeutic Effectiveness through Novel Analytics (HBC.2019.2528); Research Council KU Leuven: Symbiosis 4 (C14/22/125) and Symbiosis 3 (C14/18/092); and CELSA—Active Learning (CELSA/21/019). Y.M., M.O., and A.Pirmani. are affiliated to Leuven.AI and received funding from the Flemish Government (AI Research Program, FWO SBO (S003422N), and ELIXIR Belgium (I002819N)). E.D.B. and A.Passiemers. were funded by an FWO-SB grant. R.G. received honoraria as consultant on scientific advisory boards or as speaker from Biogen, Celgene-BMS, Janssen, Merck, Novartis, Roche, Sanofi-Genzyme, Sandoz, Teva and Viatris. B.W. received consultancy and advisory board fees from Roche, Sanofi-Genzyme, Biogen, Merck-Serono, Bayer-Schering, Novartis and Allergan; received congress support from Biogen, Merck-Serono, Teva and Roche. She has also received research support from Novartis, Biogen, Roche, FWO (Research Foundation Flanders) and Fonds D.V. (Ligue Nationale Belge de la Sclerose en Plaques, Fondation Roi Baudouin). D.H. was supported by the Charles University Cooperatio Program in Neuroscience, by the project National Institute for Neurological Research (Program EXCELES, ID Project No. LX22NPO5107)—funded by the European Union—Next Generation EU, and by General University Hospital in Prague project MH CZ-DRO-VFN64165. She also received compensation for travel, speaker honoraria and consultant fees from Biogen Idec, Novartis, Merck, Bayer, Sanofi Genzyme, Roche, and Teva, as well as support for research activities from Biogen Idec. E.K.H. received travel and/or consultancy compensation from Sanofi-Genzyme, Roche, Teva, Merck, Novartis, Celgene and Biogen. F.P. received personal compensation for serving on advisory boards for Almirall, Alexion, Biogen, Bristol, Janssen, Merck, Novartis and Roche. He further received research grants from Alexion, Almirall, Biogen, Bristol, Merck, Novartis and Roche, and from FISM, Reload Association (Onlus), the Italian Health Ministry, and the University of Catania. A.L. received speaker fees and travel grants from Novartis, Biogen, Teva and Sanofi. V.T. served on scientific advisory boards for Roche, Janssen, Sanofi-Genzyme, Novartis and Merck, received conference fee and travel support from Novartis, Biogen, Sanofi-Genzyme, Teva, Abbvie and Merck, and received educational event support from Novartis. E.C. received honoraria/research support from Biogen, Merck Serono, Novartis, Roche, and Teva; has been a member of advisory boards for Actelion, Biogen, Celgene, Merck Serono, Novartis, and Sanofi Genzyme; and has been supported by the Czech Ministry of Education—project Cooperatio LF1, research area Neuroscience, and the project National Institute for Neurological Research (Program EXCELES, ID project No LX22NPO5107)—funded by the European Union—Next Generation EU. I.R. received speaking/consulting fees and/or travel funding from Almirall, Biogen, Bristol Myers Squibb, Janssen, Merck, Novartis, Roche, Sanofi-Genzyme and Teva. C.B. received speaking honoraria from Biogen, Novartis, Sanofi, Merck, Roche, Almirall and Teva. R.A. has received speaker honoraria and consultant fees from Biogen Idec, Novartis, Merck, Janssen, Bristol-Meyers, Bayer, Sanofi Genzyme, Roche and Teva. K.B. has no disclosures to declare. C.S. 
received personal compensation for consulting, serving on a scientific advisory board, speaking or other activities from Alexion, Amgen/Horizon, Biogen, Bristol Myers Squibb, Janssen/Johnson & Johnson, Merck Serono, Novartis, Roche and Sanofi/Genzyme, and her institutions received research grants from Novartis, Roche and Sanofi/Genzyme. OG has received consultation and speaker fees, travel grants and research support from Biogen, Sanofi Genzyme, Merck, Novartis, Roche, Alexion, Viatris, Janssen, Bristol Myers Squibb, Almirall and Lundbeck. A.S. has served on advisory boards for Novartis, EMD Serono, Roche, Biogen Idec, Sanofi Genzyme and Pendopharm, has received grant support from Genzyme and Roche, and has received research grants for his institution from Biogen Idec, Sanofi Genzyme and EMD Serono. J.K. has no disclosures. J.L.S.M. served on scientific advisory boards or as a consultant for the MS International Federation and World Health Organisation, Therapeutic Goods Administration, BMS, Roche, Janssen, Genzyme, Novartis, Merck and Biogen, received conference travel support and/or speaker honoraria from WebMD Global, Merck, Sandoz, Novartis, Biogen, Roche, Eisai, Genzyme, Teva and BioCSL, and received research or educational event support from Biogen, Novartis, Genzyme, Roche, Celgene and Merck. D.S. received speaking honoraria from Biogen, Novartis, Merck and Teva. T.C. served on scientific advisory boards and received conference travel support and/or speaker honoraria from Roche, Novartis, Merck and Biogen. I.R. is supported by MS Australia and the Trish MS Research Foundation. BVM received conference travel support from Biogen, Novartis, Bayer-Schering, Merck and Teva, and has participated in clinical trials by Sanofi Aventis, Roche and Novartis. R.A. received honoraria as a speaker and for serving on scientific advisory boards from Bayer, Biogen, GSK, Merck, Novartis, Roche and Sanofi-Genzyme. VVP received honoraria as a consultant on scientific advisory boards from Biogen, Bayer-Schering, Merck, Teva and Sanofi-Aventis, and has received research grants from Biogen, Bayer-Schering, Merck, Teva and Novartis. N.J. received speaker honoraria and/or education support from Biogen, Teva, Novartis, Genzyme-Sanofi, Roche, Merck and Alexion, and has been a member of advisory boards for Merck and Biogen. D.M. received honoraria and consulting fees from Bayer Schering, Novartis, Merck, Biogen and Genzyme. B.W.G. received travel grants from Novartis, Bayer-Schering, Merck and Teva, and has participated in clinical trials by Sanofi Aventis, Roche and Novartis. G.L. received travel compensation from Novartis, Biogen, Roche and Merck; her institution receives honoraria for talks and advisory board commitments as well as research grants from Biogen, Merck, Roche and Novartis. R.G. has received research grants and/or advisory board honoraria from Biogen, Hikma, Merck, Roche and Sanofi. A.A. accepted travel compensation from Novartis, Biogen, Genzyme and Teva, and speaking honoraria from Biogen, Novartis, Genzyme and Teva. A.A.A. has received a MENACTRIMS clinical fellowship grant (2020). A.V.D.W. received honoraria or research funding from Biogen, Genzyme, Novartis, Teva Neurosciences, and ATARA Pharmaceuticals. Helmut Butzkueven served on scientific advisory boards for Merck, Genzyme, Almirall, and Biogen, and received honoraria and travel grants from Sanofi Aventis, Novartis, Biogen, Merck, Genzyme and Teva. B.T. 
received compensation for serving on an IDMC for Biogen. TAH has received travel grants from Merck Healthcare KGaA (Darmstadt, Germany), Biogen, Sanofi, Bristol Meyer Squibb, Almirall, Roche and Eisai. His institution has received research grants and consultancy fees from Roche, Biogen, Sanofi, Merck Healthcare KGaA (Darmstadt, Germany), Bristol Meyer Squibb, Janssen, Almirall, Novartis Pharma, Alexion, Neuraxpharm and Eisai. C.R. received speaker fees, research support, travel support, and/or served on advisory boards for the Swiss MS Society, Swiss National Research Foundation (320030_189140/1), University of Basel, Progressive MS Alliance, Alnylam, Bayer, Biogen, Bristol Myers Squibb, Celgene, Immunic, Merck, Neurogenesis, Novartis, Octave Bioscience, Quanterix, Roche, Sanofi and Stata DX. O.G. accepted travel compensation from Novartis, Merck and Biogen, speaking honoraria from Biogen, Novartis, Sanofi, Merck, Almirall, Bayer and Teva, and has participated in clinical trials by Biogen, Merck and Roche. D.D. received honoraria as a consultant on scientific advisory boards for Bayer-Schering, Novartis and Sanofi-Aventis, and compensation for travel from Novartis, Biogen, Sanofi-Aventis, Teva and Merck. A.G.K. received research funding, speaker honoraria and compensation for travel from, and served as a consultant on advisory boards for, Bayer-Schering, Teva, Biogen, Merck, Genzyme and Novartis. M.F.P. has no relevant disclosures. M.S. received speaker honoraria/conference travel support from Biogen, Merck, Novartis, Roche, Sanofi-Aventis and Teva. N.S. has received honoraria as a speaker from Biogen, Merck, Novartis and Sanofi-Genzyme, and for serving on scientific advisory boards from Novartis. B.S. Richard Macdonell or his institution have received remuneration for his speaking engagements, advisory board memberships, research and travel from Biogen, Merck, Genzyme, Bayer, Roche, Teva, Novartis, CSL, BMS, MedDay and NHMRC. M.C. NAJ is a PI on commercial MS studies sponsored by Novartis, Roche and Sanofi. He has received speaker’s honoraria and consultancy fees from Merck, and conference travel and registration reimbursement and consultancy fees from Novartis. PM received speaker honoraria for advisory boards and travel grants from Alexion, Almirall, Bayer, Biogen, Bristol Myers Squibb, Merck, Novartis, Roche, Sanofi-Genzyme, and Teva. P.L. received conference travel support from Novartis, Teva, Biogen, Bayer and Merck, and has participated in clinical trials by Biogen, Novartis, Teva and Actelion. C.A.S. served as a consultant for Biogen, EMD Serono, Novartis, Genentech, Celgene/Bristol Meyers Squibb, Sanofi Genzyme, Bayer, Janssen, Labcorp, Horizon and SANA. Dr. Weinstock-Guttman has also received grant/research support from Novartis, Biogen and Horizon/Amgen. She serves on the editorial boards of Children, CNS Drugs, MS International, Journal of Neurology and Frontiers in Epidemiology. S.H. has received research funding from the National Health and Medical Research Council (NHMRC, Australia), the Petre Foundation, the Brain Foundation, the Royal Australasian College of Physicians, and the University of Sydney. She is supported by an NHMRC Investigator Grant (GNT2008339). She serves as a consultant on the International Steering Committee for a clinical trial led by UCB (NCT05063162). She is on the advisory board for educational activities led by Limbic Neurology. She has been an invited speaker for educational/research sessions coordinated by Biogen, Alexion, Novartis, Excemed and Limbic Neurology. 
She is on the medical advisory boards (non-remunerated positions) of The MOG Project and the Sumaira Foundation.
Author information
Contributions
A.Pirmani. conceived the study, performed the experiments, analyzed the data, and led the manuscript writing. E.D.B., L.M.P., and Y.M. coordinated and supervised the study design, contributed to result interpretation, and critically reviewed the manuscript. M.O., A.A., A.Passiemers., and A.F. participated in data analysis and manuscript revision. Authors from T.K. to T.C.T. (the MSBase Study Group) contributed data. All authors reviewed and approved the final manuscript for submission.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Pirmani, A., De Brouwer, E., Arany, Á. et al. Personalized federated learning for predicting disability progression in multiple sclerosis using real-world routine clinical data. npj Digit. Med. 8, 478 (2025). https://doi.org/10.1038/s41746-025-01788-8