Introduction

Parkinson’s disease (PD), one of the most prevalent neurodegenerative disorders, is projected to affect over 12 million individuals globally by 2040, driven by an aging population1. Speech impairments are among the earliest and most common symptoms of PD, with 89% of patients exhibiting vocal disorders, 45% experiencing articulatory impairments, and 20% suffering from fluency issues, as reported by Logemann et al.2. The perceptual, acoustic, and kinematic characteristics of PD-related speech deterioration have been extensively documented3,4,5,6, underscoring the potential for speech-based diagnostic tools to enhance early detection and disease management.

Recent advances in machine learning (ML) have significantly contributed to the development of automated PD diagnosis from speech signals. Traditional ML approaches, such as Support Vector Machines7,8,9, K-Nearest Neighbors9,10, Decision Trees11, Naïve Bayes11, Genetic Algorithms12, and Gaussian Process Classification13, typically rely on hand-engineered speech features, including Mel-frequency cepstral coefficients (MFCC), pitch, jitter, and shimmer, to distinguish PD patients from healthy controls. More recently, deep learning (DL) models, such as multi-layer perceptrons (MLPs), convolutional neural networks (CNNs), and long short-term memory (LSTM) networks, have shown improved performance by automatically extracting salient patterns from input features. Building on this, end-to-end architectures based on CNNs and recurrent neural networks have demonstrated the ability to capture complex temporal and spectral characteristics directly from raw audio14,15. Current research has increasingly focused on more robust and generalizable approaches for speech-based Parkinson’s disease detection, particularly self-supervised speech encoders and Transformer-based architectures. These foundation models have demonstrated superior performance compared to traditional methods16. In parallel, model interpretability has become a critical focus, with efforts to elucidate the inner workings of these deep models and their alignment with clinical speech markers17.

Despite these advancements, the performance of ML models is highly dependent on the availability of large and diverse training datasets. However, medical speech data is often decentralized across institutions, with significant variations in measurement techniques, dataset sizes, and linguistic content. The privacy-sensitive nature of medical data further complicates data sharing, limiting the potential for robust model training. Federated learning (FL) presents a compelling solution by enabling collaborative model training across institutions without centralizing patient data. FL has demonstrated success in medical applications, including brain anomaly detection18, COVID-19 diagnosis19,20, breast tumor classification21,22,23, and predicting high-risk gastric cancer recurrence24, as well as in biomedical natural language processing25. Specifically for PD detection, FL has been explored in functional MRI-based studies26, and its feasibility has been validated for speech-based FL models across multiple institutions while preserving patient privacy27.

However, conventional FL approaches face substantial challenges in heterogeneous (non-IID) data environments, which are particularly pronounced in multilingual PD speech datasets. Existing FL methods, such as Federated Averaging (FedAvg)28, aggregate local models by averaging client updates, often leading to suboptimal generalization when data distributions vary significantly. To address statistical heterogeneity, FedProx29 introduces a proximal term that constrains local updates, but it does not provide personalized solutions tailored to individual clients30. Local customization methods offer an alternative: a well-trained global model is fine-tuned on each client’s private data to produce a personalized model for that client31,32. Other strategies, including Scaffold33 (control variates to reduce client drift) and FedNova34 (normalized averaging of heterogeneous local updates), have sought to improve FL robustness, yet they remain limited in their ability to handle both statistical and linguistic diversity.
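To make the two baseline strategies above concrete, the following minimal sketch (plain Python, with short lists of floats standing in for full parameter tensors; all names and values are illustrative, not taken from the study) contrasts FedAvg’s sample-weighted averaging with FedProx’s proximal penalty on local training:

```python
# Illustrative sketch of FedAvg aggregation and the FedProx proximal term.
# "Models" are flat lists of floats standing in for real parameter tensors.

def fedavg_aggregate(client_params, client_sizes):
    """FedAvg: average client parameters, weighted by local sample counts."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]

def fedprox_local_loss(task_loss, local_params, global_params, mu=0.1):
    """FedProx: task loss plus (mu/2) * ||w - w_global||^2, discouraging
    local updates from drifting far from the current global model."""
    prox = sum((w - g) ** 2 for w, g in zip(local_params, global_params))
    return task_loss + 0.5 * mu * prox

# Example: two hypothetical clients holding 80 and 20 samples.
avg = fedavg_aggregate([[1.0, 2.0], [3.0, 6.0]], [80, 20])   # [1.4, 2.8]
loss = fedprox_local_loss(0.5, [1.1, 2.1], [1.0, 2.0], mu=0.1)
```

With highly non-IID clients, the weighted average can be pulled toward larger or dominant clients, which is the failure mode the proximal term and, later, personalized aggregation schemes aim to mitigate.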

Some recent studies have explored pairwise collaboration strategies in FL. Huang et al.35 introduced federated attentive message passing to facilitate collaboration among clients with similar data, while Smith et al.36 modeled pairwise collaboration by extending distributed multi-task learning to FL. However, these methods struggle when data exhibits both statistical heterogeneity and variations in linguistic features, making it difficult to form effective collaboration groups.

Real-world multilingual PD speech datasets exhibit high variability in medical measurement techniques, speech content, and language structure. Several studies have demonstrated that PD detection performance can vary significantly depending on speech input. For instance, an LSTM model achieved 88.08% classification accuracy with sentence-based speech data but only 73.52% with the sustained vowel sound ‘/a/’37. Similarly, an end-to-end deep learning model trained on a dataset of Chinese short sentences achieved only 49.4% accuracy when tested on a Spanish dataset38, highlighting the critical impact of cross-lingual generalization on model performance. Moreover, Botelho et al.39 emphasized that performance discrepancies may arise not only from linguistic differences but also from technical factors such as variations in recording conditions or equipment.

While FL offers a promising solution for privacy-preserving PD diagnosis, many existing studies fail to adequately address the challenges introduced by heterogeneous and cross-lingual data distributions, particularly prevalent in multilingual speech datasets. To address this gap, this study introduces FedOcw, a dynamic optimization-based aggregation framework that enables client nodes to develop customized models that adapt to their local datasets. This targeted optimization enables more effective knowledge transfer across linguistically and clinically diverse datasets, enhancing the robustness and accuracy of cross-lingual PD detection.

Additionally, we integrate an end-to-end deep learning model that combines time-distributed 2D convolutional neural networks (2D-CNNs) and 1D convolutional neural networks (1D-CNNs). This architecture is designed to capture both temporal and spatial features from speech data, enhancing model robustness for PD detection across linguistic and institutional variations.
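The idea of a time-distributed 2D-CNN feeding a 1D-CNN can be illustrated by tracking tensor shapes alone: the same 2D convolution is applied to each time window of a spectrogram-like input, and a 1D convolution then runs across the window axis. The sketch below uses hypothetical dimensions (the paper does not specify them) and the standard convolution output-size formula:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one dimension:
    floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# Hypothetical input: 10 time windows, each a 32x32 spectrogram patch.
t, h, w = 10, 32, 32

# Time-distributed 2D conv (3x3 kernel, stride 1): the same spatial filter
# is applied independently to every one of the t windows.
h2, w2 = conv_out(h, 3), conv_out(w, 3)   # 30 x 30 per window

# Each window's feature map is flattened; a 1D conv (kernel 3) then slides
# across the t windows, capturing temporal structure between them.
t2 = conv_out(t, 3)                        # 8 temporal positions
```

The 2D stage thus models within-window spectral patterns, while the 1D stage models how those patterns evolve over time, matching the temporal/spatial division of labor described above.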

Our study seeks to address the challenges posed by data heterogeneity in federated learning for speech-based Parkinson’s disease detection through the proposed FedOcw framework. Specifically, we aim to (1) evaluate FedOcw’s effectiveness in enhancing Parkinson’s disease detection across diverse, multilingual, and institutionally heterogeneous datasets; (2) investigate the impact of dynamically optimized client weighting on the stability and efficiency of the global federated learning process; and (3) analyze the statistical properties and linguistic diversity of client datasets that influence their aggregation weights and contributions to global learning. By achieving these objectives, our work aims to advance the scalability, personalization, and adaptability of federated learning, supporting the development of privacy-preserving AI-driven diagnostic tools for Parkinson’s disease.

Results

To evaluate the effectiveness of our proposed federated learning framework for speech-based Parkinson’s disease detection, we utilized five multilingual datasets, incorporating Spanish, Italian, Chinese, Czech, and English speech samples. These datasets vary in recording conditions, linguistic structure, and phonetic tasks, providing a diverse and heterogeneous training environment that closely resembles real-world clinical scenarios.

Dataset-1 (Spanish), sourced from the PC-GITA repository40, comprises speech recordings from 100 individuals, including 50 Parkinson’s disease (PD) patients and 50 healthy controls (HCs). All recordings were conducted in professional soundproof booths at 44.1 kHz sampling frequency with 16-bit resolution. The PD participants, aged 33 to 81 years, were evaluated in the ON state by three expert phoneticians. Speech samples included:

  1. Sustained vowels: three repetitions of /a/, /i/, /e/, /o/, and /u/.
  2. Isolated words: /blusa/, /petaka/, /apto/, /campana/, /llueve/, /reina/, /braso/, and /viaje/.
  3. Sentence reading: simple (/laura/, /loslibros/, /luisa/, etc.) and complex (/preocupado/, /juan/, etc.) structures.
  4. Spontaneous speech: monologues (~44.86 s on average).

These tasks were designed to capture phonation, articulation, and prosody impairments, which are critical for detecting Parkinsonian dysarthria.

Dataset-2 (Italian) was originally developed to assess speech intelligibility in PD patients using automatic speech recognition systems41. This dataset42 includes 28 PD patients and 37 HCs, featuring recordings of:

  1. Phonemically balanced text reading (twice, with a 30-s pause).
  2. Repetitions of syllables (/pa/ and /ta/ for 5 s each).
  3. Sustained vowels (/a/, /i/, /e/, /o/, /u/).
  4. Phonemically balanced word and phrase reading.

Recordings were conducted in low-noise, echo-free environments, with microphones placed 15–25 cm from the speaker’s lips. All PD participants were receiving antiparkinsonian treatment.

Dataset-3 (Chinese), obtained from the GYENNO SCIENCE Parkinson’s Disease Research Center37, consists of 30 PD patients and 15 HCs, aged 37 to 75 years. Speech tasks included:

  1. Sustained vowels (/a/ and /e/).
  2. Short sentence reading (e.g., /si shi si zhi shi shi zi/).

Speech samples were recorded using smartphones positioned 10 cm from the speaker’s mouth. All PD participants were assessed by two neurologists and recorded in the ON state.

Dataset-4 (Czech) was designed to differentiate idiopathic Parkinson’s disease from other parkinsonian syndromes via prolonged vowel analysis43. The dataset includes 22 PD patients, alongside 21 patients with multiple system atrophy, 18 with progressive supranuclear palsy, and 22 HCs. For this study, we utilized data from PD patients and HCs only.

Recordings were performed using a headset condenser microphone (5 cm from the lips) at 48 kHz sampling frequency with 16-bit resolution. Participants were instructed to sustain vowels (/A/ and /I/) in a modal voice for as long and steadily as possible.

Dataset-5 (English), the MDVR-KCL dataset (Mobile Device Voice Recordings at King’s College London)44, was developed to explore non-invasive Parkinson’s disease monitoring through smartphone-based voice analysis45. This dataset includes 16 PD patients and 21 HCs, recorded using a Motorola Moto G4 smartphone at 44.1 kHz sampling frequency with 16-bit resolution.

Table 1 provides a summary of the demographic, clinical, and recording characteristics of participants, including the distribution of PD and HC groups by gender, age, disease severity, recording conditions, and speech tasks across Dataset-1 to Dataset-5. These datasets provide a comprehensive and multilingual foundation for evaluating federated learning models in Parkinson’s disease detection.

Table 1 Demographic, clinical, and recording characteristics of participants across Dataset-1 to Dataset-5

To evaluate the effectiveness of the proposed federated learning approach on multilingual speech data, we define five experimental scenarios that integrate five datasets with varying language distributions and client allocations:

Scenario A (Fig. 1): Speech data from Dataset-1 (Spanish) and Dataset-2 (Italian) is distributed across eight clients (C0–C7), with an uneven distribution of Parkinson’s disease (PD) cases and healthy controls (HCs). Clients C0–C3 are assigned data from Dataset-1 (Spanish), while Clients C4–C7 receive data from Dataset-2 (Italian).

Fig. 1: FedOcw for Parkinson’s disease detection from speech across five scenarios, showing data distributions and model performance.
figure 1

Different combinations of multilingual datasets assigned to clients: (Scenario A) Spanish–Italian, (Scenario B) Spanish–Chinese, (Scenario C) Italian–Chinese, (Scenario D) Spanish–Italian–Czech, and (Scenario E) Spanish–Italian–Chinese–Czech–English. Each sub-panel shows data allocation across clients (e.g., C0–C3: Spanish, C4–C7: Italian, etc.) and box plots comparing the performance of five federated learning methods—FedAvg, FedProx, Scaffold, FedNova, and the proposed FedOcw—on client test data. Box plots indicate performance distributions, where the center line marks the median, the circle denotes the mean, box limits correspond to the 1st and 3rd quartiles, whiskers span 1.5 times the interquartile range, and outliers are shown individually.

Scenario B (Fig. 1): Speech data from Dataset-1 (Spanish) and Dataset-3 (Chinese) is allocated to seven clients (C0–C6). Clients C0–C3 are assigned data from Dataset-1 (Spanish), while Clients C4–C6 are assigned data from Dataset-3 (Chinese).

Scenario C (Fig. 1): Speech data from Dataset-2 (Italian) and Dataset-3 (Chinese) is used, with seven clients (C0–C6). Clients C0–C3 receive data from Dataset-2 (Italian), and Clients C4–C6 are assigned data from Dataset-3 (Chinese).

Scenario D (Fig. 1): Speech data from Dataset-1 (Spanish), Dataset-2 (Italian), and Dataset-4 (Czech) is used. Clients C0–C3 are assigned Dataset-1 (Spanish), C4–C7 receive data from Dataset-2 (Italian), and C8 is allocated Dataset-4 (Czech).

Scenario E (Fig. 1): All five datasets are incorporated for a comprehensive multilingual evaluation. Clients C0–C3 are assigned Dataset-1 (Spanish), C4–C7 receive Dataset-2 (Italian), C8–C10 are allocated Dataset-3 (Chinese), C11 is assigned Dataset-4 (Czech), and C12 receives Dataset-5 (English).

This experimental setup enables a comprehensive evaluation of the federated model’s generalization across linguistically diverse datasets. Each client was assigned speech samples from its respective dataset, which included a variety of task types such as sustained vowels, sentence reading, and spontaneous speech. A single model was trained per client using the entire local training dataset, without further partitioning based on individual speech tasks. This approach reflects real-world deployment conditions in federated learning, where heterogeneity in assessment protocols and data characteristics is common across different clinical sites.

To promote robust generalization, all speech samples were partitioned into non-overlapping training and test sets. Training data remained strictly localized on each edge client, while evaluation was independently performed on each client’s separate test node. Importantly, no speaker overlap existed across clients’ training sets, strengthening the model’s ability to generalize across languages and participants. For final evaluation, each client was tested on its corresponding test node using local testing samples, providing a comprehensive assessment of the model’s cross-lingual performance.
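The speaker-disjoint constraint described above can be sketched as grouping samples by speaker before partitioning, so that no speaker appears in both training and test sets. This is an illustrative helper with hypothetical speaker IDs; the study’s exact split procedure is not detailed here:

```python
import random

def speaker_disjoint_split(samples, test_fraction=0.3, seed=0):
    """Split (speaker_id, sample) pairs so train and test share no speakers.

    Hypothetical sketch: speakers (not individual recordings) are shuffled
    and assigned wholesale to one partition, enforcing no speaker overlap.
    """
    speakers = sorted({spk for spk, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(speakers)
    n_test = max(1, int(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])
    train = [s for s in samples if s[0] not in test_speakers]
    test = [s for s in samples if s[0] in test_speakers]
    return train, test

# Toy example: 10 speakers with 3 recordings each.
data = [(f"spk{i}", f"rec{i}_{j}") for i in range(10) for j in range(3)]
train, test = speaker_disjoint_split(data, test_fraction=0.3)
```

Splitting at the speaker level rather than the recording level prevents a model from succeeding by memorizing speaker identity, which is essential when evaluating cross-participant generalization.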

Regarding the choice of languages for the bilingual experiments, Spanish, Italian, and Chinese were prioritized due to the availability of well-balanced datasets with large sample sizes and a diverse set of speech tasks. These characteristics provided a robust and heterogeneous foundation for evaluating cross-lingual generalization. In contrast, the English and Czech datasets, while valuable, had comparatively smaller sample sizes and fewer speech tasks, limiting their suitability for the bilingual scenarios. Instead, English and Czech were incorporated in Scenarios D and E to further explore the impact of increasing language diversity on model performance.

Figure 1 provides a circular visualization of client distributions, including sample sizes, case-control ratios (shown as bar plots), and the number of participants (indicated in brackets). Percentages represent each client’s relative contribution to the overall training dataset.

In Fig. 1, the box plots present the evaluation results over 100 rounds of federated aggregation, capturing performance across accuracy, F1-score, and Matthews correlation coefficient (Mcc). The mathematical formulations for these metrics are detailed in Eqs. (1)–(5).

$${accuracy}=\frac{{TP}+{TN}}{{TP}+{FP}+{TN}+{FN}}$$
(1)
$$F1-{score}=\frac{2\times {precision}\times {sensitivity}}{{precision}+{sensitivity}}$$
(2)
$${precision}=\frac{{TP}}{{TP}+{FP}}$$
(3)
$${sensitivity}=\frac{{TP}}{{TP}+{FN}}$$
(4)
$${Mcc}=\frac{{TP}\times {TN}-{FP}\times {FN}}{\sqrt{({TP}+{FP})({TP}+{FN})({TN}+{FP})({TN}+{FN})}}$$
(5)

Here, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. Sensitivity quantifies the model’s ability to correctly identify positive cases, while precision measures the proportion of positive predictions that are correct. The F1-score represents the harmonic mean of precision and sensitivity, providing a balanced assessment of classification performance. Mcc measures the overall quality of binary classifications, ranging from −1 to +1, where +1 indicates perfect prediction, −1 signifies total disagreement between predictions and actual labels, and 0 reflects performance equivalent to random guessing.

Tables 2–6 summarize the average performance across 100 aggregation rounds for Scenarios A, B, C, D, and E, with the best-performing federated learning methods highlighted in bold. Alongside federated learning approaches, the tables also report the average results for clients trained and evaluated on their isolated local datasets (Local) and the outcomes of centralized learning for comparison.

Table 2 Mean performance metrics for Scenario A using Dataset-1 (Spanish) and Dataset-2 (Italian) over 100 federated aggregation rounds
Table 3 Mean performance metrics for Scenario B using Dataset-1 (Spanish) and Dataset-3 (Chinese) over 100 federated aggregation rounds
Table 4 Mean performance metrics for Scenario C using Dataset-2 (Italian) and Dataset-3 (Chinese) over 100 federated aggregation rounds
Table 5 Mean performance metrics for scenario D using Dataset-1 (Spanish), Dataset-2 (Italian), and Dataset-4 (Czech) over 100 aggregation rounds
Table 6 Average results for scenario E on all five datasets over 100 aggregation rounds

To ensure a fair and meaningful comparison, all baseline methods were carefully tuned and evaluated under consistent experimental conditions. For FedProx, we explored the proximal term coefficient μ ∈ {0.01, 0.1, 1.0} and selected μ = 0.1, which achieved the best performance in each experimental setting. For SCAFFOLD, we followed the standard configuration, with control variates updated at the end of every local training round. The standard implementation of FedNova was used without modification. To maintain comparability across methods, all experiments used the same optimization settings: Adam optimizer with a learning rate of 0.001, 10 local epochs per round, and a training batch size of 8.

As shown in Tables 2–6, FedOcw consistently outperforms conventional FL methods across all evaluated scenarios (A–E) in accuracy, F1-score, precision, sensitivity, and Mcc, demonstrating superior stability and training effectiveness. It not only surpasses FedAvg, FedProx, Scaffold, and FedNova, but also outperforms centralized learning across all key metrics. These findings highlight the advantages of federated models in privacy-preserving and heterogeneous learning environments.

FedOcw’s adaptability to linguistic diversity is evident across all scenarios. In the Spanish–Italian setting (Scenario A), it achieves the highest accuracy (74.81%) and Mcc (0.502), demonstrating effective knowledge transfer between related languages. In the Spanish–Chinese scenario (Scenario B), the model maintains strong performance with 67.85% accuracy and an Mcc of 0.288, though the increased linguistic divergence presents convergence challenges. In the Italian–Chinese setting (Scenario C), FedOcw achieves high precision (84.19%) and sensitivity (83.44%), indicating a balanced classification approach. In the trilingual scenario (Scenario D), it maintains top performance with the highest accuracy (72.53%), F1-score (69.8%), and Mcc (0.465). Even in the most heterogeneous multilingual scenario (Scenario E), the model sustains robust performance, achieving 72.63% accuracy and an Mcc of 0.435, highlighting its robustness and ability to generalize across linguistic domains.

Table 7 reports the p values for the mean Accuracy, F1-score, and Mcc metrics across all five scenarios, evaluating the statistical significance of differences between our proposed federated model (FedOcw) and alternative methods.

Table 7 Statistical significance (p values) of comparisons between the proposed FedOcw model and other learning methods across all evaluation metrics

Table 7 shows that FedOcw achieves statistically significant improvements over individual learning (Local), alternative federated learning methods, and centralized learning in key performance metrics, including accuracy, F1-score, precision, and Mcc. However, sensitivity is an exception, where FedOcw performs comparably to FedProx (p = 0.0525), indicating no significant difference. Compared to centralized learning, FedOcw demonstrates significant advantages across all evaluation metrics, reinforcing its effectiveness in diverse settings. Notably, FedOcw outperforms FedAvg with strong statistical significance in precision (p = 0.0003) and Mcc (p = 0.0011). However, the lack of statistical significance in some comparisons (e.g., p > 0.05 for sensitivity relative to FedProx) suggests that certain methods may still be competitive in specific aspects. These findings highlight FedOcw’s robustness in handling heterogeneous and multilingual datasets, reinforcing its potential for broader cross-linguistic and clinical applications.

To better understand the model’s behavior across multilingual settings, we conducted a language-wise accuracy analysis of client models in Scenarios A–E. Figure 2 presents the individual accuracy scores for each language (Spanish, Italian, Chinese, Czech, and English), highlighting the specific contributions of each client group to the overall federated learning performance.

Fig. 2: Accuracy by language group across Scenarios A–E.
figure 2

This figure presents individual accuracy scores for clients using Spanish, Italian, Chinese, Czech, and English datasets, illustrating each language group’s contribution to the overall federated learning performance. The results provide insight into cross-lingual generalization capabilities across different scenarios.

As shown in Fig. 2, the Italian client consistently achieves the highest accuracy across scenarios in which it is present, reaching up to 94% in Scenario C and 91.6% in Scenario A. This suggests that the Italian dataset may contain more consistent or discriminative speech features for Parkinson’s detection, possibly due to better recording conditions, more clearly defined task protocols, or less intra-class variability. In contrast, Spanish and Chinese clients show more variable performance, with Chinese accuracy rising from 63.14% in Scenario B to 67.8% in Scenario C, depending on the pairing.

The performance gap between Scenario B (Spanish–Chinese) and Scenario C (Italian–Chinese) is particularly informative. While both involve cross-lingual collaboration with Chinese data, Scenario C significantly outperforms Scenario B. This may be attributed to greater similarity in task structure or feature distribution between Italian and Chinese datasets, leading to more effective model generalization. Alternatively, the Spanish dataset may differ more substantially in prosody, phonetic structure, or participant characteristics, making knowledge transfer more challenging.

A similar trend is observed when comparing Scenario A (Spanish–Italian) and Scenario D (Spanish–Italian–Czech). In Scenario A, a large performance gap exists between Italian (91.6%) and Spanish (58.01%), suggesting unbalanced contributions and potential dominance of the Italian dataset during model aggregation. However, when Czech is added in Scenario D, the gap narrows: Italian performance drops slightly to 83.87%, while Spanish improves to 63.52%, and Czech reaches 63.24%. This shift indicates that adding a third, linguistically distinct client introduces more diversity into the training process, which likely promotes better generalization across heterogeneous clients.

In Scenario E, where five languages are present, performance becomes more balanced across clients with different languages, though Italian still maintains relatively strong accuracy. This suggests that FedOcw is able to preserve generalization even under high linguistic and distributional heterogeneity.

To evaluate the global stability and efficiency of the federated learning framework, Fig. 3 presents the training loss convergence of various federated learning models across five evaluation scenarios (A, B, C, D, and E). The models compared include FedAvg, FedProx, Scaffold, FedNova, and the proposed FedOcw. The x-axis denotes the number of communication rounds, while the y-axis represents the average training loss across local clients. A lower training loss over time indicates improved convergence and model stability.

Fig. 3: Training loss convergence across federated learning models.
figure 3

This figure compares the training loss convergence of five federated learning models—FedAvg, FedProx, Scaffold, FedNova, and the proposed FedOcw—across Scenarios A–E. Lower loss values over communication rounds indicate better convergence and stability. Missing lines for FedNova indicate instances where the training loss was undefined (NaN).

As shown in Fig. 3, FedOcw consistently achieves the lowest training loss across all scenarios, demonstrating superior convergence stability and effectiveness. In Scenario A, it stabilizes at a loss of ~0.3, while other models exhibit significant fluctuations, indicating sensitivity to data heterogeneity. Scenario B follows a similar pattern, with FedOcw maintaining low and stable training loss, whereas FedAvg and FedNova experience sharp oscillations, leading to poor convergence. In Scenario C, FedOcw again outperforms all models, stabilizing around 0.2, while the other methods struggle to converge, with increasing training loss over rounds, reflecting poor adaptation to the scenario. Similar trends are observed in Scenarios D and E, where FedOcw demonstrates the best stability, while FedAvg, FedProx, and FedNova continue to show erratic loss patterns. These findings underscore FedOcw’s robustness in addressing non-IID data challenges, offering enhanced convergence stability and adaptability across diverse multilingual datasets. The observed training loss trends further highlight its resilience in handling complex learning environments, making it a promising candidate for real-world federated learning applications.

To examine the impact of the weighting strategy on individual clients during the federated learning process, we analyze client model C0 across five scenarios (A, B, C, D, and E) as case studies, focusing on the optimized client weights assigned by FedOcw. Table 8 presents the sample standard deviation (STDEV.S) over 100 rounds for the optimized weights of local clients when updating client model C0 in the five scenarios, considering various layer parameters of the deep learning model.

Table 8 Sample standard deviation (STDEV.S) of optimized client weights over 100 rounds when updating client model C0 across five scenarios (A, B, C, D, and E), considering various layer parameters of the deep learning model

As presented in Table 8, the weights assigned to the Time-Distributed 2D-CNN layer exhibit the highest variability across aggregation rounds, underscoring their critical role in shaping the deep learning model’s performance. A similar trend is observed across other client models, indicating the central influence of this layer in the federated learning process. Given this, we focus on the Time-Distributed 2D-CNN layer for a more in-depth analysis of how the weighting strategy impacts individual clients during training.

Figure 4 shows the adjacency matrix of the weights assigned to the Time-Distributed 2D-CNN layer across five scenarios (A, B, C, D, and E). The y-axis represents the clients receiving updates, with each row corresponding to the aggregate weights assigned to the local clients. The weights are averaged over 100 rounds. The color bar visually indicates the weight values, emphasizing the relative importance of each client’s input space to the target client receiving updates.

Fig. 4: Aggregation weights in Time-Distributed 2D-CNN across scenarios.
figure 4

This figure displays the adjacency matrix of client-to-client aggregation weights assigned to the time-distributed 2D-CNN layer, averaged over 100 communication rounds for each scenario (A–E). The y-axis represents receiving clients, and each row shows the weights assigned to local clients. Color intensity reflects the relative importance of each client’s input in the aggregation process.

As shown in Fig. 4, FedOcw does not confine weight assignment to clients within the same language group across all scenarios. Instead, updates are exchanged between clients from different linguistic backgrounds, demonstrating that the model enables cross-lingual knowledge transfer without imposing language-based isolation. The weight distribution remains relatively balanced, ensuring that model updates are equitably shared, allowing each client to both contribute to and benefit from diverse sources. Additionally, certain clients receive higher-weighted updates, suggesting that the personalization strategy enhances model performance by dynamically prioritizing influential clients. Importantly, these higher-weighted assignments do not consistently correspond to a specific language group, reinforcing the model’s adaptability.
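The per-layer, client-specific aggregation visible in the adjacency matrix amounts to each receiving client i combining its peers’ layer parameters as Σ_j w_ij·θ_j, with each weight row summing to one. The sketch below illustrates only this combination step; FedOcw’s actual weight optimization is not reproduced here, and the weights shown are hypothetical:

```python
def personalized_aggregate(weight_row, client_layers):
    """Combine one layer's parameters for a single receiving client.

    weight_row: aggregation weights over all clients (one row of the
    adjacency matrix; must sum to ~1 so the result is a convex combination).
    client_layers: per-client flat parameter lists for the same layer.
    Illustrative only; the weight values are not taken from the study.
    """
    assert abs(sum(weight_row) - 1.0) < 1e-6, "row weights must sum to 1"
    dim = len(client_layers[0])
    return [
        sum(w * layer[i] for w, layer in zip(weight_row, client_layers))
        for i in range(dim)
    ]

# Example: client C0 draws mostly on its own model but also receives
# cross-lingual updates from two peers (hypothetical weights and params).
layers = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
updated_c0 = personalized_aggregate([0.6, 0.3, 0.1], layers)
```

Because each client gets its own weight row, clients can borrow selectively from linguistically distant peers when their updates are informative, which is consistent with the cross-lingual weight exchange observed in Fig. 4.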

To better understand the dynamics behind these weight assignments, we examined Scenario E to determine whether the most influential clients, defined as those consistently receiving higher weights, correlate with dataset-specific attributes such as training sample size, class distribution, or speech task diversity. Table 9 presents hypothetical examples illustrating this analysis.

Table 9 Preliminary observations (hypothetical example based on Scenario E)

As shown in Table 9, the analysis of FedOcw’s weight assignment strategy in Scenario E reveals that client influence is not determined solely by dataset size or task diversity. While one might expect larger or more diverse datasets to receive higher weights, FedOcw instead appears to prioritize clients with balanced class distributions, as these tend to contribute more reliable and generalizable updates. Notably, the Czech client (C11), with a relatively small dataset and limited task diversity, receives one of the highest weight assignments, suggesting that FedOcw values the informativeness and alignment of updates over raw data quantity. This indicates that FedOcw adopts a nuanced aggregation strategy that promotes fairness and generalization by emphasizing the quality and complementary value of each client’s contribution rather than relying on size or frequency alone.

Discussion

Our findings highlight the advantages of federated learning (FL) in multilingual settings, with FedOcw enabling cross-lingual knowledge transfer while preserving privacy. Among FL approaches, FedOcw excels in handling heterogeneous data distributions, particularly in linguistically diverse scenarios.

FedOcw consistently outperforms FedAvg, FedProx, Scaffold, and FedNova across key metrics, with statistically significant improvements in precision (p = 0.0003) and Mcc (p = 0.0011) compared to FedAvg. While its sensitivity is comparable to that of FedProx (p = 0.0525), FedOcw surpasses centralized learning across all evaluated metrics. Statistical validation confirms its significant (p < 0.05) improvements in most metrics, reinforcing FL’s effectiveness in privacy-preserving, heterogeneous environments.

FedOcw’s adaptability is evident across different language pairings. In Spanish–Italian (Scenario A), it achieves the highest accuracy (74.81%) and MCC (0.502), demonstrating effective transfer between related languages. In Spanish–Chinese (Scenario B), greater linguistic divergence introduces challenges, yet the model maintains strong performance (67.85% accuracy, MCC = 0.288). In Italian–Chinese (Scenario C), FedOcw achieves high specificity (84.19%) and sensitivity (83.44%), reflecting balanced classification. In the trilingual scenario (Scenario D), it maintains top performance with the highest accuracy (72.53%), F1-score (69.8%), and MCC (0.465). Even in the most heterogeneous multilingual setting (Scenario E), it maintains robust accuracy (72.63%) and MCC (0.435), demonstrating its strong generalization ability across diverse linguistic domains.

Language-wise performance analysis shows that Italian clients consistently achieve the highest accuracy, reaching up to 94% in Scenario C, likely due to more consistent speech features or better data quality. Spanish and Chinese clients exhibit more variable performance, influenced by differences in task similarity and language characteristics. The better performance of Scenario C (Italian–Chinese) compared to Scenario B (Spanish–Chinese) suggests greater alignment between the Italian and Chinese datasets. Introducing Czech in Scenario D narrows the accuracy gap between Italian and Spanish clients, indicating that increased linguistic diversity enhances generalization. In Scenario E, which includes all five languages, performance becomes more balanced, underscoring FedOcw’s ability to generalize effectively amid high linguistic and distributional heterogeneity.

Convergence analysis further highlights FedOcw’s stability and efficiency. Unlike competing FL models that struggle with non-IID distributions, FedOcw consistently achieves the lowest and most stable training loss across all scenarios, demonstrating its resilience in handling complex multilingual data.

Examining client weight distributions reveals that FedOcw enables effective cross-lingual knowledge sharing without restricting weight assignments to specific language groups. The model ensures balanced contributions across clients while integrating personalization, prioritizing influential clients without linguistic bias. In Scenario E, the weight distribution remains stable, further affirming FedOcw’s capacity for multilingual generalization.

An in-depth analysis of client weights in Scenario E reveals that FedOcw does not favor clients solely based on dataset size or task diversity. Instead, higher weights are assigned to clients with balanced class distributions, as they provide more reliable and generalizable updates. For example, the Czech client, despite its relatively small dataset and limited task diversity, receives one of the highest weight assignments. This indicates that FedOcw’s aggregation strategy prioritizes update quality and alignment over raw data quantity, promoting fairness and improved generalization.

Despite its advantages, the weighting strategy in FedOcw may inadvertently give disproportionate influence to clients with poor convergence, which can degrade overall performance in highly heterogeneous environments. To address this limitation, future work will explore adaptive weighting mechanisms that account for both convergence dynamics and local model quality. Reinforcement learning–based aggregation strategies also present a promising direction for optimizing weight assignments and enhancing robustness in practical deployments.

In our current experimental setup (Scenarios A–E), cross-lingual heterogeneity is simulated by assigning distinct language-specific datasets to different groups of clients. However, we recognize that a more stringent setting, where each client is assigned a fully unique dataset with no overlap, would more closely reflect the diversity encountered in real-world federated learning scenarios. To address this, we plan to extend our framework in future work to support a strict one-to-one mapping between clients and datasets. This may involve assigning each client a different language, recording condition, or assessment protocol, enabling a more comprehensive evaluation of the proposed model’s generalizability and robustness in highly heterogeneous environments.

Methods

Federated learning framework

In this study, we propose a novel method within the federated learning framework for determining the weights of client models. This approach, termed FedOcw (Optimized Client Weights for Federated Learning), enables client nodes to develop customized models that adapt to their local datasets.

Figure 5 provides an overview of FedOcw. In the federated learning setup with \(M\) clients (as illustrated in Fig. 5), at the onset of a federated aggregation round \((t=0)\), the central server initiates the process by dispatching the initial global model with parameters \({\theta }^{0}\) to each local client \(k\in \{1,\ldots ,M\}\). Each client then performs local training on its private dataset, producing an updated local model \({\theta }_{k}^{t}\) and the gradient \(\nabla {l}_{k}\left({\theta }_{k}^{t}\right)\) of the local loss function \({l}_{k}\left({\theta }_{k}^{t}\right)\) with respect to \({\theta }_{k}^{t}\).

Fig. 5: Overview of the training process with FedOcw.
figure 5

The figure illustrates the federated learning workflow in which a central server distributes the initial global model to M clients, each of which performs local training on private data to update model parameters and compute local gradients. These updates are then used to optimize client-specific aggregation weights, enabling personalized and stable global model convergence.

These client-specific updates \(\{{\theta }_{k}^{t},\,\nabla {l}_{k}\left({\theta }_{k}^{t}\right)\}\) are sent back to the server. Upon receiving all updates, the server computes a personalized aggregation for each client. Specifically, for client \(k\), the server derives a weighted combination of all client models based on an optimized vector of weights \({{\boldsymbol{w}}}_{k}^{t}={[{w}_{k(1)}^{t},\cdots ,{w}_{k(M)}^{t}]}^{{\rm{T}}}\), where \({w}_{k(m)}^{t}\) denotes the contribution of client \(m\)’s model to the updated model for client \(k\). This design allows the central server to tailor each client’s updated model \({\theta }_{k}^{t+1}\) using a dynamically weighted fusion of all available models, as shown in the equation at the top of Fig. 5. The updated models \({\{\theta }_{1}^{t+1},\cdots ,\,{\theta }_{M}^{t+1}\}\) are then sent back to their corresponding clients, and the process repeats over a certain number of aggregation rounds.

Please note that in the context of federated learning, the client-specific weight vector \({{\boldsymbol{w}}}_{k}^{t}\) is dynamically optimized for each participating client \(k\) based on the local trained model \({\theta }_{k}^{t}\) and the gradient \(\nabla {l}_{k}\left({\theta }_{k}^{t}\right)\) of the local loss function. Consequently, the updated model \({\theta }_{k}^{t+1}\) is tailored to better fit the local data distribution of client \(k\).

The pseudo code for FedOcw is provided in Algorithm 1.

Algorithm 1

Optimizing Client Weights for Federated Learning (FedOcw)

Input:

 • Each client is indexed by \(k\)

 • Each communication round is indexed by \(t\)

 • \(M\): number of clients participating in round \(t\)

 • \({n}_{k}\): number of data samples on client \(k\)

 • \({\theta }_{k}^{t}\): model parameters for client \(k\) at round \(t\)

 • \(\nabla {l}_{k}\left({\theta }_{k}^{t}\right)\) : gradient of the local loss function \({l}_{k}\left({\theta }_{k}^{t}\right)\) with respect to \({\theta }_{k}^{t}\).

1. Initialize model parameters \({\theta }^{0}\) and distribute them to all clients.

2. For each aggregation round \(t\):

 a. Client-side (executed in parallel on all clients):

 • Train the local model using end-to-end deep learning on the client’s private dataset.

 • Compute and send the updated local parameters \({\theta }_{k}^{t}\) and gradient \(\nabla {l}_{k}\left({\theta }_{k}^{t}\right)\) to the server.

 b. Server-side:

 • Upon receiving \({\theta }_{k}^{t}\) and \(\nabla {l}_{k}\left({\theta }_{k}^{t}\right)\) from all clients:

  • For each client \(k\) (in parallel):

   • Compute the optimal aggregation weights \({{\boldsymbol{w}}}_{k}^{t}\) using the proposed optimization method.

   • Update the personalized model for client \(k\) as:

$${\theta }_{k}^{t+1}=\mathop{\sum }\limits_{m=1}^{M}{w}_{k(m)}^{t}\left[{\theta }_{m}^{t}-\eta \nabla {l}_{m}\left({\theta }_{m}^{t}\right)\right]$$

As depicted in Algorithm 1, in each aggregation round a personalized weight vector is computed for every client, determined by its local empirical loss and the gradient of the local loss function with respect to the parameters. This mechanism enables clients to obtain tailored models that adapt well to their local datasets.
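As an illustration, the server-side update at the end of Algorithm 1 can be sketched as follows, treating each client model as a flattened parameter vector. The client count, model dimensions, weights, and learning rate below are toy values for illustration, not taken from the experiments:

```python
import numpy as np

def personalized_update(thetas, grads, weights, eta):
    """Personalized aggregation for one client (Algorithm 1, server side):
    theta_k^{t+1} = sum_m w_{k(m)} * (theta_m^t - eta * grad_m)."""
    return sum(w * (theta - eta * g)
               for w, theta, g in zip(weights, thetas, grads))

# Toy example: M = 3 clients with 2-dimensional models
thetas = [np.array([1.0, 2.0]), np.array([0.5, 1.5]), np.array([2.0, 0.0])]
grads  = [np.array([0.1, 0.1]), np.array([0.2, 0.0]), np.array([0.0, 0.3])]
w_k = np.array([0.5, 0.3, 0.2])   # non-negative, sums to 1 (Eqs. 12-13)
theta_next = personalized_update(thetas, grads, w_k, eta=0.1)
print(theta_next)                 # [1.039 1.439]
```

In the full algorithm, this update is computed in parallel for every client, each with its own optimized weight vector \({{\boldsymbol{w}}}_{k}^{t}\).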

Previous studies have generally minimized the global loss across all clients by using uniform or static aggregation weights. In contrast, this study adopts a dynamic and personalized approach: at each aggregation round \(t\), the optimal client weights are calculated to maximize the expected reduction in local loss for each client. That is, the weight vector \({{\boldsymbol{w}}}_{k}^{t}={[{w}_{k(1)}^{t},\cdots ,{w}_{k(M)}^{t}]}^{{\rm{T}}}\) for client \(k\) defines the relative contributions of each client \(m\in \{1,\ldots ,M\}\) to client k’s next-round model.

Let \({l}_{k}\left({\theta }_{k}^{t}\right)\) denote the empirical loss of client \(k\) at round \(t\). The goal is to compute \({{\boldsymbol{w}}}_{k}^{t}\) such that the expected loss reduction for client \(k\) is maximized:

$$\mathop{\max }\limits_{{{\boldsymbol{w}}}_{k}^{t}}\{{l}_{k}\left({\theta }_{k}^{t}\right)-{l}_{k}\left({\theta }_{k}^{t+1}\right)\}$$
(6)

where \({\theta }_{k}^{t}\) represents the current model of client \(k\) and \({\theta }_{k}^{t+1}\) is the updated model.

The updated model for client \(k\) is computed as:

$${\theta }_{k}^{t+1}=\mathop{\sum }\limits_{m=1}^{M}{w}_{k(m)}^{t}\left[{\theta }_{m}^{t}-\eta \nabla {l}_{m}\left({\theta }_{m}^{t}\right)\right]$$
(7)

Here, \(M\) is the total number of participating clients, \({w}_{k(m)}^{t}\) denotes the contribution of client \(m\)'s update to the model of client \(k\), and \(\eta\) is the learning rate. \(\nabla {l}_{m}\left({\theta }_{m}^{t}\right)\) is the gradient of the local loss function \({l}_{m}\left({\theta }_{m}^{t}\right)\) with respect to \({\theta }_{m}^{t}\).

To simplify the optimization, we apply a first-order Taylor approximation to the loss function:

$${l}_{k}\left({\theta }_{k}^{t+1}\right)\approx {l}_{k}\left({\theta }_{k}^{t}\right)+\nabla {l}_{k}\left({\theta }_{k}^{t}\right)\cdot ({\theta }_{k}^{t+1}-{\theta }_{k}^{t})$$
(8)

Substituting this approximation into the objective and using the aggregation rule (Eq. (7)) for \({\theta }_{k}^{t+1}\), we obtain Eq. (9):

$$\begin{array}{ll}\mathop{\max }\limits_{{{\boldsymbol{w}}}_{k}^{t}}\{{l}_{k}\left({\theta }_{k}^{t}\right)-{l}_{k}\left({\theta }_{k}^{t+1}\right)\}&\approx \mathop{\max }\limits_{{{\boldsymbol{w}}}_{k}^{t}}\left\{{l}_{k}\left({\theta }_{k}^{t}\right)-{l}_{k}\left({\theta }_{k}^{t}\right)-\nabla {l}_{k}\left({\theta }_{k}^{t}\right)\cdot \left({\theta }_{k}^{t+1}-{\theta }_{k}^{t}\right)\right\}\\ &\approx \mathop{\max }\limits_{{{\boldsymbol{w}}}_{k}^{t}}\left\{-\nabla {l}_{k}\left({\theta }_{k}^{t}\right)\cdot \left({\theta }_{k}^{t+1}-{\theta }_{k}^{t}\right)\right\}\\ &\approx \mathop{\max }\limits_{{{\boldsymbol{w}}}_{k}^{t}}\left\{-\nabla {l}_{k}\left({\theta }_{k}^{t}\right)\cdot \left(\mathop{\sum }\limits_{m=1}^{M}{w}_{k(m)}^{t}\left[{\theta }_{m}^{t}-\eta \nabla {l}_{m}\left({\theta }_{m}^{t}\right)\right]-{\theta }_{k}^{t}\right)\right\}\end{array}$$
(9)

The objective function (6) can be reformulated into a minimization problem by negating Eq. (9), yielding Eq. (10):

$$\mathop{{\rm{argmin}}}\limits_{{{\boldsymbol{w}}}_{k}^{t}}\left\{\left({{\boldsymbol{w}}}_{k}^{t}-{{\boldsymbol{\alpha }}}_{k}^{t}\right)\cdot \nabla {l}_{k}\left({\theta }_{k}^{t}\right)\cdot \left({{\boldsymbol{\theta }}}^{t}-\eta {{\boldsymbol{\nabla }}}^{t}\right)-\eta {\left\Vert \nabla {l}_{k}\left({\theta }_{k}^{t}\right)\right\Vert }^{2}\right\}$$
(10)

Since \(\eta > 0\), this minimization problem can be further reduced to Eq. (11):

$$\mathop{{\rm{argmin}}}\limits_{{{\boldsymbol{w}}}_{k}^{t}}\left({{\boldsymbol{w}}}_{k}^{t}-{{\boldsymbol{\alpha }}}_{k}^{t}\right)\cdot \nabla {l}_{k}\left({\theta }_{k}^{t}\right)\cdot ({\boldsymbol{\theta}}^{t}-{\eta {\boldsymbol{\nabla}}}^{t})$$
(11)

subject to

$${{\boldsymbol{w}}}_{k}^{t}=\left[\begin{array}{c}{w}_{k(1)}^{t}\\ \vdots \\ {w}_{k(M)}^{t}\end{array}\right]\ge {\mathbf{0}}$$
(12)

(where \(M\) is the number of clients participating at round \(t\)) and

$$\mathop{\sum }\limits_{i=1}^{M}{w}_{k(i)}^{t}=1$$
(13)
$${{\boldsymbol{\alpha }}}_{k}^{t}=\left[\begin{array}{c}{\alpha }_{k(1)}^{t}\\ \vdots \\ {\alpha }_{k(M)}^{t}\end{array}\right],\qquad {\alpha }_{k(i)}^{t}\triangleq \left\{\begin{array}{ll}1, & {\rm{if}}\;i=k\\ 0, & {\rm{if}}\;i\ne k\end{array}\right.$$
(14)
$${{\boldsymbol{\theta }}}^{t}=\left[\begin{array}{c}{\theta }_{1}^{t}\\ \vdots \\ {\theta }_{M}^{t}\end{array}\right]$$
(15)
$${{\boldsymbol{\nabla }}}^{t}=\left[\begin{array}{c}\nabla {l}_{1}\left({\theta }_{1}^{t}\right)\\ \vdots \\ \nabla {l}_{M}\left({\theta }_{M}^{t}\right)\end{array}\right]$$
(16)

The first constraint (Eq. (12)) ensures that all client weights are non-negative, while the second constraint (Eq. (13)) enforces that the weights for each client sum to one, thereby maintaining a balanced contribution to the model parameters \({\theta }_{k}^{t}\) and an effective learning rate. However, this optimization problem admits a trivial solution in which all weights \({w}_{k\left(i\right)}^{t}\) converge to \({\alpha }_{k(i)}^{t}\), except for the client with the smallest value of \(\nabla {l}_{k}\left({\theta }_{k}^{t}\right)\cdot ({{\boldsymbol{\theta }}}^{t}-\eta {{\boldsymbol{\nabla }}}^{t})\). In such a scenario, the system would rely on a single client to update the parameters, severely hampering aggregation efficiency. To prevent this, an additional regularization term is introduced that penalizes deviations of the weights from a standard weight \({w}_{k}^{* }=\frac{{n}_{k}}{N}\) for client \(k\), where \({n}_{k}\) is the number of data samples on client \(k\) and \(N={\sum }_{j}{n}_{j}\) is the total number of data samples.

Let \({{\boldsymbol{v}}}_{k}^{t}=\left[\begin{array}{c}{v}_{k(1)}^{t}\\ \vdots \\ {v}_{k(M)}^{t}\end{array}\right]={{\boldsymbol{w}}}_{k}^{t}-{{\boldsymbol{\alpha }}}_{k}^{t}\). By adding the regularization term, this minimization problem can be expressed as Eq. (17):

$$\mathop{{\rm{argmin}}}\limits_{{{\boldsymbol{v}}}_{k}^{t}}\left\{{{\boldsymbol{v}}}_{k}^{t}\cdot \nabla {l}_{k}\left({\theta }_{k}^{t}\right)\cdot \left({{\boldsymbol{\theta }}}^{t}-\eta {{\boldsymbol{\nabla }}}^{t}\right)+\mu {\left\Vert {{\boldsymbol{v}}}_{k}^{t}-{{\boldsymbol{\beta }}}_{k}^{t}\right\Vert }^{2}\right\}$$
(17)

subject to

$${{\boldsymbol{v}}}_{k}^{t}\,\ge\,{-}{{\boldsymbol{\alpha }}}_{k}^{t}$$
(18)

and

$$\mathop{\sum }\limits_{i=1}^{M}{v}_{k(i)}^{t}=0$$
(19)
$${{\boldsymbol{\beta }}}_{k}^{t}=\left[\begin{array}{c}{\beta }_{k(1)}^{t}\\ \vdots \\ {\beta }_{k(M)}^{t}\end{array}\right]={w}_{k}^{* }-{{\boldsymbol{\alpha }}}_{k}^{t}$$
(20)

where

$${w}_{k}^{* }=\frac{{n}_{k}}{N}$$
(21)

This quadratic problem can be solved efficiently with standard optimization algorithms, with the strength of the regularization controlled by the parameter \(\mu > 0\). The value of \(\mu\) was set empirically and fixed at 0.05 in all experiments.
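Because the quadratic penalty in Eq. (17) is isotropic, the problem can also be rewritten as a Euclidean projection onto the probability simplex: completing the square and substituting \({{\boldsymbol{w}}}_{k}^{t}={{\boldsymbol{v}}}_{k}^{t}+{{\boldsymbol{\alpha }}}_{k}^{t}\) reduces it to projecting the target \({{\boldsymbol{\alpha }}}_{k}^{t}+{{\boldsymbol{\beta }}}_{k}^{t}-{\boldsymbol{c}}/(2\mu )\) onto \(\{{\boldsymbol{w}}:{\boldsymbol{w}}\ge 0,\sum {w}_{i}=1\}\), where \({c}_{m}=\nabla {l}_{k}({\theta }_{k}^{t})\cdot [{\theta }_{m}^{t}-\eta \nabla {l}_{m}({\theta }_{m}^{t})]\). The NumPy sketch below illustrates this reduction with hypothetical toy values for \({\boldsymbol{c}}\); the actual experiments solve the QP with CVXOPT:

```python
import numpy as np

def project_simplex(t):
    """Euclidean projection of t onto {w : w >= 0, sum(w) = 1}
    (standard sort-based algorithm)."""
    u = np.sort(t)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, t.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    tau = css[rho] / (rho + 1.0)
    return np.maximum(t - tau, 0.0)

def fedocw_weights(c, alpha, beta, mu=0.05):
    """Solve Eq. (17) in closed form for an isotropic penalty:
    project alpha + beta - c/(2*mu) onto the simplex, returning
    w = v + alpha (Eq. 22), which is non-negative and sums to 1."""
    return project_simplex(alpha + beta - c / (2.0 * mu))

# Toy example: M = 3 clients, optimizing weights for client k = 0
c = np.array([0.02, -0.01, 0.03])   # hypothetical inner products c_m
alpha = np.array([1.0, 0.0, 0.0])   # alpha_k (Eq. 14)
beta = 1.0 / 3.0 - alpha            # beta_k = w* - alpha_k, here w* = 1/3
w = fedocw_weights(c, alpha, beta, mu=0.05)
print(round(w.sum(), 6), bool((w >= 0).all()))   # 1.0 True
```

Either route enforces the constraints of Eqs. (18)–(19) exactly, so the recovered weights always form a valid convex combination.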

Once \({{\boldsymbol{v}}}_{k}^{t}\) has been obtained by solving this quadratic problem, the weight vector \({{\boldsymbol{w}}}_{k}^{t}\) is recovered as:

$${{\boldsymbol{w}}}_{k}^{t}={{\boldsymbol{v}}}_{k}^{t}+{{\boldsymbol{\alpha }}}_{k}^{t}$$
(22)

The computational complexity of the optimization problem (Eq. (17)) depends primarily on precomputing the vectors \({{\boldsymbol{\nabla }}}^{t}\) and \({{\boldsymbol{\theta }}}^{t}\). As these quantities are produced during each client's local training, they can be cached and transmitted directly to the central server.

Client-end deep learning model and hyperparameter settings

This study adopts an end-to-end deep learning architecture for client-side training, which combines time-distributed 2D Convolutional Neural Networks (2D-CNNs) with a 1D Convolutional Neural Network (1D-CNN). The architecture is composed of two key modules: (1) Time-distributed 2D-CNN Module: This module applies a series of 2D convolutional operations independently to each time step, extracting time-series dynamic features from the input log Mel-spectrogram. (2) 1D-CNN Module: The resulting time-series features are then passed through a 1D-CNN block, which captures temporal dependencies between segments. Further details about this model can be found in ref. 38.
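A minimal PyTorch sketch of this two-module design is given below; the channel widths, kernel sizes, and pooling choices are illustrative placeholders rather than the exact configuration of ref. 38:

```python
import torch
import torch.nn as nn

class TimeDistributed2DCNN(nn.Module):
    """Sketch of the client-end model: a small 2D-CNN applied to each
    spectrogram segment, followed by a 1D-CNN over the segment sequence."""
    def __init__(self, n_mels=55, frames=40, seq_len=50, n_classes=2):
        super().__init__()
        self.frame_cnn = nn.Sequential(           # applied per segment
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),              # -> (16, 1, 1) per segment
        )
        self.temporal_cnn = nn.Sequential(        # over the segment sequence
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):                         # x: (B, seq_len, n_mels, frames)
        B, S, F, T = x.shape
        h = self.frame_cnn(x.reshape(B * S, 1, F, T))   # time-distributed 2D-CNN
        h = h.reshape(B, S, -1).transpose(1, 2)         # (B, 16, S)
        h = self.temporal_cnn(h).squeeze(-1)            # (B, 32)
        return self.head(h)

model = TimeDistributed2DCNN()
logits = model(torch.randn(4, 50, 55, 40))        # batch of 4 recordings
print(logits.shape)                               # torch.Size([4, 2])
```

The time-distributed application is realized here by folding the sequence dimension into the batch dimension before the 2D convolutions and unfolding it afterwards.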

The input to the model consists of log Mel-spectrograms derived from speech signals. Each input sample is represented as a sequence of overlapping segments extracted from the spectrogram of a speech recording. Audio recordings were resampled to 22,050 Hz and processed using Librosa46, with a hop length of 512 and 55 Mel-frequency bands. To handle varying lengths of input recordings, zero-padding was applied to ensure uniform tensor dimensions compatible with PyTorch. The spectrogram frame count and time-series length were fixed at 40 and 50, respectively, to maintain consistent input shapes across batches.
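The padding and segmentation step can be sketched as follows. The per-segment hop of 8 frames is an illustrative assumption (the text fixes only the 40-frame segment width and the sequence length of 50), and the log-Mel input itself would come from Librosa (22,050 Hz sampling, hop length 512, 55 Mel bands):

```python
import numpy as np

def segment_spectrogram(logmel, frames_per_seg=40, n_segments=50, hop=8):
    """Zero-pad a log-Mel spectrogram (n_mels, T) and slice it into a
    fixed-shape tensor of overlapping segments:
    (n_segments, n_mels, frames_per_seg). The hop of 8 frames between
    segments is an illustrative choice, not taken from the paper."""
    n_mels, T = logmel.shape
    needed = (n_segments - 1) * hop + frames_per_seg
    if T < needed:                        # zero-pad short recordings
        logmel = np.pad(logmel, ((0, 0), (0, needed - T)))
    return np.stack([logmel[:, i * hop : i * hop + frames_per_seg]
                     for i in range(n_segments)])

# Toy spectrogram: 55 Mel bands, 120 frames (shorter than required)
x = np.random.default_rng(0).standard_normal((55, 120))
print(segment_spectrogram(x).shape)       # (50, 55, 40)
```

Fixing the output shape in this way guarantees uniform tensor dimensions across recordings of different lengths, as required for PyTorch batching.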

The architecture and parameter settings were guided by prior work38, where this configuration achieved a favorable balance between model complexity and performance. It is particularly well-suited to resource-constrained and privacy-sensitive environments such as federated learning.

All experiments were implemented in PyTorch and conducted in a consistent computing environment featuring an NVIDIA RTX A6000 GPU, an Intel Xeon Silver 4210 CPU (2.20 GHz), Ubuntu 20.04.6 LTS (64-bit), and 64 GiB of memory. The quadratic optimization problem (Eq. (17)) was solved using CVXOPT47.

The same set of hyperparameters was used across all experiments. Details of the client-end model architecture and the control parameters used in the CVXOPT solver are summarized in Table 10.

Table 10 Hyperparameter settings for the client-end deep learning model and the control parameters for the CVXOPT solver