Abstract
Deep learning has substantially improved infectious disease prediction models and, in turn, public health strategies. However, privacy requirements for medical data often prevent institutions from sharing diverse datasets, which creates data silos and reduces predictive accuracy. To address these challenges, we propose a multi-layered privacy-preserving framework that balances security and computational performance. First, we introduce a Random Transmission Hybrid Homomorphic algorithm that integrates CKKS fully homomorphic encryption with the additively (partially) homomorphic Paillier scheme and is optimized by a random transmission sequence. In our experiments, this hybrid approach improves computational and communication efficiency by 25% over conventional homomorphic encryption methods by reducing ciphertext overhead and skipping redundant update cycles. Second, we develop the Data Selection-Distributed Selection Stochastic Gradient Descent (DS-DSSGD) algorithm to optimize the trade-off between training speed and predictive accuracy. By filtering out insignificant gradient updates and focusing on high-contribution features, DS-DSSGD maintains high model precision despite the added computational cost of privacy-preserving techniques. Finally, we integrate these methods into the XDP Privacy Data Sharing Platform, which provides a secure environment for end-to-end data lifecycle management. Together, our results indicate that the proposed framework safeguards sensitive health information while maintaining the high-precision forecasting required for effective epidemic response.
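As a rough illustration of two ideas named in the abstract, the sketch below combines gradient filtering with additively homomorphic aggregation. It is not the algorithm of this paper: the abstract does not specify the DS-DSSGD selection rule, so a simple top-k magnitude filter stands in for it, and only a textbook Paillier scheme is shown, with no CKKS component and no random transmission sequence. The key size, the fixed-point scale `SCALE`, and all function names are hypothetical choices made for readability.

```python
import math
import random

# Toy Paillier key built from tiny primes (illustration only, not secure).
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is g = n + 1 (Python 3.8+)
SCALE = 1000  # hypothetical fixed-point scale for encoding float gradients


def encrypt(m: int) -> int:
    """Paillier encryption with g = n + 1: c = (1 + m*n) * r^n mod n^2."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Recover the plaintext and map it back to a signed integer."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


def select_top_k(grad, k):
    """Assumed stand-in filter: keep the k largest-magnitude entries."""
    idx = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    return {i: grad[i] for i in idx}


def client_update(grad, k):
    """Filter the gradient, then encrypt the fixed-point value of each kept entry."""
    return {i: encrypt(round(v * SCALE)) for i, v in select_top_k(grad, k).items()}


def aggregate(updates):
    """Server side: multiplying ciphertexts adds the hidden plaintexts."""
    total = {}
    for upd in updates:
        for i, c in upd.items():
            total[i] = total.get(i, 1) * c % N2
    return total


if __name__ == "__main__":
    # Made-up gradients from two hypothetical clients.
    grads = [
        [0.50, -0.01, 0.02, -0.30],
        [0.40, 0.03, -0.02, -0.25],
    ]
    summed = aggregate(client_update(g, k=2) for g in grads)
    print({i: decrypt(c) / SCALE for i, c in summed.items()})
```

With the two made-up gradients above, both clients keep indices 0 and 3, and the key holder decrypts sums of 0.9 and -0.55 without the server seeing any individual update. The sketch sends the selected indices in the clear, which can itself leak information, and a real deployment would need a vetted cryptographic library with keys of appropriate length.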
Data availability
The data that support the findings of this study are available from BaseBit, but restrictions apply to their availability: the data were used under license for the current study and are therefore not publicly available. The data are, however, available from the authors upon reasonable request and with the permission of BaseBit.
Funding
This work was supported by the Noncommunicable Chronic Diseases-National Science and Technology Major Project (No. 2024ZD0532900), the National Natural Science Foundation of China (No. 62372316), and the Sichuan Science and Technology Program Key Project (Nos. 2024YFHZ0091, 2025YFHZ0066).
Author information
Contributions
Xinhang Wang: investigation, experiments and writing; Yuncheng Jiang: writing and revision; Guangming Pan: validation, investigation and data collection; Zhen Luo: supervision and data collection; Ming Xiao: supervision; Li Yang: supervision; Xiaoqiu Shi: supervision; Ying Huo: supervision; Mianyang Li: supervision; Le Zhang (corresponding author): conceptualization, supervision and writing.
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, X., Jiang, Y., Pan, G. et al. A data privacy protection method for infectious disease prediction models with balanced training speed and accuracy. Sci Rep (2026). https://doi.org/10.1038/s41598-026-38906-9
DOI: https://doi.org/10.1038/s41598-026-38906-9