Scientific Reports
A data privacy protection method for infectious disease prediction models with balanced training speed and accuracy
  • Article
  • Open access
  • Published: 05 February 2026

  • Xinhang Wang1 na1,
  • Yuncheng Jiang2,3 na1,
  • Guangming Pan4,
  • Zhen Luo4,
  • Ming Xiao1,
  • Li Yang5,
  • Xiaoqiu Shi6,7,8,
  • Ying Huo9,
  • Mianyang Li10 &
  • …
  • Le Zhang1 

Scientific Reports, Article number: (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Computational biology and bioinformatics
  • Diseases
  • Mathematics and computing

Abstract

Deep learning has significantly improved infectious disease prediction models and the public health strategies built on them; however, medical data privacy requirements often prevent institutions from sharing diverse datasets, which creates data silos and reduces predictive accuracy. To address these challenges, we propose a multi-layered privacy-preserving framework that balances security and computational performance. First, we introduce a Random Transmission Hybrid Homomorphic algorithm that integrates CKKS fully homomorphic encryption with the Paillier semi-homomorphic mechanism and is optimized by a random transmission sequence. Experimental evaluations show that this hybrid approach improves computational and communication efficiency by 25% compared with conventional homomorphic encryption methods, because it reduces ciphertext overhead and skips redundant update cycles. Second, we develop the Data Selection-Distributed Selection Stochastic Gradient Descent (DS-DSSGD) algorithm to optimize the trade-off between training speed and predictive accuracy. By filtering out insignificant gradient updates and focusing on high-contribution features, DS-DSSGD maintains high model precision even under the increased computational demands of privacy-preserving technologies. Finally, we integrate these innovations into the XDP Privacy Data Sharing Platform, which provides a secure environment for end-to-end data lifecycle management. Collectively, our results indicate that the proposed framework safeguards sensitive health information while maintaining the high-precision forecasting capabilities essential for effective epidemic response.
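The two ideas named in the abstract, filtering insignificant gradient updates and aggregating updates under additively homomorphic encryption, can be illustrated with a minimal Python sketch. This is not the authors' Random Transmission Hybrid Homomorphic or DS-DSSGD algorithm: it omits CKKS and the random transmission sequence, and it assumes a simple magnitude threshold as the selection rule. The function names, the fixed-point scale `SCALE`, and the toy key size (two small Mersenne primes, far too small to be secure) are all hypothetical choices made for illustration. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import random
from math import gcd

SCALE = 10**6  # assumed fixed-point scale for encoding floats as integers


def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key pair from two known Mersenne primes (demo only, insecure)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    mu = pow(phi, -1, n)  # modular inverse; this shortcut relies on g = n + 1
    return n, (phi, mu)


def encrypt(n, x):
    """Encrypt a float: c = (1 + m*n) * r^n mod n^2, i.e. Paillier with g = n + 1."""
    m = round(x * SCALE) % n  # negative values wrap around modulo n
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if gcd(r, n) == 1:
            break
    return (1 + m * n) * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    """Recover the float from a ciphertext using the private key (phi, mu)."""
    phi, mu = priv
    u = pow(c, phi, n * n)
    m = (u - 1) // n * mu % n
    if m > n // 2:  # map the upper half of the range back to negatives
        m -= n
    return m / SCALE


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)


def select_significant(grad, threshold):
    """Assumed selection rule: zero out entries with magnitude below the threshold."""
    return [g if abs(g) >= threshold else 0.0 for g in grad]


def secure_aggregate(n, client_grads, threshold):
    """Sum the filtered gradients of all clients without decrypting any of them."""
    total = None
    for grad in client_grads:
        enc = [encrypt(n, g) for g in select_significant(grad, threshold)]
        if total is None:
            total = enc
        else:
            total = [add_encrypted(n, a, b) for a, b in zip(total, enc)]
    return total
```

In this sketch an aggregator holding only the public modulus `n` can combine the clients' ciphertexts, and only the holder of the private key can call `decrypt` on the aggregate. Encrypting the zeroed entries keeps the coordinates aligned but saves no ciphertext; a scheme aiming to cut communication cost would transmit only the selected coordinates.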

Data availability

The data that support the findings of this study are available from BaseBit, but restrictions apply to their availability: the data were used under license for the current study and so are not publicly available. The data are, however, available from the authors upon reasonable request and with permission of BaseBit.

Funding

This work was supported by the Noncommunicable Chronic Diseases-National Science and Technology Major Project (No. 2024ZD0532900), the National Natural Science Foundation of China (No. 62372316), and the Sichuan Science and Technology Program Key Project (Nos. 2024YFHZ0091, 2025YFHZ0066).

Author information

Author notes
  1. Xinhang Wang and Yuncheng Jiang contributed equally to this work and are listed as co-first authors.

Authors and Affiliations

  1. College of Computer Science, Sichuan University, Chengdu, 610065, China

    Xinhang Wang, Ming Xiao & Le Zhang

  2. Department of General Surgery & Laboratory of Gastric Cancer, State Key Laboratory of Biotherapy, Collaborative Innovation Center of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, 610065, China

    Yuncheng Jiang

  3. Gastric Cancer Center, West China Hospital, Sichuan University, Chengdu, 610065, China

    Yuncheng Jiang

  4. BaseBit, Shanghai, 200050, China

    Guangming Pan & Zhen Luo

  5. Sansure Biotech Incorporation, Changsha, 410000, Hunan, China

    Li Yang

  6. School of Manufacturing Science and Engineering, Southwest University of Science and Technology, Mianyang, Sichuan, 621010, China

    Xiaoqiu Shi

  7. Mianyang Science and Technology City Intelligent Manufacturing Industry Technology Innovation Institute, Mianyang, Sichuan, 621023, China

    Xiaoqiu Shi

  8. State Key Laboratory of Intelligent Manufacturing Equipment and Technology, School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China

    Xiaoqiu Shi

  9. Chengdu Information Technology Co., Ltd. of Chinese Academy of Sciences, Chengdu, 610213, China

    Ying Huo

  10. Department of Clinical Laboratory Medicine, The First Medical Center, Chinese PLA General Hospital, Beijing, China

    Mianyang Li

Contributions

Xinhang Wang: investigation, experiments and writing; Yuncheng Jiang: writing and revision; Guangming Pan: validation, investigation and data collection; Zhen Luo: supervision and data collection; Ming Xiao: supervision; Li Yang: supervision; Xiaoqiu Shi: supervision; Ying Huo: supervision; Mianyang Li: supervision; Le Zhang (corresponding author): conceptualization, supervision and writing.

Corresponding author

Correspondence to Le Zhang.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Wang, X., Jiang, Y., Pan, G. et al. A data privacy protection method for infectious disease prediction models with balanced training speed and accuracy. Sci Rep (2026). https://doi.org/10.1038/s41598-026-38906-9

  • Received: 27 September 2025

  • Accepted: 31 January 2026

  • Published: 05 February 2026

  • DOI: https://doi.org/10.1038/s41598-026-38906-9

Keywords

  • Infectious disease
  • Deep learning
  • Privacy computing
  • Federated learning
  • Artificial intelligence
Associated content

Collection

Computational biology and mathematical modelling of biological systems
