A multimodal embedding model for sepsis data representation
  • Article
  • Open access
  • Published: 23 February 2026


  • Tuo Liu1 na1,
  • Yonglin Li2,3,4 na1,
  • Hongyi Chen1,
  • Naiqing Li1,
  • Yan Zhang5,6,
  • Xuanqi Huang1,
  • Jin Wang2,3,4,
  • Rui Chen3,4,
  • Yuping Zeng7,
  • Yuntao Liu3,8,
  • Danwen Zheng3,4,
  • Darong Wu3,8,
  • Changdong Wang1,
  • Tao Yu5,
  • Xiaotu Xi3,4 &
  • Zhongde Zhang3,8 

npj Digital Medicine (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Biomarkers
  • Computational biology and bioinformatics
  • Diseases
  • Mathematics and computing
  • Medical research

Abstract

Sepsis research has long been constrained by limited labeled data and models designed for specific tasks that primarily rely on tabular inputs, overlooking the valuable insights contained in clinical text. To address these limitations, we propose the Sepsis Data Representation Model (SepsisDRM), an embedding model that jointly processes tabular and textual data to capture comprehensive patient representations. Trained on a dataset comprising 19,526 sepsis patients, SepsisDRM demonstrates strong generalization across diverse sepsis-related tasks without task-specific tuning. It effectively stratifies patients into four clinically interpretable phenotypes and achieves robust performance in predicting 28-day outcomes, with AUC scores of 0.92, 0.94, and 0.78 on retrospective, prospective, and external datasets, respectively. As the first embedding model developed specifically for sepsis, SepsisDRM establishes a novel paradigm for sepsis research and offers a promising approach for studies in other fields that involve the integration of both tabular and textual data.
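For readers less familiar with the AUC figures quoted above, the following is a minimal sketch of how an area under the ROC curve can be computed from binary outcome labels and model scores. It is not taken from the SepsisDRM code; the function name `roc_auc` and the toy labels and scores are illustrative assumptions. It uses the pairwise (Mann-Whitney) formulation: the AUC is the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (hypothetical helper).

    labels: iterable of 0/1 outcomes (1 = event, e.g. death within 28 days).
    scores: iterable of predicted risks, higher meaning more likely to be 1.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The quadratic loop is chosen for readability; a rank-based implementation would be preferable for large cohorts.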

Data availability

The GDHCM dataset used to train SepsisDRM, and the GDHCM retrospective, GDHCM prospective, and SYSMH external validation datasets used to test it, are not publicly available because the data are potentially identifiable.

Code availability

To ensure long-term accessibility and facilitate reproducibility, the complete source code of SepsisDRM, along with the synthetically generated toy datasets and pre-trained model weights, has been archived on Zenodo with the persistent identifier https://doi.org/10.5281/zenodo.17828465. The repository includes detailed documentation and environment configuration files.

Acknowledgements

This work was supported by the National Key Research and Development Program of China (2024YFA1011900), the Science and Technology Program of Guangzhou, China (2024A03J1188), the Guangdong Provincial Key Laboratory of Research on Emergency in TCM (2023B1212060062), the National Natural Science Foundation of China (82374392), the Incubation Program for the Science and Technology Development of Chinese Medicine Guangdong Laboratory (HQL2024PZ022), the National Major Projects for Science and Technology Development (2025ZD01903002), and the Guangdong Healthcare Talent Development Project (0720240226). The authors sincerely thank all clinicians and data management staff involved in this study for their valuable assistance. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Author notes
  1. These authors contributed equally: Tuo Liu, Yonglin Li.

Authors and Affiliations

  1. School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China

    Tuo Liu, Hongyi Chen, Naiqing Li, Xuanqi Huang & Changdong Wang

  2. Guangzhou University of Chinese Medicine, Guangzhou, China

    Yonglin Li & Jin Wang

  3. The Second Affiliated Hospital of Guangzhou University of Chinese Medicine (Guangdong Provincial Hospital of Chinese Medicine), Guangzhou, China

    Yonglin Li, Jin Wang, Rui Chen, Yuntao Liu, Danwen Zheng, Darong Wu, Xiaotu Xi & Zhongde Zhang

  4. Guangdong Provincial Key Laboratory of Research on Emergency in TCM, Guangzhou, China

    Yonglin Li, Jin Wang, Rui Chen, Danwen Zheng & Xiaotu Xi

  5. Department of Emergency Medicine, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China

    Yan Zhang & Tao Yu

  6. Institute of Cardiopulmonary Cerebral Resuscitation, Sun Yat-Sen University, Guangzhou, China

    Yan Zhang

  7. Information Management Office, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine (Guangdong Provincial Hospital of Chinese Medicine), Guangzhou, China

    Yuping Zeng

  8. State Key Laboratory of Traditional Chinese Medicine Syndrome, Guangzhou, China

    Yuntao Liu, Darong Wu & Zhongde Zhang

Contributions

T.L.: study conceptualization and design, construction of model, technical implementation, data analysis, statistical analysis, manuscript drafting; Y.L.: study conceptualization and design, data preparation, resources, statistical analysis, manuscript drafting; H.C.: study conceptualization and design, construction of model, data analysis; N.L.: study conceptualization and design, data preparation, statistical analysis; Y.Z., X.H.: data analysis, statistical analysis; J.W., R.C., Y.Z., and Y.L.: resources; D.Z.: data analysis; D.W.: resources; C.W.: study conceptualization and design, construction of model, manuscript drafting; T.Y. and X.X.: study conceptualization and design, data preparation, resources; Z.Z.: study conceptualization and design, data preparation, resources, manuscript drafting. All authors reviewed and approved the final manuscript.

Corresponding authors

Correspondence to Changdong Wang, Tao Yu, Xiaotu Xi or Zhongde Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Liu, T., Li, Y., Chen, H. et al. A multimodal embedding model for sepsis data representation. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02446-3

  • Received: 30 June 2025

  • Accepted: 07 February 2026

  • Published: 23 February 2026

  • DOI: https://doi.org/10.1038/s41746-026-02446-3

Associated content

Collection

Multimodal AI for Digital Medicine