Abstract
We address a critical gap in real-world kidney transplantation (KT): the long-standing disconnect between structured longitudinal follow-up data and text-defined clinical rules, which often leads to inconsistent reporting, poor policy compliance, and non-reproducible outcomes across centers. To close this gap, we introduce KT-LLM, a verifiable orchestration layer that couples sequence modeling with policy- and terminology-aware reasoning and is tailored explicitly to KT clinical workflows. KT-LLM grounds clinical decision-making in authoritative sources by restricting knowledge access, via retrieval-augmented generation, to Banff kidney allograft pathology references and OPTN and SRTR policy documents. This design anchors answers and computable checklists to versioned sources, enabling full auditability and reducing subjective interpretation errors. The system coordinates three clinically focused, auditable agents: (i) Agent-A (SRTR-MambaSurv) optimizes discrete-time survival and competing-risk prediction from TRF-aligned trajectories via a linear-time inference backbone to personalize follow-up scheduling; (ii) Agent-B (OPTN-BlackClust) identifies clinically distinct population subtypes using stable deep embedded clustering, supporting individualized treatment stratification; (iii) Agent-C (Policy-Ops) encodes OPTN and UNOS submission timelines, SRTR reporting cadence, and Banff terminology into executable rules, returning pass, warn, and fail outcomes with versioned evidence to ensure policy compliance. On de-identified OPTN and UNOS cohorts, KT-LLM outperformed strong baselines in evidence attribution and predictive calibration. Critically, it retained the ability to surface clinically distinct subgroups among Black recipients, consistent with prior reports of outcome heterogeneity, without generalizing claims beyond the analyzed window, thereby supporting equitable subgroup analysis while avoiding clinical overreach.
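As a minimal sketch of the discrete-time competing-risk formulation that Agent-A is described as optimizing (not the authors' implementation), the following converts per-interval cause-specific hazards into an overall survival curve and per-cause cumulative incidence. The function name, the two-cause setup, and the hazard values are illustrative assumptions; in practice the hazards would come from a trained sequence model.

```python
from typing import List, Tuple


def cumulative_incidence(
    hazards: List[List[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Convert discrete cause-specific hazards into survival and CIF curves.

    hazards[t][k] is the probability of an event of cause k in interval t,
    given that no event of any cause occurred before interval t.
    """
    n_causes = len(hazards[0])
    surv_prev = 1.0                  # S(t-1): event-free probability so far
    cif_prev = [0.0] * n_causes      # running cumulative incidence per cause
    survival: List[float] = []
    cif: List[List[float]] = []
    for h_t in hazards:
        total = sum(h_t)
        if any(h < 0 for h in h_t) or total > 1.0:
            raise ValueError("hazards must be non-negative and sum to <= 1")
        # CIF_k(t) = CIF_k(t-1) + S(t-1) * h_k(t)
        cif_prev = [c + surv_prev * h for c, h in zip(cif_prev, h_t)]
        # S(t) = S(t-1) * (1 - sum_k h_k(t))
        surv_prev *= 1.0 - total
        survival.append(surv_prev)
        cif.append(list(cif_prev))
    return survival, cif


# Hypothetical hazards for two causes (e.g. graft failure, death)
# over three follow-up intervals.
example_hazards = [[0.10, 0.05], [0.20, 0.10], [0.10, 0.10]]
survival, cif = cumulative_incidence(example_hazards)
```

At every interval the event-free probability and the per-cause cumulative incidences sum to one, which gives a quick sanity check on any predicted hazard sequence.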
By anchoring reasoning and outputs to versioned policies and terminology, KT-LLM turns the modeling and governance of KT workflows into an auditable, clock-synchronized process. This offers a practical way to enhance reproducibility, monitor fairness across centers and eras, and standardize clinical practice, addressing the unmet need for scalable, reliable KT care in real-world settings.
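To illustrate what an executable, clock-synchronized rule with a pass, warn, or fail outcome might look like, here is a minimal sketch. It is not the Policy-Ops implementation: the class names, the rule identifier, the source version label, and the 60-day and 14-day windows are all hypothetical placeholders rather than actual OPTN, UNOS, or SRTR timelines.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class DeadlineRule:
    """A submission deadline tied to a versioned source (all values hypothetical)."""
    rule_id: str
    source_version: str   # edition of the policy document the rule was derived from
    due_days: int         # days allowed after the triggering event
    warn_days: int        # remaining days at or below which an open form is flagged


@dataclass(frozen=True)
class CheckResult:
    status: str           # "pass", "warn", or "fail"
    rule_id: str
    source_version: str
    due_date: date


def check_submission(
    rule: DeadlineRule,
    event_date: date,
    today: date,
    submitted_on: Optional[date] = None,
) -> CheckResult:
    """Evaluate one form against one deadline rule and attach its evidence."""
    due = event_date + timedelta(days=rule.due_days)
    if submitted_on is not None:
        # Already submitted: only timeliness matters.
        status = "pass" if submitted_on <= due else "fail"
    elif today > due:
        status = "fail"   # still open and past the deadline
    elif (due - today).days <= rule.warn_days:
        status = "warn"   # still open and close to the deadline
    else:
        status = "pass"   # still open with ample time left
    return CheckResult(status, rule.rule_id, rule.source_version, due)


# Hypothetical rule: a follow-up form due 60 days after the event.
rule = DeadlineRule("followup-form", "policy-edition-placeholder", 60, 14)
open_soon = check_submission(rule, date(2024, 1, 1), today=date(2024, 2, 20))
on_time = check_submission(rule, date(2024, 1, 1), date(2024, 2, 20), date(2024, 2, 15))
overdue = check_submission(rule, date(2024, 1, 1), today=date(2024, 3, 5))
```

Returning the rule identifier and source version alongside the status is what makes each outcome traceable to the document edition it was derived from.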
Data availability
Registry files for numerical modeling: (1) SRTR Standard Analysis Files (SAFs): https://www.srtr.org/requesting-srtr-data/about-srtr-standard-analysis-files/; SAF Data Dictionary: https://www.srtr.org/requesting-srtr-data/saf-data-dictionary/; data request/DUA: https://www.srtr.org/requesting-srtr-data/data-requests/. (2) OPTN STAR files: overview and request page: https://optn.transplant.hrsa.gov/data/view-data-reports/request-data/; STAR File Data Dictionary (xlsx): https://optn.transplant.hrsa.gov/media/1swp2gge/star-file-data-dictionary.xlsx.
Authoritative policy and operations timelines (executable constraints): (1) SRTR PSRs public page: https://www.srtr.org/reports/program-specific-reports/. (2) PSR reporting timeline (cadence): https://www.srtr.org/reports/psr-reporting-timeline/.
Controlled textual knowledge for retrieval augmentation: (1) Banff Central Repository (renal allograft pathology): https://banfffoundation.org/central-repository-for-banff-classification-resources-3/. (2) OPTN Policies main page: https://optn.transplant.hrsa.gov/policies-bylaws/policies/; current OPTN Policies (PDF): https://optn.transplant.hrsa.gov/media/eavh5bf3/optnpolicies.pdf. (3) Race-neutral eGFR (policy background and FAQs): https://optn.transplant.hrsa.gov/policies-bylaws/a-closer-look/waiting-time-modifications-for-candidates-affected-by-race-inclusive-egfr-calculations/for-professionals-faqs-about-egfr-waiting-time-modifications/. (4) SRTR methodological notes (PSR technical methods): https://www.srtr.org/about-the-data/technical-methods-for-the-program-specific-reports/.
Code availability
The experiments in this study were conducted in a Python 3.10 environment using the PyTorch framework (v2.2, CUDA 12.0, cuDNN 8.9) on four NVIDIA A100 GPUs (80 GB) under Linux. The Mamba backbone for longitudinal modeling relies on mamba-ssm (v1.1.1), while the retrieval and reranking modules are based on Sentence-Transformers (v2.7.0). Clustering workflows are built with scikit-learn (v1.3.2) and custom PyTorch modules, and evaluation metrics use custom implementations that follow transplant registry standards. Gradient clipping, cosine decay scheduling, and AdamW optimization use PyTorch's native tools. The complete training and inference scripts for KT-LLM are openly available at https://anonymous.4open.science/r/KT-LLM_v1-7F53/README.md.
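For readers trying to reproduce this environment, the following sketch checks installed package versions against the versions stated above. It is a convenience helper written for illustration and is not part of the released scripts; the function names are assumptions, and the PyTorch version is matched as a prefix because only "v2.2" is given.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Versions as stated in the text; "2.2" is matched as a prefix (e.g. 2.2.0, 2.2.1).
PINNED: Dict[str, str] = {
    "torch": "2.2",
    "mamba-ssm": "1.1.1",
    "sentence-transformers": "2.7.0",
    "scikit-learn": "1.3.2",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, str]:
    """Compare installed versions with the pinned ones and report per package."""
    report: Dict[str, str] = {}
    for package, wanted in pins.items():
        found = lookup(package)
        if found is None:
            report[package] = "missing"
        elif found == wanted or found.startswith(wanted + "."):
            report[package] = "ok"
        else:
            report[package] = f"mismatch (found {found}, expected {wanted})"
    return report


if __name__ == "__main__":
    for package, status in check_environment(PINNED).items():
        print(f"{package}: {status}")
```

The `lookup` parameter exists only so the comparison logic can be exercised without the packages installed, for example by passing a dictionary's `get` method.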
Acknowledgements
We sincerely appreciate the indispensable technical support provided by Qichuang Era Technology Co., Ltd. throughout the development cycle of our KT-LLM model. This study was supported by the Noncommunicable Chronic Diseases-National Science and Technology Major Project (grant number: 2025ZD0547500), the National Natural Science Foundation of China (grant numbers: 82200843 and 82270783), the NSFC Incubation Project of Guangdong Provincial People's Hospital (grant number: KY0120220048), and the Science and Technology Projects in Guangzhou (grant numbers: 2023B03J1250 and 2025A03J4431).
Author information
Authors and Affiliations
Contributions
H.Z., Z.L., and K.H. contributed equally to this work, having full access to all study data and assuming responsibility for the integrity and accuracy of the analyses (validation and formal analysis). H.Z., W.Z., and Z.K. conceptualized the study, designed the methodology, and participated in securing research funding (conceptualization, methodology, and funding acquisition). Z.L. and J.D. carried out data acquisition, curation, and investigation (investigation and data curation) and provided key resources, instruments, and technical support (resources and software). K.H. and Q.D. drafted the initial manuscript and generated visualizations (writing—original draft and visualization). Q.S. supervised the project, coordinated collaborations, and ensured administrative support (supervision and project administration). All authors contributed to reviewing and revising the manuscript critically for important intellectual content (writing—review and editing) and approved the final version for submission.
Corresponding author
Ethics declarations
Competing interests
All authors declare no financial or non-financial competing interests relevant to this work.
Consent for publication
Not applicable. This study exclusively utilizes de-identified datasets from public repositories.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zheng, H., Luo, Z., He, K. et al. KT-LLM: an evidence-grounded and sequence text framework for auditable kidney transplant modeling. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-025-02323-5
DOI: https://doi.org/10.1038/s41746-025-02323-5


