Abstract
Epitope-based vaccines are promising therapeutic modalities for infectious diseases and cancer, but identifying immunogenic epitopes is challenging. Most prediction methods use only amino acid sequence information and do not incorporate large-scale structural data or the biochemical properties of each peptide–major histocompatibility complex (MHC). We present ImmunoStruct, a deep learning model that integrates sequence, structural and biochemical information to predict multi-allele class I peptide–MHC immunogenicity. By leveraging a multimodal dataset of 26,049 peptide–MHCs, we demonstrate that ImmunoStruct improves immunogenicity prediction performance and interpretability beyond existing methods, across infectious disease epitopes and cancer neoepitopes. We further show strong alignment with in vitro assay results for a set of SARS-CoV-2 epitopes, as well as strong performance in peptide–MHC-based survival prediction for patients with cancer. Overall, this work presents an architecture that incorporates equivariant graph processing and multimodal data integration for a long-standing challenge in immunotherapy.
Data availability
The infectious disease data were obtained from IEDB at https://iedb.org/. The cancer neoepitope data were obtained from CEDAR at https://cedar.iedb.org/. The cancer survival data were obtained as previously described (ref. 46). Data are freely available via GitHub at https://github.com/KrishnaswamyLab/ImmunoStruct.
Code availability
The source code for the ImmunoStruct model and the inference scripts are available under an open-source license via GitHub at https://github.com/KrishnaswamyLab/ImmunoStruct and via Zenodo at https://doi.org/10.5281/zenodo.17535443 (ref. 68).
References
Gupta, M. et al. Recent advances in cancer vaccines: challenges, achievements, and futuristic prospects. Vaccines 10, 2011 (2022).
Fan, T. et al. Therapeutic cancer vaccines: advancements, challenges, and prospects. Signal Transduct. Target. Ther. 8, 450 (2023).
Blass, E. & Ott, P. A. Advances in the development of personalized neoantigen-based therapeutic cancer vaccines. Nat. Rev. Clin. Oncol. 18, 215–229 (2021).
Bjerregaard, A.-M. et al. An analysis of natural T cell responses to predicted tumor neoepitopes. Front. Immunol. 8, 1566 (2017).
Katsikis, P. D., Ishii, K. J. & Schliehe, C. Challenges in developing personalized neoantigen cancer vaccines. Nat. Rev. Immunol. 24, 213–227 (2024).
Ott, P. A. et al. A phase Ib trial of personalized neoantigen therapy plus anti-PD-1 in patients with advanced melanoma, non-small cell lung cancer, or bladder cancer. Cell 183, 347–362 (2020).
Wells, D. K. et al. Key parameters of tumor epitope immunogenicity revealed through a consortium approach improve neoantigen prediction. Cell 183, 818–834 (2020).
Sahin, U. et al. Personalized RNA mutanome vaccines mobilize poly-specific therapeutic immunity against cancer. Nature 547, 222–226 (2017).
Pishesha, N., Harmand, T. J. & Ploegh, H. L. A guide to antigen processing and presentation. Nat. Rev. Immunol. 22, 751–764 (2022).
Tynan, F. E. et al. The immunogenicity of a viral cytotoxic T cell epitope is controlled by its MHC-bound conformation. J. Exp. Med. 202, 1249–1260 (2005a).
Wu, P. et al. Mechano-regulation of peptide-MHC class I conformations determines TCR antigen recognition. Mol. Cell 73, 1015–1027 (2019).
Weber, J. K. et al. Unsupervised and supervised AI on molecular dynamics simulations reveals complex characteristics of HLA-A2-peptide immunogenicity. Brief. Bioinform. 25, bbad504 (2024).
Reynisson, B., Alvarez, B., Paul, S., Peters, B. & Nielsen, M. NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Res. 48, W449–W454 (2020).
Shao, X. M. et al. High-throughput prediction of MHC class I and II neoantigens with MHCnuggets. Cancer Immunol. Res. 8, 396–408 (2020).
O’Donnell, T. J., Rubinsteyn, A. & Laserson, U. MHCflurry 2.0: improved pan-allele prediction of MHC class I-presented peptides by incorporating antigen processing. Cell Syst. 11, 42–48 (2020).
Albert, B. A. et al. Deep neural networks predict class I major histocompatibility complex epitope presentation and transfer learn neoepitope immunogenicity. Nat. Mach. Intell. 5, 861–872 (2023).
Saotome, K. et al. Structural analysis of cancer-relevant TCR-CD3 and peptide-MHC complexes by cryoEM. Nat. Commun. 14, 2401 (2023).
Jiang, D. et al. NeoaPred: a deep-learning framework for predicting immunogenic neoantigen based on surface and structural features of peptide–human leukocyte antigen complexes. Bioinformatics 40, btae547 (2024).
Vita, R. et al. The Immune Epitope Database (IEDB): 2018 update. Nucleic Acids Res. 47, D339–D343 (2018).
Koşaloğlu-Yalçın, Z. et al. The Cancer Epitope Database and Analysis Resource (CEDAR). Nucleic Acids Res. 51, D845–D852 (2023).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Gfeller, D. et al. Improved predictions of antigen presentation and TCR recognition with MixMHCpred2.2 and PRIME2.0 reveal potent SARS-CoV-2 CD8+ T-cell epitopes. Cell Syst. 14, 72–83 (2023).
Kim, J. Y., Bang, H., Noh, S.-J. & Choi, J. K. DeepNeo: a webserver for predicting immunogenic neoantigens. Nucleic Acids Res. 51, W134–W140 (2023).
Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. In Proc. International Conference on Machine Learning 9323–9332 (PMLR, 2021).
Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In Proc. 5th International Conference on Learning Representations (ICLR, 2017).
Vaswani, A. et al. Attention is all you need. In Proc. Advances in Neural Information Processing Systems 5998–6008 (NIPS, 2017).
Xiao, X. et al. HGTDP-DTA: hybrid graph-transformer with dynamic prompt for drug-target binding affinity prediction. In Proc. International Conference on Neural Information Processing 340–354 (Springer, 2024).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. In Proc. International Conference on Learning Representations (ICLR, 2014).
Liao, D. et al. RNAGenScape: property-guided optimization and interpolation of mRNA sequences with manifold Langevin dynamics. Preprint at https://doi.org/10.48550/arXiv.2510.24736 (2025).
Liu, C. et al. DiffKillR: killing and recreating diffeomorphisms for cell annotation in dense microscopy images. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing 1–5 (IEEE, 2025).
Sun, X. et al. Geometry-aware generative autoencoders for warped Riemannian metric learning and generative modeling on data manifolds. In Proc. International Conference on Artificial Intelligence and Statistics (PMLR, 2024).
Osorio, D., Rondón-Villarreal, P. & Torres, R. Peptides: a package for data mining of antimicrobial peptides. R J. 7, 4–14 (2015).
Zhuang, F. et al. A comprehensive survey on transfer learning. Proc. IEEE 109, 43–76 (2020).
Richman, L. P., Vonderheide, R. H. & Rech, A. J. Neoantigen dissimilarity to the self-proteome predicts immunogenicity and response to immune checkpoint blockade. Cell Syst. 9, 375–382 (2019).
Łuksza, M. et al. A neoantigen fitness model predicts tumour response to checkpoint blockade immunotherapy. Nature 551, 517–520 (2017).
Balachandran, V. P. et al. Identification of unique neoantigen qualities in long-term survivors of pancreatic cancer. Nature 551, 512–516 (2017).
Perry, J. S. A. & Hsieh, C.-S. Development of T-cell tolerance utilizes both cell-autonomous and cooperative presentation of self-antigen. Immunol. Rev. 271, 141–155 (2016).
Ghorani, E. et al. Differential binding affinity of mutated peptides for MHC class I is a predictor of survival in advanced lung cancer and melanoma. Ann. Oncol. 29, 271–279 (2018).
Moon, K. R. et al. Visualizing structure and transitions in high-dimensional biological data. Nat. Biotechnol. 37, 1482–1492 (2019).
Tadros, D. M., Eggenschwiler, S., Racle, J. & Gfeller, D. The MHC motif atlas: a database of MHC binding specificities and ligands. Nucleic Acids Res. 51, D428–D437 (2023).
Sidney, J., Peters, B., Frahm, N., Brander, C. & Sette, A. HLA class I supertypes: a revised and updated classification. BMC Immunol. 9, 1 (2008).
Nguyen, A. T., Szeto, C. & Gras, S. The pockets guide to HLA class I molecules. Biochem. Soc. Trans. 49, 2319–2331 (2021).
Tynan, F. E. et al. T cell receptor recognition of a ‘super-bulged’ major histocompatibility complex class I-bound peptide. Nat. Immunol. 6, 1114–1122 (2005b).
Tynan, F. E. et al. A T cell receptor flattens a bulged antigenic peptide presented by a major histocompatibility complex class I molecule. Nat. Immunol. 8, 268–276 (2007).
Ding, Y.-H. et al. Two human T cell receptors bind in a similar diagonal mode to the HLA-A2/Tax peptide complex using different TCR amino acids. Immunity 8, 403–411 (1998).
Borch, A. et al. IMPROVE: a feature model to predict neoepitope immunogenicity through broad-scale validation of T-cell recognition. Front. Immunol. 15, 1360281 (2024).
Rappaport, A. R. et al. A shared neoantigen vaccine combined with immune checkpoint blockade for advanced metastatic solid tumors: phase 1 trial interim results. Nat. Med. 30, 1013–1022 (2024).
Chen, J.-L. et al. Structural and kinetic basis for heightened immunogenicity of T cell vaccines. J. Exp. Med. 201, 1243–1255 (2005).
Lu, D. et al. KRAS G12V neoantigen specific T cell receptor for adoptive T cell therapy against tumors. Nat. Commun. 14, 6389 (2023).
Poole, A. et al. Therapeutic high affinity T cell receptor targeting a KRAS G12D cancer neoantigen. Nat. Commun. 13, 5333 (2022).
Nusrat, F. et al. The clinical implications of KRAS mutations and variant allele frequencies in pancreatic ductal adenocarcinoma. J. Clin. Med. 13, 2103 (2024).
Weiskopf, D. et al. Comprehensive analysis of dengue virus-specific responses supports an HLA-linked protective role for CD8+ T cells. Proc. Natl Acad. Sci. USA 110, E2046–E2053 (2013).
Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).
Liu, C. et al. ImageFlowNet: forecasting multiscale trajectories of disease progression with irregularly-sampled longitudinal medical images. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing 1–5 (IEEE, 2025).
Liu, C. et al. CUTS: a deep learning and topological framework for multigranular unsupervised medical image segmentation. In Proc. International Conference on Medical Image Computing and Computer-Assisted Intervention 155–165 (Springer, 2024).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Proc. Advances in Neural Information Processing Systems 32, 8026–8037 (NIPS, 2019).
Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In Proc. International Conference on Machine Learning 1597–1607 (PMLR, 2020).
Zbontar, J., Jing, L., Misra, I., LeCun, Y. & Deny, S. Barlow Twins: self-supervised learning via redundancy reduction. In Proc. International Conference on Machine Learning 12310–12320 (PMLR, 2021).
Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In Proc. International Conference on Learning Representations (ICLR, 2019).
Loshchilov, I. & Hutter, F. SGDR: stochastic gradient descent with warm restarts. In Proc. International Conference on Learning Representations (ICLR, 2017).
Sette, A. & Sidney, J. Nine major HLA class I supertypes account for the vast preponderance of HLA-A and -B polymorphism. Immunogenetics 50, 201–212 (1999).
Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
Jamasb, A. et al. Graphein—a Python library for geometric deep learning and network analysis on biomolecular structures and interaction networks. In Proc. Advances in Neural Information Processing Systems 35, 27153–27167 (NIPS, 2022).
Zaidi, N. et al. Role of in silico structural modeling in predicting immunogenic neoepitopes for cancer vaccine development. JCI Insight 5, (2020).
Slota, M., Lim, J.-B., Dang, Y. & Disis, M. L. ELISpot for measuring human immune responses to vaccines. Expert Rev. Vaccines 10, 299–306 (2011).
Yang, F. et al. Validation of an IFN-gamma ELISpot assay to measure cellular immune responses against viral antigens in non-human primates. Gene Ther. 29, 41–54 (2022).
Nelde, A. et al. SARS-CoV-2-derived peptides define heterologous and COVID-19-induced T cell recognition. Nat. Immunol. 22, 74–85 (2021).
Givechian, K. B. et al. ImmunoStruct: ImmunoStruct release. Zenodo https://doi.org/10.5281/zenodo.17535443 (2025).
Acknowledgements
This work was supported by the National Science Foundation (NSF Career grant nos. 2047856, NSF IIS 2473317, NSF DMS 2327211) (S.K.), the National Institutes of Health (grant nos. NIH 1R01GM130847-01A1, NIH 1R01GM135929-01) (S.K.) and by The Colton Center for Autoimmunity at Yale University (S.K.).
Author information
Contributions
K.B.G., A.I. and S.K. identified the research problem and designed this work. K.B.G. collected and cleaned the IEDB and CEDAR data. K.B.G., J.F.R., C.L., E.Y., R.Y., A.I. and S.K. conceived the experiments. K.B.G., J.F.R., C.L. and E.Y. developed ImmunoStruct. C.L. and K.B.G. conceived, designed and developed the cancer–wild-type contrastive learning. K.B.G., J.F.R., C.L. and E.Y. trained and evaluated ImmunoStruct. K.B.G., K.G. and E.C. performed the experimental validation on SARS-CoV-2. K.G. and J.F.R. performed the clinical validation. K.B.G. and S.T. conducted the peptide–TCR contact analysis. K.B.G., J.F.R. and C.L. performed the data analysis. K.B.G., C.L. and S.T. produced the figures. All authors participated in the discussion and wrote the paper.
Ethics declarations
Competing interests
S.K. is the chief scientific officer of Latent Alpha and cofounder of Ascent Bio. A.I. is a member of the board of directors of Roche Holding Ltd and of Genentech. A.I. cofounded RIGImmune, Xanadu Bio and PanV. R.Y. is an Amazon Scholar. E.C. is cofounder of Neomabs Biotechnologies Inc. The other authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–5 and Supplementary Tables 1–5.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Givechian, K.B., Rocha, J.F., Liu, C. et al. ImmunoStruct enables multimodal deep learning for immunogenicity prediction. Nat Mach Intell 8, 70–83 (2026). https://doi.org/10.1038/s42256-025-01163-y
DOI: https://doi.org/10.1038/s42256-025-01163-y


