  • Perspective

Can we predict T cell specificity with digital biology and machine learning?

Abstract

Recent advances in machine learning and experimental biology have offered breakthrough solutions to problems such as protein structure prediction that were long thought to be intractable. However, despite the pivotal role of the T cell receptor (TCR) in orchestrating cellular immunity in health and disease, computational reconstruction of a reliable map from a TCR to its cognate antigens remains a holy grail of systems immunology. Current data sets are limited to a negligible fraction of the universe of possible TCR–ligand pairs, and performance of state-of-the-art predictive models wanes when applied beyond these known binders. In this Perspective article, we make the case for renewed and coordinated interdisciplinary effort to tackle the problem of predicting TCR–antigen specificity. We set out the general requirements of predictive models of antigen binding, highlight critical challenges and discuss how recent advances in digital biology such as single-cell technology and machine learning may provide possible solutions. Finally, we describe how predicting TCR specificity might contribute to our understanding of the broader puzzle of antigen immunogenicity.

Fig. 1: Structure and function of the TCR.
Fig. 2: The current landscape of known TCR–antigen pairs.
Fig. 3: Screening and computational methods.

References

  1. Nguyen, A. T., Szeto, C. & Gras, S. The pockets guide to HLA class I molecules. Biochem. Soc. Trans. 49, 2319–2331 (2021).

  2. de Jong, A. & Ogg, G. CD1a function in human skin disease. Mol. Immunol. 130, 14–19 (2021).

  3. de Libero, G., Chancellor, A. & Mori, L. Antigen specificities and functional properties of MR1-restricted T cells. Mol. Immunol. 130, 148–153 (2021).

  4. Sun, L., Middleton, D. R., Wantuch, P. L., Ozdilek, A. & Avci, F. Y. Carbohydrates as T-cell antigens with implications in health and disease. Glycobiology 26, 1029–1040 (2016).

  5. Bagaev, D. V. et al. VDJdb in 2019: database extension, new analysis infrastructure and a T-cell receptor motif compendium. Nucleic Acids Res. 48, D1057–D1062 (2020).

  6. Tickotsky, N., Sagiv, T., Prilusky, J., Shifrut, E. & Friedman, N. McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences. Bioinformatics 33, 2924–2929 (2017).

  7. Vita, R. et al. The Immune Epitope Database (IEDB): 2018 update. Nucleic Acids Res. 47, D339–D343 (2019).

  8. Nolan, S. et al. A large-scale database of T-cell receptor beta (TCRβ) sequences and binding associations from natural and synthetic exposure to SARS-CoV-2. Preprint at Res. Sq. https://www.researchsquare.com/article/rs-51964/v1 (2020).

  9. Moris, P. et al. Current challenges for unseen-epitope TCR interaction prediction and a new perspective derived from image classification. Brief. Bioinform. 22, bbaa318 (2021).

  10. Huang, H., Wang, C., Rubelt, F., Scriba, T. J. & Davis, M. M. Analyzing the Mycobacterium tuberculosis immune response by T-cell receptor clustering with GLIPH2 and genome-wide antigen screening. Nat. Biotechnol. 38, 1194–1202 (2020).

  11. Mayer-Blackwell, K. et al. TCR meta-clonotypes for biomarker discovery with tcrdist3 enabled identification of public, HLA-restricted clusters of SARS-CoV-2 TCRs. eLife 10, e68605 (2021).

  12. Weber, A., Born, J. & Rodriguez Martínez, M. TITAN: T cell receptor specificity prediction with bimodal attention networks. Bioinformatics 37, I237–I244 (2021).

  13. Lee, C. H., Antanaviciute, A., Buckley, P. R., Simmons, A. & Koohy, H. To what extent does MHC binding translate to immunogenicity in humans? Immunoinformatics 3–4, 100006 (2021).

  14. Buckley, P. R. et al. Evaluating performance of existing computational models in predicting CD8+ T cell pathogenic epitopes and cancer neoantigens. Brief. Bioinform. 23, bbac141 (2022).

  15. Mösch, A., Raffegerst, S., Weis, M., Schendel, D. J. & Frishman, D. Machine learning for cancer immunotherapies based on epitope recognition by T cell receptors. Front. Genet. 10, 1141 (2019).

  16. Wells, D. K. et al. Key parameters of tumor epitope immunogenicity revealed through a consortium approach improve neoantigen prediction. Cell 183, 818–834.e13 (2020).

  17. Altman, J. D. et al. Phenotypic analysis of antigen-specific T lymphocytes. Science 274, 94–96 (1996).

  18. Yao, Y., Wyrożemski, Ł., Lundin, K. E. A., Kjetil Sandve, G. & Qiao, S.-W. Differential expression profile of gluten-specific T cells identified by single-cell RNA-seq. PLoS ONE 16, e0258029 (2021).

  19. Glanville, J. et al. Identifying specificity groups in the T cell receptor repertoire. Nature 547, 94–98 (2017).

  20. Kurtulus, S. & Hildeman, D. Assessment of CD4+ and CD8+ T cell responses using MHC class I and II tetramers. Methods Mol. Biol. 979, 71–79 (2013).

  21. Joglekar, A. V. & Li, G. T cell antigen discovery. Nat. Methods 18, 873–880 (2021).

  22. Bosselut, R. et al. Single T cell sequencing demonstrates the functional role of αβ TCR pairing in cell lineage and antigen specificity. Front. Immunol. 1, 1516 (2019).

  23. Emerson, R. O. et al. Immunosequencing identifies signatures of cytomegalovirus exposure history and HLA-mediated effects on the T cell repertoire. Nat. Genet. 49, 659–665 (2017).

  24. Wang, X., He, Y., Zhang, Q., Ren, X. & Zhang, Z. Direct comparative analyses of 10× genomics chromium and Smart-Seq2. Genomics Proteomics Bioinformatics 19, 253–266 (2021).

  25. Zhang, W. et al. A framework for highly multiplexed dextramer mapping and prediction of T cell receptor sequences to antigen specificity. Sci. Adv. 7, eabf5835 (2021).

  26. Gascoigne, N. et al. Optimized peptide-MHC multimer protocols for detection and isolation of autoimmune T-cells. Front. Immunol. 9, 1378 (2018).

  27. Meysman, P. et al. Benchmarking solutions to the T-cell receptor epitope prediction problem: IMMREP22 workshop report. Preprint at bioRxiv https://doi.org/10.1101/2022.10.27.514020 (2022).

  28. Dobson, C. S. et al. Antigen identification and high-throughput interaction mapping by reprogramming viral entry. Nat. Methods 19, 449–460 (2022).

  29. Guo, X. Z. J. & Elledge, S. J. V-CARMA: a tool for the detection and modification of antigen-specific T cells. Proc. Natl Acad. Sci. USA 119, e2116277119 (2022).

  30. Brophy, S. E., Holler, P. D. & Kranz, D. M. A yeast display system for engineering functional peptide-MHC complexes. J. Immunol. Methods 272, 235–246 (2003).

  31. Birnbaum, M. E. et al. Deconstructing the peptide-MHC specificity of T cell recognition. Cell 157, 1073–1087 (2014).

  32. Crawford, F. et al. Use of baculovirus MHC/peptide display libraries to characterize T-cell receptor ligands. Immunol. Rev. 210, 156–170 (2006).

  33. Coles, C. H. et al. TCRs with distinct specificity profiles use different binding modes to engage an identical peptide–HLA complex. J. Immunol. 204, 1943–1953 (2020).

  34. Kula, T. et al. T-Scan: a genome-wide method for the systematic discovery of T cell epitopes. Cell 178, 1016 (2019).

  35. Pan, X. et al. Combinatorial HLA-peptide bead libraries for high throughput identification of CD8+ T cell specificity. J. Immunol. Methods 403, 72–78 (2014).

  36. Li, G. et al. T cell antigen discovery via trogocytosis. Nat. Methods 16, 183–190 (2019).

  37. Joglekar, A. V. et al. T cell antigen discovery via signaling and antigen-presenting bifunctional receptors. Nat. Methods 16, 191–198 (2019).

  38. Schaap-Johansen, A.-L., Vujovic, M., Borch, A., Hadrup, S. R. & Marcatili, P. T cell epitope prediction and its application to immunotherapy. Front. Immunol. 12, 712488 (2021).

  39. Valkiers, S. et al. Recent advances in T-cell receptor repertoire analysis: bridging the gap with multimodal single-cell RNA sequencing. Immunoinformatics 5, 100009 (2022).

  40. Lee, C. H. et al. Predicting cross-reactivity and antigen specificity of T cell receptors. Front. Immunol. 11, 2498 (2020).

  41. Vujovic, M. et al. T cell receptor sequence clustering and antigen specificity. Comput. Struct. Biotechnol. J. 18, 2166–2173 (2020).

  42. Katayama, Y., Yokota, R., Akiyama, T. & Kobayashi, T. J. Machine learning approaches to TCR repertoire analysis. Front. Immunol. 13, 858057 (2022).

  43. Luu, A. M., Leistico, J. R., Miller, T., Kim, S. & Song, J. S. Predicting TCR-epitope binding specificity using deep metric learning and multimodal learning. Genes 12, 572 (2021).

  44. Montemurro, A. et al. NetTCR-2.0 enables accurate prediction of TCR-peptide binding by using paired TCRα and β sequence data. Commun. Biol. 4, 1060 (2021).

  45. Dens, C., Bittremieux, W., Affaticati, F., Laukens, K. & Meysman, P. Interpretable deep learning to uncover the molecular binding patterns determining TCR–epitope interactions. Preprint at bioRxiv https://doi.org/10.1101/2022.05.02.490264 (2022).

  46. Springer, I., Tickotsky, N. & Louzoun, Y. Contribution of T cell receptor alpha and beta CDR3, MHC typing, V and J genes to peptide binding prediction. Front. Immunol. 12, 1436 (2021).

  47. Lu, T. et al. Deep learning-based prediction of the T cell receptor–antigen binding specificity. Nat. Mach. Intell. 3, 864–875 (2021).

  48. Fischer, D. S., Wu, Y., Schubert, B. & Theis, F. J. Predicting antigen specificity of single T cells based on TCR CDR3 regions. Mol. Syst. Biol. 16, 9416 (2020).

  49. Wu, K. et al. TCR-BERT: learning the grammar of T-cell receptors for flexible antigen-binding analyses. Preprint at bioRxiv https://doi.org/10.1101/2021.11.18.469186 (2021).

  50. Grazioli, F. et al. On TCR binding predictors failing to generalize to unseen peptides. Front. Immunol. 13, 1014256 (2022).

  51. Zhang, H., Zhan, X. & Li, B. GIANA allows computationally-efficient TCR clustering and multi-disease repertoire classification by isometric transformation. Nat. Commun. 12, 4699 (2021).

  52. Chronister, W. D. et al. TCRMatch: predicting T-cell receptor specificity based on sequence similarity to previously characterized receptors. Front. Immunol. 12, 640725 (2021).

  53. Sidhom, J. W., Larman, H. B., Pardoll, D. M. & Baras, A. S. DeepTCR is a deep learning framework for revealing sequence concepts within T-cell repertoires. Nat. Commun. 12, 1605 (2021).

  54. Dash, P. et al. Quantifiable predictive features define epitope-specific T cell receptor repertoires. Nature 547, 89–93 (2017).

  55. Valkiers, S., van Houcke, M., Laukens, K. & Meysman, P. ClusTCR: a python interface for rapid clustering of large sets of CDR3 sequences with unknown antigen specificity. Bioinformatics 37, 4865–4867 (2021).

  56. Corrie, B. D. et al. iReceptor: a platform for querying and analyzing antibody/B-cell and T-cell receptor repertoire data across federated repositories. Immunol. Rev. 284, 24–41 (2018).

  57. Andreatta, M. et al. Interpretation of T cell states from single-cell transcriptomics data using reference atlases. Nat. Commun. 12, 2965 (2021).

  58. Leem, J., de Oliveira, S. H. P., Krawczyk, K. & Deane, C. M. STCRDab: the structural T-cell receptor database. Nucleic Acids Res. 46, D406–D412 (2018).

  59. Mayer, A. & Callan Jr, C. G. Measures of epitope binding degeneracy from T cell receptor repertoires. Preprint at bioRxiv https://doi.org/10.1101/2022.07.25.501373 (2022).

  60. Singh, N. K. et al. Emerging concepts in TCR specificity: rationalizing and (maybe) predicting outcomes. J. Immunol. 199, 2203–2213 (2017).

  61. Quaratino, S., Thorpe, C. J., Travers, P. J. & Londei, M. Similar antigenic surfaces, rather than sequence homology, dictate T-cell epitope molecular mimicry. Proc. Natl Acad. Sci. USA 92, 10398–10402 (1995).

  62. Lanzarotti, E., Marcatili, P. & Nielsen, M. T-cell receptor cognate target prediction based on paired α and β chain sequence and structural CDR loop similarities. Front. Immunol. 10, 2080 (2019).

  63. Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at bioRxiv https://doi.org/10.1101/2021.10.04.463034 (2022).

  64. Koehler Leman, J. et al. Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nat. Methods 17, 665–680 (2020).

  65. Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. USA 118, e2016239118 (2021).

  66. Bradley, P. Structure-based prediction of T cell receptor: peptide–MHC interactions. Preprint at bioRxiv https://doi.org/10.1101/2022.08.05.503004 (2022).

  67. Jiang, Y., Huo, M. & Li, S. C. TEINet: a deep learning framework for prediction of TCR-epitope binding specificity. Preprint at bioRxiv https://doi.org/10.1101/2022.10.20.513029 (2022).

  68. Chinery, L., Wahome, N., Moal, I. & Deane, C. M. Paragraph — antibody paratope prediction using Graph Neural Networks with minimal feature vectors. Bioinformatics 39, btac732 (2022).

  69. Alley, E. C., Khimulya, G. & Biswas, S. Unified rational protein engineering with sequence-based deep representation learning. Nat. Methods 16, 1312–1322 (2019).

  70. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  71. Reynisson, B., Alvarez, B., Paul, S., Peters, B. & Nielsen, M. NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Res. 48, 449–454 (2020).

  72. Mason, D. A very high level of cross-reactivity is an essential feature of the T-cell receptor. Immunol. Today 19, 395–404 (1998).

  73. Sewell, A. K. Why must T cells be cross-reactive? Nat. Rev. Immunol. 12, 669–677 (2012).

  74. Keck, S. et al. Antigen affinity and antigen dose exert distinct influences on CD4 T-cell differentiation. Proc. Natl Acad. Sci. USA 111, 14852–14857 (2014).

  75. Achar, S. R. et al. Universal antigen encoding of T cell activation from high-dimensional cytokine dynamics. Science 376, 880–884 (2022).

  76. van Panhuys, N., Klauschen, F. & Germain, R. N. T cell receptor-dependent signal intensity dominantly controls CD4+ T cell polarization in vivo. Immunity 41, 63–74 (2014).

  77. Liu, S. et al. Spatial maps of T cell receptors and transcriptomes reveal distinct immune niches and interactions in the adaptive immune response. Immunity 55, 1940–1952.e5 (2022).

  78. Schattgen, S. A. et al. Integrating T cell receptor sequences and transcriptional profiles by clonotype neighbor graph analysis (CoNGA). Nat. Biotechnol. 40, 54–63 (2021).

  79. Kryshtafovych, A., Schwede, T., Topf, M., Fidelis, K. & Moult, J. Critical assessment of methods of protein structure prediction (CASP) — round XIV. Proteins 89, 1607–1617 (2021).

  80. Pearson, K. On lines and planes of closest fit to systems of points in space. Philos. Mag. 2, 559–570 (1901).

  81. Cai, M., Bang, S., Zhang, P. & Lee, H. ATM-TCR: TCR–epitope binding affinity prediction using a multi-head self-attention model. Front. Immunol. 13, 893247 (2022).

  82. Pavlović, M. et al. The immuneML ecosystem for machine learning analysis of adaptive immune receptor repertoires. Nat. Mach. Intell. 3, 936–944 (2021).

  83. Heikkilä, N. et al. Human thymic T cell repertoire is imprinted with strong convergence to shared sequences. Mol. Immunol. 127, 112–123 (2020).

  84. 10× Genomics. A new way of exploring immunity: linking highly multiplexed antigen recognition to immune repertoire and phenotype. 10× Genomics https://pages.10xgenomics.com/rs/446-PBO-704/images/10x_AN047_IP_A_New_Way_of_Exploring_Immunity_Digital.pdf (2020).

  85. Ehrlich, R. et al. SwarmTCR: a computational approach to predict the specificity of T cell receptors. BMC Bioinformatics 22, 422 (2021).

  86. Dean, J. et al. Annotation of pseudogenic gene segments by massively parallel sequencing of rearranged lymphocyte receptor loci. Genome Med. 7, 123 (2015).

  87. Zhang, W. et al. PIRD: pan immune repertoire database. Bioinformatics 36, 897–903 (2020).

  88. Jokinen, E., Huuhtanen, J., Mustjoki, S., Heinonen, M. & Lähdesmäki, H. Predicting recognition between T cell receptors and epitopes with TCRGP. PLoS Comput. Biol. 17, e1008814 (2021).

  89. Tong, Y. et al. SETE: sequence-based ensemble learning approach for TCR epitope binding prediction. Comput. Biol. Chem. 87, 107281 (2020).

  90. Snyder, T. M. et al. Magnitude and dynamics of the T-cell response to SARS-CoV-2 infection at both individual and population levels. Preprint at medRxiv https://doi.org/10.1101/2020.07.31.20165647 (2020).

  91. Zhang, S. Q. et al. High-throughput determination of the antigen specificities of T cell receptors in single cells. Nat. Biotechnol. 36, 1156–1159 (2018).

  92. Zhang, H. et al. Investigation of antigen-specific T-cell receptor clusters in human cancers. Clin. Cancer Res. 26, 1359–1371 (2020).

  93. Berman, H. M. et al. The protein data bank. Nucleic Acids Res. 28, 235–242 (2000).

  94. Chen, S. Y., Yue, T., Lei, Q. & Guo, A. Y. TCRdb: a comprehensive database for T-cell receptor sequences with powerful search function. Nucleic Acids Res. 49, D468 (2021).

  95. Gilson, M. K. et al. BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res. 44, 1045–1053 (2015).

  96. Dines, J. N. et al. The ImmuneRACE Study: a prospective multicohort study of immune response action to COVID-19 events with the ImmuneCODE™ Open Access Database. Preprint at medRxiv https://doi.org/10.1101/2020.08.17.20175158 (2020).

  97. Chen, G. et al. Sequence and structural analyses reveal distinct and highly diverse human CD8+ TCR repertoires to immunodominant viral antigens. Cell Rep. 19, 569 (2017).

  98. Huth, A., Liang, X., Krebs, S., Blum, H. & Moosmann, A. Antigen-specific TCR signatures of cytomegalovirus infection. J. Immunol. 202, 979–990 (2019).

  99. Springer, I., Besser, H., Tickotsky-Moskovitz, N., Dvorkin, S. & Louzoun, Y. Prediction of specific TCR-peptide binding from large dictionaries of TCR–peptide pairs. Front. Immunol. 11, 1803 (2020).

  100. Kanakry, C. G. et al. Origin and evolution of the T cell repertoire after posttransplantation cyclophosphamide. JCI Insight 1, 86252 (2016).

  101. Raman, M. C. C. et al. Direct molecular mimicry enables off-target cardiovascular toxicity by an enhanced affinity TCR designed for cancer immunotherapy. Sci. Rep. 6, 18851 (2016).

  102. Soto, C. et al. High frequency of shared clonotypes in human T cell receptor repertoires. Cell Rep. 32, 107882 (2020).

  103. Woolhouse, M. E. J. & Gowtage-Sequeria, S. Host range and emerging and reemerging pathogens. Emerg. Infect. Dis. 11, 1842–1847 (2005).

  104. Robinson, J., Waller, M. J., Parham, P., Bodmer, J. G. & Marsh, S. G. E. IMGT/HLA Database — a sequence database for the human major histocompatibility complex. Nucleic Acids Res. 29, 210–213 (2001).

  105. Waldman, A. D., Fritz, J. M. & Lenardo, M. J. A guide to cancer immunotherapy: from T cell basic science to clinical practice. Nat. Rev. Immunol. 20, 651–668 (2020).

  106. Linette, G. P. et al. Cardiovascular toxicity and titin cross-reactivity of affinity-enhanced T cells in myeloma and melanoma. Blood 122, 863–871 (2013).

  107. Arellano, B., Graber, D. J. & Sentman, C. L. Regulatory T cell-based therapies for autoimmunity. Discov. Med. 22, 73–80 (2016).

  108. Raffin, C., Vo, L. T. & Bluestone, J. A. Treg cell-based therapies: challenges and perspectives. Nat. Rev. Immunol. 20, 158–172 (2020).

  109. Hernando, B. et al. The effect of age on the acquisition and selection of cancer driver mutations in sun-exposed normal skin. Ann. Oncol. 32, 412–421 (2021).

  110. Sesma, A. et al. From tumor mutational burden to blood T cell receptor: looking for the best predictive biomarker in lung cancer treated with immunotherapy. Cancers 12, 1–19 (2020).

  111. Scott, A. C. et al. TOX is a critical regulator of tumour-specific T cell differentiation. Nature 571, 270 (2019).

  112. Yost, K. E. et al. Clonal replacement of tumor-specific T cells following PD-1 blockade. Nat. Med. 25, 1251–1259 (2019).

  113. Wherry, E. J. & Kurachi, M. Molecular and cellular insights into T cell exhaustion. Nat. Rev. Immunol. 15, 486–499 (2015).

  114. Daniel, B. et al. Divergent clonal differentiation trajectories of T cell exhaustion. Nat. Immunol. 23, 1614–1627 (2022).

  115. Shakiba, M. et al. TCR signal strength defines distinct mechanisms of T cell dysfunction and cancer evasion. J. Exp. Med. 219, e20201966 (2022).

  116. Dan, J. M. et al. Immunological memory to SARS-CoV-2 assessed for up to 8 months after infection. Science 371, eabf4063 (2021).

  117. Swanson, P. A. et al. AZD1222/ChAdOx1 nCoV-19 vaccination induces a polyfunctional spike protein-specific TH1 response with a diverse TCR repertoire. Sci. Transl Med. 13, 7211 (2021).

  118. Bjornevik, K. et al. Longitudinal analysis reveals high prevalence of Epstein–Barr virus associated with multiple sclerosis. Science 375, 296–301 (2022).

Acknowledgements

H.K. is supported by funding from the UK Medical Research Council grant number MC_UU_12010/3. D.H. receives support from the Biotechnology and Biological Sciences Research Council (BBSRC) (grant number BB/T008784/1) and is funded by the Rosalind Franklin Institute. The authors thank A. Simmons, B. McMaster and C. Lee for critical review. H.K. acknowledges A. Antanaviciute, A. Simmons, T. Elliott and P. Klenerman for their encouragement, support and fruitful conversations.

Author information

Contributions

H.K. and D.H. researched and wrote the article. R.A.F., M.B. and G.O. reviewed and edited the manuscript before submission.

Corresponding author

Correspondence to Hashem Koohy.

Ethics declarations

Competing interests

G.O. is a co-founder of T-Cypher Bio. D.H. and R.A.F. provide consultancy services to companies active in T cell antigen discovery and vaccine development. The other authors declare no competing interests.

Peer review

Peer review information

Nature Reviews Immunology thanks M. Birnbaum, P. Holec, E. Newell and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links:

BindingDB: https://www.bindingdb.org/rwd/bind/index.jsp

Immune Epitope Database: https://www.iedb.org/

McPAS-TCR: http://friedmanlab.weizmann.ac.il/McPAS-TCR

MIRA: https://clients.adaptivebiotech.com/pub/covid-2020

PyMOL: https://www.schrodinger.com/products/pymol

VDJdb: https://vdjdb.cdr3.net/

Glossary

Area under the receiver-operating characteristic curve

(ROC-AUC). ROC-AUC and the area under the precision–recall curve (PR-AUC) summarize how a classifier trades off different classes of error. Both curves are produced for classification tasks by varying the threshold at which a model prediction falling between zero and one is assigned to the positive label class, for example, predicted binding of a given T cell receptor–antigen pair. ROC-AUC is the area under the curve obtained by plotting the true positive rate against the false positive rate. It is typically more appropriate for problems where positive and negative labels are proportionally represented in the input data. PR-AUC is the area under the curve obtained by plotting model precision against model recall. It is typically more appropriate for problems in which the positive label is observed less frequently than the negative label.
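As a rough illustration of the two metrics, the sketch below computes ROC-AUC through its rank interpretation (the probability that a randomly chosen positive outscores a randomly chosen negative) and estimates PR-AUC by average precision, which is one common estimator among several. The function names, labels and scores are hypothetical and are not taken from the article; ties in the average-precision ranking are resolved by sort order, which is a simplification.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision at each true positive's rank."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy, made-up predictions: 2 binders among 8 candidate TCR-antigen pairs.
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_roc = roc_auc(toy_labels, toy_scores)           # 11/12
toy_ap = average_precision(toy_labels, toy_scores)  # 5/6
```

With only two positives among eight pairs, the single highly scored negative lowers average precision more than it lowers ROC-AUC, which is consistent with the guidance that PR-AUC is the more informative metric when positives are rare.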

Library-on-library screens

Experimental screens that permit analysis of the binding between large libraries of (for example) peptide–MHC complexes and various T cell receptors.

Machine learning models

A broad family of computational and statistical methods that aim to identify statistically conserved patterns within a data set without being explicitly programmed to do so. Machine learning models may broadly be described as supervised or unsupervised based on the manner in which the model is trained. Many recent models make use of both approaches.

Neural networks

A family of machine learning models inspired by the synaptic connections of the brain that are made up of stacked layers of simple interconnected models. Although each component of the network may learn a relatively simple predictive function, the combination of many predictors allows neural networks to learn arbitrarily complex functions from millions or billions of training instances. Neural networks may be trained using supervised or unsupervised learning and may deploy a wide variety of different model architectures. Deep neural networks are those with more than one intermediate layer.

Shuffling

In the absence of experimental negative (non-binding) data, shuffling is the act of assigning a given T cell receptor drawn from the set of known T cell receptor–antigen pairs to an epitope other than its cognate ligand, and labelling the randomly generated pair as a negative instance.
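The shuffling procedure can be sketched as follows. This is a minimal illustration, assuming positives are stored as (TCR, epitope) tuples; the function name and the placeholder identifiers such as `tcr_1` and `epitope_A` are hypothetical. A shuffled pair is only assumed to be non-binding, so some generated negatives may in reality be unobserved binders.

```python
import random


def shuffle_negatives(positives, seed=0):
    """Pair each TCR with a random epitope other than its known ligand(s)."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    known = set(positives)             # never relabel a known binder as negative
    epitopes = sorted({epitope for _, epitope in positives})
    negatives = []
    for tcr, _ in positives:
        candidates = [e for e in epitopes if (tcr, e) not in known]
        if candidates:                 # skip TCRs that bind every listed epitope
            negatives.append((tcr, rng.choice(candidates)))
    return negatives


# Placeholder identifiers, not real sequences.
known_pairs = [
    ("tcr_1", "epitope_A"),
    ("tcr_2", "epitope_A"),
    ("tcr_3", "epitope_B"),
    ("tcr_4", "epitope_C"),
]
negative_pairs = shuffle_negatives(known_pairs)
```

Checking candidates against the full set of known pairs, rather than only against the TCR's listed partner in that row, avoids mislabelling a cross-reactive TCR's second known ligand as a negative.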

Supervised learning

Models that learn a mathematical function mapping from an input to a predicted label, given some data set containing both input data and associated labels. Common supervised tasks include regression, where the label is a continuous variable, and classification, where the label is a discrete variable.
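To make the idea of learning a mapping from inputs to labels concrete, the sketch below fits a one-variable least-squares line, a deliberately simple regression task. The function name and the toy data are illustrative assumptions and do not represent any model discussed in the article.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("inputs must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy labelled data generated from y = 2x + 1 (inputs with continuous labels).
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
predicted = slope * 4 + intercept  # prediction for an unseen input x = 4
```

A classification task follows the same pattern, except that the learned function outputs a discrete label (for example, binder or non-binder) instead of a continuous value.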

Synthetic peptide display libraries

Experimental systems that make use of large libraries of recombinant synthetic peptide–MHC complexes displayed by yeast (ref. 30), baculovirus (ref. 32), bacteriophage (ref. 33) or beads (ref. 35) for profiling the sequence determinants of immune receptor binding. Peptide diversity can reach 10^9 unique peptides for yeast-based libraries.

Training data

The training data set serves as an input to the model from which it learns some predictive or analytical function.

Unsupervised learning

Models that learn to assign input data to clusters having similar features, or otherwise to learn the underlying statistical patterns of the data. Unlike supervised models, unsupervised models do not require labels. Common unsupervised techniques include clustering algorithms such as K-means, anomaly detection models, and dimensionality reduction techniques such as principal component analysis (ref. 80) and uniform manifold approximation and projection.

Validation

Analysis done using a validation data set to evaluate model performance during and after training. A given set of training data is typically subdivided into training and validation data, for example, in an 80%:20% ratio. Models may then be trained on the training data, and their performance evaluated on the validation data set.
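The 80%:20% subdivision described here might be implemented along the following lines. This is a minimal sketch of a purely random split; the function name and its parameters are hypothetical.

```python
import random


def train_validation_split(items, validation_fraction=0.2, seed=0):
    """Randomly split items into (training, validation) lists."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(items)                  # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)   # fixed seed for reproducibility
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Ten placeholder examples, split 80%:20%.
train_items, validation_items = train_validation_split(range(10))
```

A random split of this kind measures performance on examples drawn from the same pool as the training data. It does not by itself test how a model behaves on epitopes that are absent from training, which would require holding out whole epitopes.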

About this article

Cite this article

Hudson, D., Fernandes, R.A., Basham, M. et al. Can we predict T cell specificity with digital biology and machine learning? Nat Rev Immunol 23, 511–521 (2023). https://doi.org/10.1038/s41577-023-00835-3

  • DOI: https://doi.org/10.1038/s41577-023-00835-3
