Extended Data Fig. 9: Additional results of testing the effects of TCR counts on model performance and correlation between the heterogeneity of TCRs and model performance.
From: Assessment of computational methods in predicting TCR–epitope binding recognition

a, Performance saturation analysis for TEIM, TCR-BERT, ERGO-AE, VitTCR, NetTCR, PiTE and ATM-TCR on the five epitopes with the most TCRs, showing per-epitope AUPRC and the mean performance (red line). b, Comparison of the models' average AUPRC over the five epitopes across different TCR counts. c, Growth trend of AUPRC across TCR count intervals. The x-axis denotes the three intervals of TCR counts used in model training. The heatmap shows the slopes, calculated as the AUPRC change divided by the TCR count range within each interval. d, Correlation between TCR sequence heterogeneity and AUPRC for epiTCR, TCRGP, TEPCAM, VitTCR, TEIM, TCR-BERT, PiTE, NetTCR, ATM-TCR and ERGO-AE; dots represent epitopes, colored by antigen group. Heterogeneity between TCR sequences was measured as the average Levenshtein distance per epitope. Spearman correlation was used, and P-values were from a two-sided t-test (n = 389). e, Differences between models in the strength of the negative correlation between intra-epitope TCR heterogeneity and AUPRC, based on the results in d. P-values from Fisher’s r-to-z transformation were from a two-sided z-test with Benjamini-Hochberg correction (n = 389).
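The statistics named for panels d and e can be illustrated with a minimal sketch. It is not the authors' code, and it rests on several assumptions: that heterogeneity is the mean Levenshtein distance over all pairs of TCR sequences for an epitope, that two correlation coefficients are compared with the independent-sample form of Fisher's r-to-z test, and that all function names are hypothetical. The caption does not specify these details.

```python
import math
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def mean_pairwise_distance(seqs):
    """Assumed heterogeneity score: mean distance over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(levenshtein(a, b) for a, b in pairs) / len(pairs)


def fisher_z_test(r1, n1, r2, n2):
    """Two-sided z-test on Fisher-transformed correlations.

    Assumes the two correlations come from independent samples.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from largest to smallest p
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch, each epitope would contribute one heterogeneity score, the per-model Spearman coefficients would be passed pairwise to `fisher_z_test` with n = 389, and the resulting P-values would be adjusted together with `benjamini_hochberg`.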