Extended Data Fig. 4: Distribution of training, test and independent test data for retrained model evaluation using the CDR3β-only and CDR3β+others datasets.
From: Assessment of computational methods in predicting TCR–epitope binding recognition

a, Distribution of TCR lengths in the CDR3β-only dataset. b, Distribution of the data used by the retrained CDR3β-only models. c, Percentage and number of TCRs in each of the five rounds of stratified sampling used to construct the training and test sets from the CDR3β-only dataset. d, Distribution of antigen types and epitopes in the seen-epitope independent test set of the CDR3β-only data. e, Number of epitopes grouped by the number of TCRs associated with each epitope in the seen-epitope independent test set of the CDR3β-only data. f, Distribution of antigen types and epitopes in the unseen-epitope independent test set of the CDR3β-only data. g, Number of epitopes grouped by the number of TCRs associated with each epitope in the unseen-epitope independent test set of the CDR3β-only data. h, Distribution of TCR lengths in the CDR3β+others dataset. i, Distribution of the data used by the retrained CDR3β+others models. j, Percentage and number of TCRs in each of the five rounds of stratified sampling used to construct the training and test sets from the CDR3β+others dataset. k, Distribution of antigen types and epitopes in the seen-epitope independent test set of the CDR3β+others data. l, Number of epitopes grouped by the number of TCRs associated with each epitope in the seen-epitope independent test set of the CDR3β+others data. m, Distribution of antigen types and epitopes in the unseen-epitope independent test set of the CDR3β+others data. n, Number of epitopes grouped by the number of TCRs associated with each epitope in the unseen-epitope independent test set of the CDR3β+others data. Heatmaps (b, d, f, i, k, m) show the log10-transformed number of TCRs for each epitope, with epitopes on the x axis and antigens on the y axis.
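The following is a minimal sketch of how an antigen-by-epitope matrix of log10 TCR counts, like the one plotted in the heatmaps, could be computed. It assumes records arrive as (antigen, epitope, CDR3β) tuples and that "number of TCRs" means distinct CDR3β sequences. Both are assumptions not stated in the caption, and the function name `log10_count_matrix` is hypothetical. Antigen/epitope pairs with no TCRs are left as `None` so that they can be rendered as empty cells.

```python
import math


def log10_count_matrix(records):
    """Build an antigen (rows) by epitope (columns) matrix of log10 TCR counts.

    records: iterable of (antigen, epitope, tcr) tuples (hypothetical format).
    Assumption: duplicate TCR sequences for the same pair are counted once.
    Returns (antigens, epitopes, matrix); matrix[i][j] is None when the
    antigen/epitope pair has no TCRs.
    """
    # Collect the distinct TCRs observed for each (antigen, epitope) pair.
    pairs = {}
    for antigen, epitope, tcr in records:
        pairs.setdefault((antigen, epitope), set()).add(tcr)

    # Sorted axis labels: antigens on the y axis, epitopes on the x axis.
    antigens = sorted({a for a, _ in pairs})
    epitopes = sorted({e for _, e in pairs})

    # log10-transform the counts; leave missing pairs empty.
    matrix = [
        [math.log10(len(pairs[(a, e)])) if (a, e) in pairs else None
         for e in epitopes]
        for a in antigens
    ]
    return antigens, epitopes, matrix
```

The log10 transform keeps epitopes with a few TCRs and those with thousands visible on the same colour scale. The resulting matrix can be passed to any heatmap plotting routine.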