Extended Data Fig. 3: Exploring polyspecificity vs. training set statistics across baseline, bidirectional, and multitask model variants.
From: Conditional generation of real antigen-specific T cell receptor sequences

(a) Heat map of ranked TCRBART-0 translations across pMHCs, coloured by number of known alleles, known epitopes, training set frequency, epitope dissimilarity (measured as the reciprocal of the length of the longest common substring (LCS)), and membership in the set of 915 polyspecific TCRs. (b) Heat map analogous to panel a, for TCRT5-FT generations. (c) Correlation plots between model generations and training set occurrence for TCRBART-0 and TCRT5-FT. The line of best fit is shown in red; Pearson’s r and Spearman’s ρ are reported for each model. (d) Correlation plots between log[pgen] and model generation frequency for TCRBART-0 and TCRT5-FT. The line of best fit is shown in red, and summary statistics are reported.
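
The epitope dissimilarity in panel a is described only as the reciprocal of the LCS, so the following is a minimal sketch of one way to compute such a score for a pair of epitope sequences, not the authors' implementation. The function names, the pairwise formulation, and the choice to return infinity when two sequences share no residue are all assumptions made for illustration; the caption does not say how scores are aggregated across epitopes.

```python
def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous run of characters shared by a and b."""
    best = 0
    # prev[j] holds the length of the common suffix ending at the previous
    # character of a and at b[j - 1].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def epitope_dissimilarity(a: str, b: str) -> float:
    """Reciprocal of the LCS length (hypothetical helper).

    Assumption: sequences with no shared residue get infinite dissimilarity.
    """
    lcs = longest_common_substring_length(a, b)
    return float("inf") if lcs == 0 else 1.0 / lcs
```

Under this definition, two identical 9-mers score 1/9, while two epitopes sharing only a two-residue stretch score 1/2, so larger values indicate less similar epitopes.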