Fig. 4: Atomic level contact prediction.
From: Identifying T cell antigen at the atomic level with graph convolutional network

a Boxplots of NPCC and AUROC for deepAntigen under leave-one-out cross-validation on the antigen-HLA I (left, n = 130), antigen-HLA II (middle, n = 73) and antigen-TCR (right, n = 134) structural datasets (Datasets 4, 7 and 11, respectively), with or without pre-training. In each boxplot, the box spans the interquartile range (IQR, 25th to 75th percentile) and the line inside the box indicates the median. The whiskers extend to the minimum and maximum values within 1.5×IQR of the quartiles. Outliers beyond this range are plotted as diamonds. P-values were computed using a one-sided paired t-test. ‘w/’ stands for ‘with’ and ‘w/o’ for ‘without’. b The crystal structure of the C259-NYESO complex and the hydrogen bonds between the antigen and the TCR β chain (PDB: 2BNQ). The α and β chains of the NY-ESOc259 TCR are colored in green and purple, respectively. The HLA is colored in gray and the antigen in orange. The hydrogen bonds between atoms are drawn as dotted lines using ChimeraX (ref. 61). c The pairwise true distances and contact probabilities of the crucial atoms identified by deepAntigen. The first row of tick labels gives the atom name from the PDB file and the second row gives the residue type and position. I6 and T7 of the antigen form the core of the contact with CDR3. d The correlation between the change in experimental binding, activation and killing scores and the change in predicted interaction probability upon mutation. The experimental data are derived from T cell immune responses stimulated by 133 mutated antigens generated by single-residue substitution of NY-ESO-1. Each point in the scatter plots represents a mutation, and the bands represent the 95% confidence intervals of the linear fit. P-values were calculated by a two-sided Pearson correlation test. e The average change in contact probability between the core of the antigen and each crucial atom of CDR3 after the two mutations (W5V and Q8S).
f The contact scores of motif sites and ‘non-motif’ sites in the LLLDRLNQL-specific TCR pools. The motifs were identified by GLIPH (ref. 18). The contact score is the average probability of the k-mer residues contacting the antigen. For each residue, the contact probability is the sum of all pairwise atom contact probabilities involving that residue. The contact scores are significantly higher for motif sites than for ‘non-motif’ sites; values are shown as means with 1.5 times the standard error. The total number of CDR3s is 192. The number of CDR3s associated with each motif is provided in Supplementary Table 9. P-values were computed using a one-sided paired t-test (n = 192).
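The contact score defined in panel f can be illustrated with a minimal sketch. It follows the two steps stated in the caption: sum the pairwise atom contact probabilities per residue, then average over the residues of a k-mer. The function names, the array layout (CDR3 atoms in rows, antigen atoms in columns) and the atom-to-residue index vector are assumptions made for illustration and are not taken from the deepAntigen code.

```python
import numpy as np

def residue_contact_probability(atom_probs, atom_to_residue, n_residues):
    """Sum pairwise atom contact probabilities for each CDR3 residue.

    atom_probs      : (n_cdr3_atoms, n_antigen_atoms) array of predicted
                      atom-atom contact probabilities (assumed layout).
    atom_to_residue : (n_cdr3_atoms,) integer array giving the residue
                      index of each CDR3 atom (hypothetical input).
    """
    # Total contact probability of each CDR3 atom over all antigen atoms.
    per_atom = atom_probs.sum(axis=1)
    # Accumulate the atom totals into their parent residues.
    return np.bincount(atom_to_residue, weights=per_atom, minlength=n_residues)

def kmer_contact_score(residue_probs, start, k):
    """Average the residue contact probabilities over a k-mer window."""
    return float(residue_probs[start:start + k].mean())

# Toy example: 4 CDR3 atoms belonging to 3 residues, 2 antigen atoms.
toy_probs = np.array([[0.1, 0.2],
                      [0.3, 0.4],
                      [0.0, 0.5],
                      [0.2, 0.2]])
toy_residues = residue_contact_probability(toy_probs, np.array([0, 0, 1, 2]), 3)
toy_score = kmer_contact_score(toy_residues, start=0, k=2)
```

In this toy case the per-residue sums are approximately 1.0, 0.5 and 0.4, so a 2-mer covering the first two residues has a contact score of about 0.75. Because the per-residue value is a sum, it can exceed 1 and is a score, not a probability in the strict sense.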