Fig. 3: Selection of the important positions.
From: Connecting MHC-I-binding motifs with HLA alleles via deep learning

a A clustering heatmap of the peptide mask at each peptide position for each allele. b A stack plot of the position importance of HLA genes at each MHC-I residue, and a heatmap of allele masks derived from ScoreCAM results with clustering on alleles. The two plots are aligned by MHC-I-binding cleft sequence to better show the distribution of mask scores. In the stack plot, the HLA genes were counted independently because the genes differ in the number of alleles with variation and in their patterns of conserved or polymorphic sequences (Supplementary Fig. 4). For the heatmaps in a and b, mask scores were clustered using Euclidean distance and unweighted average linkage, and the row color labels the HLA gene. c A scatterplot with a linear regression fit shows the relationship between polymorphism and importance for each polymorphic MHC-I residue (n = 80). Information entropy (−ΣP × ln(P), where P is the frequency of each amino acid at the residue and the sum runs over the amino acids observed there) is used to represent the degree of polymorphism. The important positions selected using ScoreCAM are colored in red, and the 34 residues derived from NetMHCpan4.1 are cross-marked. The line represents the estimated regression, and the blue band represents the 95% confidence interval of the regression fit. d A Venn diagram shows the intersections of the important-position sets of the individual HLA genes with the polymorphic residue set. Residues in the set “(A ∪ B ∪ C) ∩ polymorphism” are selected as the 42 important positions of MHCfovea. Source data are provided in Supplementary Data 5 and 6.
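
The entropy measure in c and the set selection in d can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `position_entropy` and `select_important_positions`, the input layout (one string of residues per alignment column, one set of positions per HLA gene), and the toy values are assumptions made for illustration only.

```python
import math
from collections import Counter


def position_entropy(residues):
    """Information entropy, -sum(P * ln(P)), of the amino acids observed
    at one MHC-I residue position across a set of alleles.

    `residues` is any iterable of one-letter amino acid codes
    (hypothetical input format).
    """
    counts = Counter(residues)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_important_positions(per_gene_positions, polymorphic_positions):
    """Return (A ∪ B ∪ C) ∩ polymorphism as a sorted list.

    `per_gene_positions` maps a gene label to its set of important
    positions; `polymorphic_positions` is the set of polymorphic
    residue positions. Both are illustrative inputs.
    """
    union = set().union(*per_gene_positions.values())
    return sorted(union & set(polymorphic_positions))


# Toy data, not taken from the paper.
conserved_column = "AAAA"   # a single amino acid: no polymorphism
variable_column = "AAGG"    # two amino acids at equal frequency
per_gene = {"A": {9, 45}, "B": {45, 67}, "C": {116}}
polymorphic = {9, 45, 67, 70}

entropy_conserved = position_entropy(conserved_column)
entropy_variable = position_entropy(variable_column)
selected = select_important_positions(per_gene, polymorphic)
```

A fully conserved position has entropy 0, and the value grows as more amino acids occur at similar frequencies, which is why entropy serves as the polymorphism axis in c. In the toy example, position 116 is important for one gene but is not polymorphic, so the intersection excludes it.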