Fig. 3: Interpretation of NeoPrecis-Immuno.

a AUROC comparisons of incrementally constructed NeoPrecis-Immuno components for MHC-I and MHC-II predictions. Each component builds upon the previous one, except for BLOSUMDist and PMBECDist, which serve as baseline comparisons. The components include: BLOSUMDist (BLOSUM62 distance, baseline), PMBECDist (PMBEC distance, baseline), SubDist (substitution distance of residue embeddings), SubPosDist (SubDist with position weighting), GeoDist (SubPosDist with MHC-binding motif enrichment), and CRD (GeoDist with sigmoid scaling). b Distribution of residue embeddings in NeoPrecis-Immuno. Residues are colored by amino acid properties to illustrate clustering based on biochemical characteristics. c Scaling factors for motif enrichment across MHC allele-position pairs (n = 4923). These factors indicate how motif enrichment adjusts residue embeddings (b) along specific axes. Each dot is annotated with the most frequently observed amino acid at the binding motif for that allele-position pair, represented by both color and text. d Positive-to-negative ratio distributions for SubDist and GeoDist across MHC-I allele-position pairs (n = 50) in the NCI dataset. Positives represent immunogenic substitutions, while negatives represent non-immunogenic substitutions. A higher ratio indicates better differentiation. The red line represents x = y, while the blue line shows the fitted regression; the shaded region represents the 95% confidence interval.
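
The incremental construction in panel a can be illustrated with a minimal sketch. This is not the NeoPrecis-Immuno implementation: the two-dimensional embeddings, the Euclidean metric, the multiplicative position weight, the per-axis scaling vector, and the sigmoid parameters `slope` and `offset` are all assumptions chosen only to show how each component could build on the previous one.

```python
import numpy as np

# Hypothetical sketch of the component chain; all functional forms are assumptions.

def sub_dist(emb_wt, emb_mt):
    """SubDist (assumed form): Euclidean distance between the wild-type
    and mutant residue embeddings."""
    diff = np.asarray(emb_mt, dtype=float) - np.asarray(emb_wt, dtype=float)
    return float(np.linalg.norm(diff))

def sub_pos_dist(emb_wt, emb_mt, pos_weight):
    """SubPosDist (assumed form): SubDist multiplied by a weight for the
    peptide position carrying the substitution."""
    return pos_weight * sub_dist(emb_wt, emb_mt)

def geo_dist(emb_wt, emb_mt, pos_weight, axis_scale):
    """GeoDist (assumed form): the embedding difference is rescaled along
    each axis by allele-position-specific factors before the weighted
    distance is taken."""
    diff = np.asarray(emb_mt, dtype=float) - np.asarray(emb_wt, dtype=float)
    scaled = diff * np.asarray(axis_scale, dtype=float)
    return pos_weight * float(np.linalg.norm(scaled))

def crd(emb_wt, emb_mt, pos_weight, axis_scale, slope=1.0, offset=0.0):
    """CRD (assumed form): GeoDist passed through a sigmoid, giving a
    score in (0, 1)."""
    g = geo_dist(emb_wt, emb_mt, pos_weight, axis_scale)
    return float(1.0 / (1.0 + np.exp(-(slope * g - offset))))

# Toy values, not taken from the model.
wt, mt = [0.0, 0.0], [3.0, 4.0]
d_sub = sub_dist(wt, mt)
d_pos = sub_pos_dist(wt, mt, pos_weight=0.5)
d_geo = geo_dist(wt, mt, pos_weight=1.0, axis_scale=[1.0, 0.0])
score = crd(wt, mt, pos_weight=1.0, axis_scale=[1.0, 0.0])
```

Under these assumptions, setting an axis scaling factor to zero removes that axis from the distance, which is one way to read how the factors in panel c could reshape the embedding space of panel b for a given allele-position pair.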