Fig. 1: Data overview.
From: Geometric deep learning improves generalizability of MHC-bound peptide predictions

A Distribution of the shuffled data across HLA-A, -B and -C for the training and test sets. B–D Hierarchical clustering of the HLA-A, -B and -C pseudosequences, respectively, with training data in black and test data in red. In all panels, the mustard/teal bar next to each gene label shows the ratio of binders to non-binders (positives to negatives) for that gene. E Data enrichment through 3D modeling. We used PANDORA (ref. 37) to generate 20 3D models for each of the 100,178 pMHC data points, thus enriching the sequence information with physics-derived information such as geometry and physico-chemical features.