Supplementary Figure 5: Training and validation data sources and feature importance for presentation models. | Nature Biotechnology


From: Predicting HLA class II antigen presentation through integrated deep learning


(a) Detailed HLA-II ligand data and gene expression data used in training and validation of MARIA models. (b) Distributions of minimum additive distances between validation peptide sequences and training peptide sequences. The median minimum additive distance is around 7, indicating that a typical validation peptide would need at least seven amino acid changes to match a peptide in the training set. No identical peptides were present in both training and validation sets (minimum additive distance > 0). (c) Performance of RNN-based binding models compared with NetMHCIIpan3.1. The RNN-based HLA-DR in vitro binding model was trained on the same IEDB HLA-DR data used by NetMHCIIpan3.1 and validated on naturally presented MCL HLA-DR ligands (18 MCL samples). The RNN-based binding models and NetMHCIIpan3.1 achieved comparable predictive performance (ROC-AUC=0.64, Mann-Whitney U test P=0.34, n=18). (d) Detailed 10-fold cross-validation performance on identifying naturally presented ligands with different predictors. MARIA models considering all relevant features (peptide sequence, gene expression, predicted in vitro binding, and cleavage scores) achieved higher average AUC scores than the second-best model (RNN with sequence only, Mann-Whitney U test P<1e-5, n=10). (e) Validation performance of logistic regression models combining gene expression, binding scores, and cleavage scores. Logistic regression models were trained on the MCL HLA-DR ligand training data, and validation performance is reported as the average AUC across 10-fold cross-validation. Combining gene expression, binding scores, and cleavage scores moderately increases the AUC compared with gene expression alone or combined with one additional feature (AUC=0.82, DeLong test p<0.0001, n=3300 for ligand peptides and n=10,000 for decoy peptides). (f) Comparison of deep RNN models and shallow neural network (NN) models for predicting HLA-DR ligands from peptide sequences alone. Trained and validated on identical sequence data, deep RNN models achieved higher validation AUC than shallow NN models after the 6th epoch. The solid lines indicate the average validation AUC of 5 independent training experiments, and the shaded areas indicate the 95% confidence interval (n=3300 for ligand peptides and n=10,000 for decoy peptides). (g) Impact of training dataset size on prediction performance for pan-HLA-II MARIA models. We trained new MARIA models by combining varying numbers (x-axis) of randomly sampled training peptide ligands from a pan-HLA-II dataset profiling diverse cell types with the data originally used to train MARIA (Khodadoust et al. 2017). Validation AUCs (y-axis) were then calculated using two monoallelic HLA-DR datasets (top panel: DRB1*01:01, bottom panel: DRB1*04:04) originally shown in Fig. 3. Models with more training examples show stronger performance, but AUC gains plateau after ~20,000 peptides. Surprisingly, models trained using pan-HLA-II data from diverse cell types did not significantly outperform the original MARIA model trained only on HLA-DR ligands from a single tumor type (two-tailed independent t-test P=0.35, n=10). The shaded area depicts the 95% confidence interval around the mean, based on 10 independently trained models, with the mean performance depicted by the solid line. (h) Performance of pan-HLA-II models in differentiating HLA-DP ligands from random human peptides. A recurrent neural network model was trained on presented HLA-II ligands identified with MS and used to score 20 reported HLA-DP ligands (Supplementary Table 12) and 100 random human peptides. Presentation scores for HLA-DP ligands were significantly higher than those for random human peptides (Mann-Whitney U test p=3e-6), and this difference achieved an AUC of 0.82.
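The minimum additive distance in panel (b) measures how many amino acid changes separate each validation peptide from its nearest neighbor in the training set. A minimal sketch of that idea, assuming the metric behaves like a standard edit (Levenshtein) distance over peptide sequences (the function names and example peptides below are hypothetical illustrations, not data from the paper):

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (0 if match)
        prev = cur
    return prev[-1]


def min_distance_to_training(peptide: str, training_set: list[str]) -> int:
    """Smallest number of edits needed to turn `peptide` into any training peptide."""
    return min(edit_distance(peptide, t) for t in training_set)


# Illustrative peptides only: the validation peptide is one substitution
# away from the first training peptide, so its minimum distance is 1.
training = ["GILGFVFTL", "LLFGYPVYV"]
print(min_distance_to_training("GILGFVKTL", training))  # -> 1
```

Under this reading, a median minimum additive distance of ~7 means the distribution of `min_distance_to_training` values over all validation peptides is centered around 7, supporting the claim that validation peptides are not trivial variants of training peptides.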
