Fig. 4
From: deepBreaks identifies and prioritizes genotype–phenotype associations using machine learning

Classification of HIV-1 subtypes B and C from nucleotide sequences of the V3 loop. (a) Cluster analysis of the sequences, with ground-truth labels from the Los Alamos National Laboratory database. (b) Tenfold cross-validation (cv) and test-set results of the top two classification models, XGBoost (xgb) and LightGBM (lgbm), trained to predict HIV-1 subtype from the V3 loop. (c) Important positions reported by deepBreaks based on the results of the top three models, annotated with the sequence regions they fall in. (d) Stacked bar plots of the top five positions contributing to the classification models that predict HIV-1 subtypes ‘B’ and ‘C’.
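A stacked bar plot like the one in panel (d) needs, for each selected alignment position, the share of each nucleotide within each subtype. The minimal sketch below computes that tally. It assumes the sequences are already aligned to equal length, that positions are 0-based column indices, and that the plotted quantity is per-subtype nucleotide frequency. These are assumptions, not a description of the deepBreaks implementation. The function name `position_composition` and the four toy sequences are hypothetical and used only for illustration.

```python
from collections import Counter


def position_composition(seqs, labels, pos):
    """Return {label: {character: fraction}} for alignment column `pos`.

    Hypothetical helper: assumes `seqs` are pre-aligned strings of equal
    length and that `pos` is a 0-based column index.
    """
    # Count which character each sequence carries at `pos`, grouped by label.
    by_label = {}
    for seq, label in zip(seqs, labels):
        by_label.setdefault(label, Counter())[seq[pos]] += 1

    # Convert counts to within-label fractions (one stacked bar per label).
    result = {}
    for label, counts in by_label.items():
        total = sum(counts.values())
        result[label] = {base: n / total for base, n in sorted(counts.items())}
    return result


# Toy, made-up aligned sequences (not real V3 loop data).
seqs = ["ACGT", "ACGA", "TCGA", "TCGT"]
labels = ["B", "B", "C", "C"]
print(position_composition(seqs, labels, 0))  # {'B': {'A': 1.0}, 'C': {'T': 1.0}}
```

In this toy case, column 0 separates the two labels perfectly, while column 3 has the same composition in both. That contrast is what makes a position informative to a classifier.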