Fig. 5: Gut microbiota profiles robustly segregate samples by HIV infection status and sexual preference.

a Ridge logistic regression was used to test the robustness of differences between MSM vs. MSW (n = 72 MSM, n = 22 MSW) and PWH vs. seronegative (n = 71 PWH, n = 71 seronegative). Both MSM status and HIV infection status exhibited robust signatures from which ridge logistic regression could predict the class of unseen samples with high accuracy (AUROC = 0.84 for MSM vs. MSW with P = 0.015, AUROC = 0.73 for PWH vs. seronegative with P = 0.017). b Sample class labels (e.g. MSM, MSW, PWH, seronegative) were randomly permuted to introduce noise into the dataset and to quantify the capacity of the model to classify persons from spurious data. With randomly permuted class labels, classification fell to chance level (AUROC = 0.5 for both MSM vs. MSW and PWH vs. seronegative). c Top ASV features that informed the ridge logistic regression models for MSM vs. MSW and PWH vs. seronegative. Feature coefficients represent the extent to which each ASV was weighted toward a given class (a positive coefficient for MSM vs. MSW indicates the ASV was indicative of MSM; a positive coefficient for PWH vs. seronegative indicates it was indicative of PWH). Bars are color-coded by log mean fold change in PWH vs. seronegative, and show a lack of correlation between ASVs informative for predicting MSM status and those altered in abundance in PWH.
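The logic of panels a to c (fit a ridge-penalized classifier, score unseen samples by AUROC, repeat with permuted labels, then inspect coefficients) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes synthetic Gaussian features in place of ASV abundances, a single train/test split in place of whatever validation scheme was actually used, and a hand-written gradient-descent fit. All function names, sample sizes, and the penalty strength are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_ridge_logistic(X, y, lam=1.0, lr=0.1, epochs=200):
    """Fit L2-penalized (ridge) logistic regression by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        # The ridge term shrinks the weights (not the intercept) toward zero
        w = [wj - lr * (gj + lam * wj) / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def auroc(scores, labels):
    """Probability that a positive sample outscores a negative one (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if a > q else 0.5 if a == q else 0.0 for a in pos for q in neg)
    return wins / (len(pos) * len(neg))

def holdout_auroc(X, y, n_train):
    """Train on the first n_train samples, score the rest, return (AUROC, weights)."""
    w, b = fit_ridge_logistic(X[:n_train], y[:n_train])
    scores = [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X[n_train:]]
    return auroc(scores, y[n_train:]), w

# Synthetic data (assumption): 6 features, only the first two differ between classes
rng = random.Random(0)
n, p, n_train = 200, 6, 140
y = [i % 2 for i in range(n)]
X = [[rng.gauss(1.5 * yi if j < 2 else 0.0, 1.0) for j in range(p)] for yi in y]

# Panel a analogue: AUROC on unseen samples with the true labels
true_auc, weights = holdout_auroc(X, y, n_train)

# Panel b analogue: shuffle labels across the whole dataset, then refit and rescore
perm_aucs = []
for _ in range(10):
    y_perm = y[:]
    rng.shuffle(y_perm)
    perm_aucs.append(holdout_auroc(X, y_perm, n_train)[0])
mean_perm_auc = sum(perm_aucs) / len(perm_aucs)

# Panel c analogue: rank features by absolute coefficient
top_features = sorted(range(p), key=lambda j: abs(weights[j]), reverse=True)[:2]

print("AUROC, true labels:", round(true_auc, 3))
print("mean AUROC, permuted labels:", round(mean_perm_auc, 3))
print("top features by |coefficient|:", top_features)
```

In this toy setting the permuted-label AUROC should hover near 0.5 while the true-label AUROC stays well above it, which is the contrast panels a and b report. The sign of each coefficient plays the role described for panel c: a positive weight marks a feature as indicative of the class coded 1.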