Extended Data Fig. 10: Phage, prokaryotic, and phenotypic differences in ART+ and ART− PLWH.

a) Prokaryotic diversity (inverse Simpson’s index after rarefaction) and phage richness (species present at ≥10⁻⁵% abundance) by HIV and antiretroviral therapy (ART) status. Points represent individual samples. Differences in diversity were tested by site with ANOVA and across sites with a linear mixed-effects model with site as a random effect. b) Generalized fold change (gFC) for all species in HIV+ ART+ relative to HIV− individuals and in HIV+ ART− relative to HIV− individuals. Species are colored by q-value in the HIV− vs HIV+ ART+ comparison. Species with an absolute gFC ≥ 0.3 in the HIV− vs HIV+ ART− comparison (that do not exhibit a gFC ≥ 0.3 in the HIV− vs HIV+ ART+ comparison) are annotated. c) Predictions from a machine learning model trained on prokaryotic data from HIV− and HIV+ ART+ participants and applied to HIV+ ART− participants. The fraction of samples predicted to be positive at a 5% internal false positive rate (dashed line) is listed below. d) HIV-associated effect size for prokaryotic and phage species. Species are colored by q-value. e) Receiver operating characteristic (ROC) curves for models trained to distinguish HIV status using phage composition. Shading indicates 95% confidence intervals and numbers show the area under the ROC curve (AU-ROC). f) AU-ROC for models trained on participants from each site (panel e) and applied to other sites. For leave-one-site-out (LOSO) validation, models were trained on two sites and validated on the left-out site. g) Statistics for age, waist-to-hip ratio, cholesterol, and glucose for individuals who are HIV seronegative and for those who are seropositive on ART. All p-values are from Wilcoxon rank-sum tests. For all panels, n = 129 HIV+ ART+, n = 28 HIV+ ART−, n = 719 HIV−. For all boxplots, boxes denote the interquartile range (IQR) with the median as a thick black line and whiskers extending to the most extreme points within 1.5 × IQR.
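
For readers unfamiliar with the diversity measure in panel a, the following is a minimal sketch of an inverse Simpson’s index computed after rarefaction. It is illustrative only and is not the authors’ pipeline: the function names `rarefy` and `inverse_simpson`, the input format (a dictionary of taxon to read count), the rarefaction depth, and the random seed are all assumptions, since the legend does not specify them.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement.

    `counts` maps taxon name -> integer read count (assumed input format).
    """
    # Expand the counts into one entry per read, then draw without replacement.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("rarefaction depth exceeds the sample's total reads")
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))


def inverse_simpson(counts):
    """Inverse Simpson's index: 1 / sum(p_i ** 2) over taxon proportions p_i."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return 1.0 / sum((n / total) ** 2 for n in counts.values())


# Hypothetical sample: four taxa with uneven read counts.
sample = {"taxon_a": 500, "taxon_b": 300, "taxon_c": 150, "taxon_d": 50}
rarefied = rarefy(sample, depth=200)  # depth chosen arbitrarily for illustration
diversity = inverse_simpson(rarefied)
```

The index equals the number of taxa when all are equally abundant and approaches 1 as a single taxon dominates. Rarefying every sample to a common depth first keeps differences in sequencing depth from driving differences in the index.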