Fig. 4: Associations between the upper respiratory microbiota and the acquisition of bacterial pathobionts.

a Forest plots depict the discriminative performance of random forest models predicting the acquisition of bacterial pathobionts. The area under the curve (AUC) is represented by a point, with error bars indicating the 95% confidence interval. Clinical variables included in the models were age, number of children <5 years of age in the household, respiratory virus infection, maternal HIV infection, breastfeeding, receipt of antibiotics, season, low birth weight, use of solid fuels, location of residence (urban or rural), and number of 13-valent pneumococcal conjugate vaccine doses (S. pneumoniae models only). URT microbiota taxonomic and microbial pathway data were from the study visit preceding the visit at which the sample used to evaluate pathobiont acquisition was collected. For time-varying clinical variables, data were from the visit at which that sample was collected. The number of study visits at which pathobiont acquisition was predicted was: H. influenzae (n = 1319), M. catarrhalis (n = 827), S. aureus (n = 1488), and S. pneumoniae (n = 936). b Heatmap depicting variable importance scores for features in random forest models that predict pathobiont colonization and include URT microbiota taxonomic data and clinical variables. The taxonomic classification of each ASV, based on a BLAST search, is shown, followed by a number in parentheses corresponding to the mean relative abundance of that ASV in infant nasopharyngeal samples (ASV1 was the most abundant ASV across all samples). c Mixed-effects logistic regression was used to identify URT microbiota features in the preceding sample that predicted pathobiont acquisition. Coefficients from these models, which correspond to the relative effect sizes of the associations, are shown for significant associations (q < 0.20). ASVs for which higher relative abundance was associated with lower odds of pathobiont acquisition are shown as blue bars, and those for which lower relative abundance was associated with lower odds are shown as green bars. Source data are provided as a Source Data file. (AUC-ROC, area under the receiver operating characteristic curve; URT, upper respiratory tract; ASV, amplicon sequence variant).
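The lagged design described in panel a (microbiota features taken from the preceding visit, time-varying clinical variables and the outcome taken from the current visit) can be illustrated with a minimal Python sketch. The record layout and field names (`Visit`, `asv_rel_abundance`, `pathobiont_detected`, `build_lagged_rows`) are hypothetical, and the at-risk filter, which drops visits where the pathobiont was already detected at the preceding visit so that a positive result counts as acquisition, is an assumption for illustration rather than the study's stated definition.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Visit:
    # Hypothetical record layout; field names are illustrative, not from the study.
    infant_id: str
    visit: int                  # ordinal study visit number
    asv_rel_abundance: dict     # e.g. {"ASV1": 0.42, "ASV7": 0.03}
    clinical: dict              # time-varying clinical variables at this visit
    pathobiont_detected: bool   # detection result for a single pathobiont


def build_lagged_rows(visits):
    """Pair each visit with the same infant's immediately preceding visit.

    Microbiota features come from the preceding visit; clinical variables and
    the outcome come from the current visit. ASSUMPTION: pairs in which the
    pathobiont was already detected at the preceding visit are skipped, so a
    positive current result represents acquisition.
    """
    by_infant = defaultdict(list)
    for v in visits:
        by_infant[v.infant_id].append(v)

    rows = []
    for infant_id, series in by_infant.items():
        series.sort(key=lambda v: v.visit)
        for prev, curr in zip(series, series[1:]):
            if prev.pathobiont_detected:
                continue  # already colonized, so not at risk of acquisition
            # Prefix lagged microbiota features to keep them distinct.
            features = {f"prev_{k}": val for k, val in prev.asv_rel_abundance.items()}
            features.update(curr.clinical)
            rows.append({
                "infant_id": infant_id,
                "visit": curr.visit,
                "features": features,
                "acquired": curr.pathobiont_detected,
            })
    return rows


# Usage with made-up values for one infant followed over three visits.
visits = [
    Visit("A", 1, {"ASV1": 0.5, "ASV2": 0.1}, {"age_weeks": 2, "antibiotics": False}, False),
    Visit("A", 2, {"ASV1": 0.3, "ASV2": 0.4}, {"age_weeks": 4, "antibiotics": True}, True),
    Visit("A", 3, {"ASV1": 0.2, "ASV2": 0.2}, {"age_weeks": 6, "antibiotics": False}, True),
]
rows = build_lagged_rows(visits)
```

Here only the visit 1 to visit 2 pair is retained: the visit 2 to visit 3 pair is excluded because the pathobiont was already detected at visit 2. The resulting rows could then be passed to a classifier such as a random forest, with the infant identifier available for grouping in cross-validation or as a random effect.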