Extended Data Fig. 4: Modelling and scoring procedure metrics.
From: A host–microbiota interactome reveals extensive transkingdom connectivity

a, A histogram of protein barcode representation in the input library. The wide spread on the log10 x-axis indicates highly variable representation; the model accounts for this by including barcode input concentration as an offset term. Each tick mark below the histogram represents one protein. b, A Venn diagram of the number of interactions passing each of the three hit-calling thresholds in the standard threshold set (95% interval excludes zero, estimated effect size > 0.5, and concordance score > 0.75). c, Normalized counts illustrating the utility of the concordance threshold. The two interactions shown have similar interaction scores (about 1.9) and similarly variable inputs in the Pre library (top panels), but the normalized output counts (bottom panels) are much more concordant for TFF2:HM645 than for SLC6A9:HM1171. Grey cells represent zero counts. d, A histogram of concordance scores for all interactions in the assay. Dashed vertical lines indicate the stringent and standard thresholds. e, Saturation curves from repeated rarefaction analysis. Because the curves for both threshold sets have roughly plateaued, we can conclude that we have identified most of the interactions detectable under these experimental conditions. f, Comparison of scores from an initial run of the scoring method with those from five repeated runs in which the standard deviation of the normal prior on interaction scores was varied from 0.075 to 0.3. Each dot represents the score of one interaction; only interactions called as a hit in at least one run are shown. The middle panel uses the same prior standard deviation as the initial run and therefore shows the extent of Monte Carlo error. As expected, the rank and relative magnitude of scores are highly consistent between runs, whereas narrower priors yield lower scores and fewer hits, and wider priors yield higher scores and more hits. The two distinct groups of interactions visible in the panels with wide priors represent subpopulations that are either more or less amenable to the zero-inflation component of the model.
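Panel a states that barcode input concentration enters the model as an offset term. The full model (zero-inflation, priors) is not specified in this legend, so the following is only a minimal sketch that assumes a plain Poisson log-link model with a single intercept `b`. The offset `log(c_i)` fixes the coefficient on input abundance at 1, so `b` measures enrichment per unit of input rather than raw abundance. For this intercept-only case the maximum-likelihood estimate has the closed form b = log(Σy / Σc). The function name `fit_rate_with_offset` and the simulated inputs are hypothetical.

```python
import numpy as np


def fit_rate_with_offset(inputs, outputs) -> float:
    """MLE of b in the hypothetical model y_i ~ Poisson(c_i * exp(b)).

    log(c_i) acts as an offset (coefficient fixed at 1). Setting the
    derivative of the log-likelihood sum(y_i * (log c_i + b) - c_i * exp(b))
    to zero gives exp(b) = sum(y) / sum(c).
    """
    inputs = np.asarray(inputs, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    return float(np.log(outputs.sum() / inputs.sum()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical input concentrations spread over several orders of magnitude.
    inputs = rng.lognormal(mean=3.0, sigma=1.5, size=2000)
    outputs = rng.poisson(inputs * np.exp(0.7))
    # The estimate should land close to the simulated enrichment of 0.7,
    # despite the wide spread in input representation.
    print(fit_rate_with_offset(inputs, outputs))
```

Without the offset, well-represented barcodes would dominate the raw output counts; with it, the spread of inputs only affects the precision of the estimate, not its target.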
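The standard hit-calling rule in panel b combines three thresholds, and the Venn diagram counts interactions in each combination of pass/fail. The sketch below assumes that the model yields posterior draws of each interaction score, that the "95% interval" is the central percentile interval of those draws, and that the "estimated effect size" is their median; the concordance score is taken as a precomputed input because its definition is not given in this legend. `InteractionFit`, `threshold_flags`, `is_hit` and `venn_counts` are hypothetical names. The stringent threshold values are not stated here, so only the standard set is encoded.

```python
from collections import Counter
from dataclasses import dataclass

import numpy as np

# Standard threshold set as stated in panel b.
EFFECT_MIN = 0.5
CONCORDANCE_MIN = 0.75


@dataclass
class InteractionFit:
    name: str
    draws: np.ndarray   # hypothetical: posterior draws of the interaction score
    concordance: float  # hypothetical: precomputed concordance score


def threshold_flags(fit: InteractionFit) -> tuple:
    """Return (interval excludes zero, effect > 0.5, concordance > 0.75)."""
    lo, hi = np.percentile(fit.draws, [2.5, 97.5])  # central 95% interval
    effect = float(np.median(fit.draws))            # assumed point estimate
    return (
        bool(lo > 0 or hi < 0),
        effect > EFFECT_MIN,
        fit.concordance > CONCORDANCE_MIN,
    )


def is_hit(fit: InteractionFit) -> bool:
    # A hit must pass all three thresholds (the centre of the Venn diagram).
    return all(threshold_flags(fit))


def venn_counts(fits) -> Counter:
    # Number of interactions in each region of the three-set Venn diagram,
    # keyed by the tuple of pass/fail flags.
    return Counter(threshold_flags(f) for f in fits)
```

Panel c illustrates why the third flag matters: two interactions with similar scores can still differ sharply in concordance, so only one of them reaches the centre region.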
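Panel e relies on repeated rarefaction: subsampling the data at increasing depths, re-counting detected interactions, and checking whether the curve levels off. A faithful reproduction would presumably re-run the full scoring model on every subsample; this sketch instead assumes a much simpler, hypothetical detection rule (at least `min_reads` reads after subsampling) so that the rarefaction logic itself is visible. `saturation_curve`, `has_plateaued`, `min_reads` and the plateau tolerance are all illustrative assumptions, not the authors' procedure.

```python
import numpy as np


def saturation_curve(read_counts, fractions, n_reps=10, min_reads=5, seed=0):
    """Mean number of interactions detected after rarefying reads.

    Hypothetical detection rule: an interaction counts as detected if its
    subsampled read count is at least `min_reads`.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(read_counts, dtype=np.int64)
    total = int(counts.sum())
    curve = []
    for frac in fractions:
        depth = int(round(frac * total))
        detected = []
        for _ in range(n_reps):
            # Draw `depth` reads without replacement from the full read pool.
            sub = rng.multivariate_hypergeometric(counts, depth)
            detected.append(int((sub >= min_reads).sum()))
        curve.append(float(np.mean(detected)))
    return curve


def has_plateaued(curve, tol=0.05):
    # Hypothetical criterion: the last step adds less than `tol` of the final count.
    if len(curve) < 2 or curve[-1] == 0:
        return False
    return (curve[-1] - curve[-2]) / curve[-1] < tol
```

Sampling without replacement mirrors drawing a smaller sequencing run from the same pool of reads; at a fraction of 1.0 the subsample equals the full data, so the last point of the curve reproduces the full-depth count.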