Abstract
Pseudomonadota (formerly Proteobacteria) are prevalent in the commensal human gut microbiota, but also include many pathogens that rely on secretion systems to support pathogenicity by injecting proteins into host cells. Here we show that 80% of Pseudomonadota from healthy gut microbiomes also have intact type III secretion systems (T3SS). Candidate effectors predicted by machine learning display sequence and structural features that are distinct from those of pathogen effectors. Towards a systems-level functional understanding, we experimentally constructed a protein–protein meta-interactome map between human proteins and commensal effectors. Network analyses uncovered that effector-targeted neighbourhoods are enriched for genetic variation linked to microbiome-associated conditions, including autoimmune and metabolic diseases. Metagenomic analysis revealed effector enrichment in Crohn’s disease but depletion in ulcerative colitis. Functionally, commensal effectors can translocate into human cells and modulate NF-κB signalling and cytokine secretion in vitro. Our findings indicate that T3SS contribute to microorganism–host cohabitation and that effector–host protein interactions may represent an underappreciated route by which commensal gut microbiota influences health.
Main
Host-associated microbiota influences human health in complex, genotype-dependent ways. The human gut microbiome in particular, which is dominated by Firmicutes, Bacteroidetes and Pseudomonadota1 (formerly Proteobacteria2), can alter the risk of diverse conditions, including metabolic disorders and autoimmune and neurodegenerative diseases3. However, the underlying molecular mechanisms are incompletely understood. Most studies have focused on metabolites, extracellular microorganism-associated molecular patterns or community-level microbiome properties4, whereas the role of intracellular bacteria–host protein interactions remains largely unexplored. The potential impact of such interkingdom protein interactions is illustrated by viral proteins in asymptomatic or non-acute infections. These proteins can influence cellular signalling and cell physiology and thereby contribute to complex diseases, likely in a host-genetics-dependent manner5,6,7,8.
In bacteria, the type III secretion system (T3SS) is a well-characterized apparatus for delivering proteins into eukaryotic cells. The T3SS is a highly conserved ‘needle and syringe’-like machinery that Pseudomonadota use to inject bacterial proteins into host cells9. T3SS and their substrate effectors have been studied almost exclusively in human and plant pathogens such as Yersinia, Pseudomonas and Salmonella, for which protein translocation is a key pathogenic strategy. In the host, translocated effectors manipulate cellular processes, including cytoskeletal dynamics and immune signalling, to ensure bacterial survival and promote transmission9. Thus, the T3SS has traditionally been framed as a virulence determinant.
Insights from plant and insect systems challenge this pathogen-centric view of the T3SS. Many commensal and beneficial microorganisms of these hosts deploy T3SS or analogous machinery to translocate proteins that promote symbiosis or fine-tune host immunity10,11,12. Effector–host protein–protein interaction maps in plants further reveal that both pathogenic and mutualistic microorganisms converge on central host signalling nodes. This suggests conserved principles by which injected proteins can modulate eukaryotic biology across diverse symbioses12,13,14.
Despite the emerging conceptual importance of protein injection and cross-kingdom protein interactions in diverse microorganism–host systems, it is unknown whether analogous mechanisms operate in the healthy human gut. Here we investigate the distribution, diversity and host interactions of T3SS and their effectors in commensal Pseudomonadota from human guts. By integrating comparative genomics, structural prediction, functional assays and host protein interaction networks, we uncover an underappreciated layer of direct, protein-mediated communication between commensal microorganisms and the human host. This communication has implications for immune modulation, microbial competition and complex disease biology.
Results
T3SS are common in the human gut microbiome
We first analysed reference genomes of Pseudomonadota strains isolated from healthy gut and stool samples, for example, by the Human Microbiome Project15. Using EffectiveDB16, a widely used tool for secretion system identification, we detected complete T3SS in 44 of the 77 genomes (Supplementary Data 1). To expand the scope, we analysed genomes of 4,752 phylogenetically diverse strains from three resources: the Human Intestinal Bacteria Collection (HiBC)17, the Broad Institute–OpenBiome Microbiome Library (BIO–ML)18 and the Global Microbiome Conservancy (GMC)19. Of the 568 Pseudomonadota genomes, 449 (79%) have complete T3SS (Extended Data Fig. 1). Many also have T4SS (315) and T6SS (474), which may likewise inject effectors into host cells, among other functions20 (Extended Data Fig. 1 and Supplementary Data 2). Together, 527 of the 568 Pseudomonadota genomes (92%) have at least one host-directed secretion system. Because culturing can bias taxon representation, we also screened 16,179 high- and intermediate-quality Pseudomonadota metagenome-assembled genomes (MAGs)21,22,23. We found complete T3SS in 770 (5%) of these MAGs (Extended Data Fig. 1 and Supplementary Data 3). Notably, T3SS were detected only in Gammaproteobacteria and not in Beta- or Epsilonproteobacteria (except in Helicobacter strains), and were especially common among Escherichia (Fig. 1a and Supplementary Data 3). Among the T3SS-positive (T3SS+) species, 24 matched representatives in two cohorts of the Weizmann Institute of Science24. In these cohorts, 59.4% of individuals in the Israeli cohort and 47.1% in the Dutch cohort harboured potentially T3SS+ species in their gut microbiome, at 0.80% and 0.48% relative abundance, respectively, with Escherichia coli being the most common. These observations indicate that T3SS+ strains are common members of the human gut microbiota and motivated our further investigation.
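As a quick arithmetic check, the percentages quoted above can be recomputed from the reported genome counts. The short sketch below does this; the dictionary layout and the helper name `pct` are illustrative only, and the counts are the ones stated in the text.

```python
# Counts reported in the text: (genomes carrying the system, genomes screened)
COUNTS = {
    "T3SS, reference genomes": (449, 568),
    "T4SS, reference genomes": (315, 568),
    "T6SS, reference genomes": (474, 568),
    "T3SS, MAGs": (770, 16_179),
}


def pct(hits: int, total: int) -> float:
    """Percentage of genomes carrying a system, rounded to one decimal place."""
    return round(100 * hits / total, 1)


for label, (hits, total) in COUNTS.items():
    print(f"{label}: {hits}/{total} = {pct(hits, total)}%")
```

With these counts, T4SS occur in roughly 55% and T6SS in roughly 83% of the Pseudomonadota reference genomes.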
a, Most abundant genera, species and genomes encoding complete T3SS among reference strains and MAGs from the human gut. b, Sequence similarity of 3,002 candidate commensal T3SS effectors with 1,195 effectors from pathogenic bacteria across alignment coverages (bottom left). Each dot represents a pairwise sequence comparison. Dot colour indicates whether jackhmmer detected significant homology to pathogen effectors (inset legend). Marginal histograms display the aggregated distribution of alignment coverage (top) and of sequence similarity (right), with colour indicating the jackhmmer outcome. c, Left, number of structure clusters observed in the FoldSeek analysis (red arrow) compared with the random expectation for each group (homogeneous or mixed; grey) (empirical P < 0.0001, n = 10,000, two-sided label permutation test). Middle, example structures for one cluster in the group. Small networks are representative structure clusters for the group, with an anchor structure in the centre and similar structures connected by links. Donut plots show the proportion of proteins of each origin (indicated by colour) across all clusters of the (homogeneous or mixed) structure-cluster group. d, Selection of 18 commensal Pseudomonadota strains for subsequent functional analyses. Numbers indicate the count of shared effectors at >90% mutual sequence similarity across 90% sequence length. e, Injection of indicated effectors by wild-type and ΔsctV (T3SS-defective) Salmonella Typhimurium into HeLa cells, detected by luminescence of reconstituted nano-luciferase (y axis). Control pathogen effectors (left): sseJ (A0A0F6B1Q8), sopA (Q8ZNR3) and pipB2 (A0A0F6B5H5) from Salmonella Typhimurium; yopJ (A0A0N9NCU6) from Yersinia pseudotuberculosis; and ipaH9.8 (Q8VSC3) and ospG (Q99PZ6) from Shigella flexneri. SipA is an assay control used as reference.
Asterisks denote statistically significant differences between the wild-type and ΔsctV strains (two-sided Wilcoxon test; five biological repeats with four technical repeats each). f, Injection of effectors from gut commensal Edwardsiella tarda into HeLa cells. SipA tested in wild-type and ΔsctV Salmonella Typhimurium was used as positive and negative control, respectively (two-sided Wilcoxon test, *P < 0.05, **P < 0.001; NS, not significant; seven biological repeats with four technical repeats each). Raw data and precise P values for all panels are found in Supplementary Data 1, 3, 5, 6 and 10 as described in Supplementary Information. Boxplots (e,f) show the median (centre line) and the interquartile range (IQR, box), with whiskers extending to minimum and maximum values within 1.5× IQR.
Commensal effectors are unrelated to known pathogen effectors
Using three complementary machine-learning models25,26,27, we confidently predicted 3,002 effector candidates in the T3SS+ reference genomes (hereafter: strain effectors) and 182 in the 770 T3SS+ MAGs (meta-effectors) (Supplementary Data 4). Because T3SS effectors are classically associated with pathogenicity, we compared these candidate effectors with 1,195 T3SS effectors of known pathogens28. Only 17 of 3,002 (0.5%) strain effectors and 6 of 182 (3%) meta-effectors showed high sequence similarity to pathogen effectors (≥90% similarity across ≥90% length) (Supplementary Data 5). To detect weak similarities, we performed iterative jackhmmer29 searches against ~124 million non-redundant bacterial sequences from UniRef90. Yet, even with this sensitive approach, significant similarity to pathogen effectors was found for only 155 commensal strain effectors (~5%) and 42 meta-effectors (22.5%) (Fig. 1b and Extended Data Fig. 1).
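The high-similarity cut-off (≥90% similarity across ≥90% length) can be expressed as a simple filter over pairwise alignment results. The sketch below assumes the alignments were computed elsewhere. It also assumes that coverage must reach 90% of both sequences, which is one "mutual" reading of the criterion. The `Hit` record and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One pairwise alignment between a commensal candidate and a pathogen effector."""
    query_id: str
    target_id: str
    similarity: float    # fraction of similar aligned positions (0-1)
    aln_length: int      # number of aligned residues
    query_length: int
    target_length: int


def is_high_similarity(hit: Hit, min_similarity: float = 0.9, min_coverage: float = 0.9) -> bool:
    """Assumed criterion: similarity and coverage of BOTH sequences reach the thresholds."""
    query_cov = hit.aln_length / hit.query_length
    target_cov = hit.aln_length / hit.target_length
    return hit.similarity >= min_similarity and min(query_cov, target_cov) >= min_coverage


def fraction_with_pathogen_homolog(hits, n_candidates: int) -> float:
    """Fraction of candidates with at least one qualifying hit (each candidate counted once)."""
    matched = {h.query_id for h in hits if is_high_similarity(h)}
    return len(matched) / n_candidates
```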
As effectors can be structurally related despite sequence divergence, we clustered AlphaFold30-predicted tertiary structures using FoldSeek31 for a structural comparison. Surprisingly, homogeneous clusters containing effectors from only commensal strains or only pathogens were highly overrepresented. By contrast, mixed clusters II and III, reflecting common structures of effectors from pathogens and commensal strains, were depleted (Fig. 1c and Supplementary Data 6; empirical P < 0.0001). Meta-effectors clustered exclusively with strain effectors, albeit close to random expectation. All results were robust to varying FoldSeek parameters and to considering only vertebrate or human pathogens (Supplementary Data 6). Thus, candidate effectors in T3SS+ strains from healthy human guts differ markedly from pathogen effectors in both sequence and structure.
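The structural comparison relies on a label permutation test. Origin labels (commensal or pathogen) are shuffled across proteins while cluster sizes stay fixed, and the number of homogeneous clusters is recounted each time. Below is a minimal sketch, assuming clusters are given as lists of origin labels. The two-sided P value here is twice the smaller tail fraction, which is one common convention and not necessarily the one used in the study. A value of 0 should be read as P < 1/n_perm.

```python
import random


def count_homogeneous(clusters):
    """Number of multi-member clusters whose members all share one origin label."""
    return sum(1 for c in clusters if len(c) > 1 and len(set(c)) == 1)


def label_permutation_test(clusters, n_perm=10_000, seed=0):
    """Shuffle origin labels across proteins while keeping every cluster's size fixed."""
    rng = random.Random(seed)
    labels = [label for cluster in clusters for label in cluster]
    sizes = [len(cluster) for cluster in clusters]
    observed = count_homogeneous(clusters)
    n_ge = n_le = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        it = iter(labels)
        permuted = [[next(it) for _ in range(size)] for size in sizes]
        null = count_homogeneous(permuted)
        n_ge += int(null >= observed)
        n_le += int(null <= observed)
    # Two-sided empirical P: twice the smaller tail fraction (0 means P < 1/n_perm)
    p_value = min(1.0, 2 * min(n_ge, n_le) / n_perm)
    return observed, p_value
```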
We analysed all candidate effectors from the strains for annotated domains. Besides 860 proteins without any identifiable domain, the most common domains included the GGDEF diguanylate cyclase domain (PF00990; 58 effectors) and the EAL domain (PF00563; 50 effectors). Neither domain was found in pathogen effectors (Supplementary Data 5). Cyclic diguanylate is a known second messenger in bacterial signal transduction, and the EAL domain is thought to act as a diguanylate phosphodiesterase, thus opposing the effect of the cyclase32. Furthermore, we observed a PAS-fold domain (PF08447), which can function as a ligand-binding sensor32, in 32 effectors; in some of these it co-occurs with a guanylate cyclase domain. Cyclic dinucleotides have recently emerged as important immune regulators in all kingdoms of life33. The high frequency of two domains acting on the same second messenger among the commensal effector candidates therefore makes a role for this signalling molecule in interkingdom communication plausible.
Injection of commensal candidate effectors into human cells
A key question is whether commensal candidate effectors are injected into human cells by T3SS. To enable functional studies, we cloned open reading frames (ORFs) encoding effectors from 18 bacterial strains (Fig. 1d, Extended Data Fig. 1, Supplementary Table 9 and Supplementary Data 7 and 8). The resulting human microbiome effector ORFeome v1 (HuMEOme_v1) contains 910 sequence-verified, full-length ORFs representing 746 strain effectors and 164 meta-effectors (Supplementary Data 7). Cloning failures mainly resulted from failed PCR amplification, without indications of toxicity. Using Salmonella enterica subsp. enterica sv. Typhimurium (S. Typhimurium) as a model, we established a nano-luciferase-based injection assay34. In this assay, an 11-amino-acid NanoLuc HiBiT tag is fused to the C terminus of candidate effectors expressed in bacteria. HeLa cells stably expressed the complementary LgBiT fragment, so that effector injection reconstitutes functional nano-luciferase. Specificity was controlled by including the T3SS-defective ΔsctV mutant in all tests. Benchmarking with six pathogen effectors demonstrated effective translocation of four. Among 97 tested candidate effectors from 11 strains, 32 were specifically and significantly injected (Fig. 1e, Extended Data Fig. 1, Supplementary Table 9 and Supplementary Data 10). The higher success rate for the positive controls (4 of 6 versus 32 of 97) probably reflects the phylogenetic diversity of the source strains and missing cognate chaperones and cofactors in the Salmonella host. Thus, although some false effector identifications cannot be excluded, our pipeline overall reliably identified bona fide T3SS substrate effectors from commensal strains in healthy human guts.
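For the injection assay, each effector's luminescence in wild-type versus ΔsctV Salmonella is compared with a two-sided Wilcoxon test. With five biological repeats per strain, the exact null distribution of the rank-sum statistic can be enumerated directly. The sketch below is a minimal version built on three assumptions. First, it averages the four technical repeats within each biological repeat, which is an assumed design choice. Second, it handles only untied values. Third, the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def collapse_technical(replicates):
    """Average the technical repeats within each biological repeat (assumed design)."""
    return [mean(tech) for tech in replicates]


def rank_sum_exact(x, y):
    """Exact two-sided Wilcoxon rank-sum test; this sketch assumes no tied values."""
    pooled = sorted(x + y)
    if len(set(pooled)) != len(pooled):
        raise ValueError("tied values are not handled in this sketch")
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    n, total_n = len(x), len(x) + len(y)
    w_obs = sum(rank[v] for v in x)
    expected = n * (total_n + 1) / 2
    dev_obs = abs(w_obs - expected)
    # Enumerate every possible set of ranks for group x under the null hypothesis
    extreme = total = 0
    for combo in combinations(range(1, total_n + 1), n):
        total += 1
        if abs(sum(combo) - expected) >= dev_obs - 1e-9:
            extreme += 1
    return w_obs, extreme / total
```

With five repeats per group, the smallest attainable two-sided P value is 2/252 ≈ 0.008. This floor is worth keeping in mind when interpreting significance thresholds.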
Next, we assessed whether the T3SS of the commensals themselves are functional. Of the 11 strains with at least one T3SS-injectable effector, 6 could not be tested owing to antibiotic resistance or transformation failure. Whereas the two E. coli strains yielded no signals, Citrobacter pasteurii and Phytobacter massiliensis showed occasional signals, suggesting sporadic activation of the T3SS. By contrast, Edwardsiella tarda reproducibly and significantly injected three of four tested effectors into HeLa cells (Fig. 1f and Supplementary Data 10). Notably, only one of these, Eta_3, was also positive in the Salmonella system. This supports the notion that missing cofactors may have caused false negatives in the Salmonella-based assay. Overall, these data demonstrate that functional T3SS are present in strains from healthy human guts and can deliver identified effectors into human cells.
A microbiome–host protein–protein meta-interactome map
Next, we explored possible functions of the commensal effectors by systematically mapping their physical interactions with host proteins using our multi-assay mapping pipeline35 (Extended Data Fig. 2). Screening all cloned effectors against the full human ORFeome 9.1 (ref. 36) identified 1,067 interactions. These constitute the human–microbiome meta-interactome (HuMMI) main dataset (HuMMIMAIN) (Fig. 2a). Three repeat screens with 290 effectors and 1,440 human proteins yielded 39 interactions (HuMMIRPT) and indicated a sampling sensitivity of ~32% for the main screen (Fig. 2b), matching previous studies37. Lastly, we addressed how sequence similarity affects effector interaction profiles. We grouped effectors with ≥30% sequence identity (Supplementary Data 11) and experimentally tested them against the union of their interactors from the main screen. The resulting HuMMIHOM dataset contains 394 interactions, of which 181 are non-redundant. Altogether, HuMMI contains 1,255 unique verified interactions between 286 effectors and 426 human proteins (Fig. 2a and Supplementary Data 11).
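The sampling-sensitivity estimate from the repeat screens can be illustrated with a simple saturation model. The mean number of distinct interactions found in any k of the repeat screens is fitted to N(k) = A(1 − (1 − s)^k). Here A approximates the number of detectable interactions and s the fraction recovered by a single screen. This functional form, the grid-search fit and all names below are assumptions for illustration; the study's own curve fitting may differ.

```python
from itertools import combinations
from statistics import mean


def mean_union_sizes(screens):
    """Mean number of distinct interactions found in every k-combination of repeat screens."""
    return {
        k: mean(len(set().union(*combo)) for combo in combinations(screens, k))
        for k in range(1, len(screens) + 1)
    }


def fit_saturation(curve, grid=1000):
    """Fit N(k) = A * (1 - (1 - s)**k) by grid search over s, solving A by least squares."""
    best = None
    for i in range(1, grid):
        s = i / grid
        f = {k: 1 - (1 - s) ** k for k in curve}
        a = sum(curve[k] * f[k] for k in curve) / sum(f[k] ** 2 for k in curve)
        sse = sum((curve[k] - a * f[k]) ** 2 for k in curve)
        if best is None or sse < best[0]:
            best = (sse, a, s)
    _, a, s = best
    return a, s  # s estimates the single-screen sampling sensitivity
```

In use, each screen would be a set of interaction identifiers, for example (effector, human protein) tuples, from one repeat screen.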
a, Verified human microbiome meta-interactome (HuMMI) map; coloured nodes indicate effectors from strains according to the colour legend in f. Grey nodes represent human proteins; outer-layer human proteins are targeted only by the nearest strain, central human proteins by effectors from multiple strains. b, Sampling sensitivity: saturation curve calculated from HuMMIRPT. Red dots represent the average number of verified interactions found in any combination of the indicated number of repeat screens. Diamonds denote interaction counts per experiment over all sequential experiment combinations, and error bars indicate standard deviation. Black dots and line represent the calculated saturation curve. c, Assay sensitivity: percentage of identified interactions from bhLit_BM-v1 (n = 54 pairs), bhRRS-v1 (n = 72 pairs), hsPRS-v2 (n = 60 pairs) and hsRRS-v2 (n = 78 pairs) in the Y2H system used for network mapping. Error bars represent the s.e. of proportion. d, Validation rate of a random sample of HuMMI interactions (n = 294 pair configurations) compared with four reference sets in the yN2H validation assay: bhLit_BM-v1 (n = 94 pair configurations), bhRRS-v1 (n = 144 pair configurations), hsPRS-v2 (n = 44 pair configurations) and hsRRS-v2 (n = 51 pair configurations). Two-sided Fisher’s exact test, *P = 0.04, ***P = 0.0006 (Supplementary Data 14). Error bars represent s.e. of proportion. e, Co-immunoprecipitation of MYC-tagged human proteins by Flag-tagged effectors or Flag–GFP as negative control. Input, cell lysates; green dots, successful co-immunoprecipitation; red dot, no co-immunoprecipitation; effector espG of Escherichia coli (Q7DB50) as positive control (one biological replicate). Molecular mass markers are given in kilodaltons. f, Most-targeted human proteins interacting with the indicated number of effectors from different strains. Colours represent strains according to the indicated legend (full statistics in Supplementary Data 11).
g, Most highly connected effectors interacting with the indicated number of human proteins (Supplementary Data 11). h, Observed number of effector-interacting human proteins compared to random expectation (two-sided permutation test, P < 0.0001; n = 10,000). i, Frequency distribution of human proteins targeted by effectors from the indicated number of different strains (red) compared to random expectation (two-sided permutation test, P = 0.004; n = 10,000).
To experimentally assess data quality, we assembled a positive control set of 67 well-documented binary interactions of pathogen effectors with human proteins (bacterial human literature binary multiple (bhLit_BM-v1)). We also assembled a negative control set of random effector–human protein pairs (bacterial host random reference set (bhRRS-v1)) (Supplementary Data 12). We benchmarked our yeast two-hybrid (Y2H) assay with these sets, alongside established human reference sets (hsPRS-v2 and hsRRS-v2)38. This indicated an assay sensitivity of 13% for the bacterial positive set and 17.5% for the human positive set, matching previous observations35,37,38 (Fig. 2c and Supplementary Data 13). No negative control scored positive, supporting the specificity of our system. Next, we assessed the biophysical quality of HuMMI using the yeast nanoluciferase two-hybrid assay (yN2H)38, benchmarked against the four reference sets. Across thresholds, sets with bacterial proteins yielded fewer positive-scoring pairs than the human sets (Extended Data Fig. 2). This also applied to the negative controls, and no effector toxicity was observed, so prokaryotic proteins appear harder to test in this assay system, reinforcing the need for tailored reference sets. The 172 randomly selected HuMMI interactions were statistically indistinguishable from the positive control sets (Fig. 2d, Extended Data Fig. 2 and Supplementary Data 14). Thus, the biophysical quality of HuMMI is on par with that of well-documented literature interactions.
To test whether the interactions can occur in the human cellular environment, we performed co-immunoprecipitation experiments in HEK293 cells (RRID: CVCL_0045, DSMZ). Flag-tagged effectors and the negative control Flag–GFP served as baits, and the MYC-tagged human interaction partners were detected by western blot. Of 32 pairs, including 4 positive controls, 18 pairs and 3 controls yielded meaningful data. Ten could not be evaluated, owing to unspecific binding of the human protein (3) or poor expression (7). Only one of the control pairs was positive, whereas 13 of the 18 candidate pairs yielded detectable bands specifically in the effector immunoprecipitation (Fig. 2e and Extended Data Fig. 2). Together, these results demonstrate that HuMMI contains biophysically reliable interactions that are robustly detectable in different assays and occur in human cellular environments. Importantly, functional effects may go in both directions: while in most cases effectors probably perturb the host cell, intracellular immune receptors may also recognize effectors and then initiate defence responses.
We began the functional exploration by analysing the topology of the microorganism–host interaction network (Fig. 2f,g). The degree distribution of HuMMIMAIN shows that numerous human proteins interact with multiple effectors, often from different species (Fig. 2f and Supplementary Data 11). Random sampling demonstrates highly significant effector convergence on few host interactors (Fig. 2h). In plant–pathogen systems, this phenomenon has been linked to the functional importance of the targeted host proteins13. Moreover, human proteins targeted by effectors from four different bacterial strains occur significantly more often than expected from random processes (Fig. 2i and Supplementary Data 11). Thus, 60 human proteins are subject to effector convergence, highlighting their potential importance for microbiome–host interactions. To explore overlap with pathogen effectors, we extracted from IntAct39 265 high-quality binary interactions between 217 human proteins and 80 effectors from 17 pathogenic strains (Supplementary Data 15). We found a numerically small, albeit significant, overlap of 12 human proteins targeted by both groups (P = 0.014, Fisher’s exact test, odds ratio = 2.26). Of these, 3 are subject to convergence by commensal effectors (P = 0.067, Fisher’s exact test, odds ratio = 3.37, Supplementary Data 11). These findings are limited by sample size, experimental differences and the non-systematic nature of the pathogen data. Nevertheless, they support both overlap and lifestyle-dependent specificity in commensal and pathogenic effector targeting40.
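Effector convergence can be tested by asking whether effectors hit fewer distinct human proteins than expected if each effector's interactors were drawn at random. The sketch below keeps each effector's number of interactors fixed and samples partners uniformly from the screened human proteins. This null model, the tail-doubling convention for the two-sided P value and the function name are assumptions, not necessarily the randomization used for Fig. 2h.

```python
import random
from collections import Counter


def convergence_test(edges, human_pool, n_perm=10_000, seed=0):
    """Observed vs expected number of distinct human targets under degree-preserving resampling.

    edges: unique (effector, human_protein) pairs; human_pool: all screened human proteins.
    """
    rng = random.Random(seed)
    degree = Counter(effector for effector, _ in edges)
    observed = len({human for _, human in edges})
    pool = list(human_pool)
    null = []
    for _ in range(n_perm):
        targets = set()
        for k in degree.values():
            targets.update(rng.sample(pool, k))  # k distinct partners per effector
        null.append(len(targets))
    n_le = sum(v <= observed for v in null)
    n_ge = sum(v >= observed for v in null)
    # Two-sided empirical P: twice the smaller tail fraction (0 means P < 1/n_perm)
    p_value = min(1.0, 2 * min(n_le, n_ge) / n_perm)
    return observed, sum(null) / n_perm, p_value
```

Fewer observed distinct targets than the random expectation indicates convergence of effectors on shared host proteins.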
Structural features mediating effector–host interactions
Many inference approaches assume that sequence similarity implies functional and interaction similarity, and such similarity could also underlie convergence. However, in the homology clusters of the systematically tested HuMMIHOM (Fig. 3a), we found that sequence and interaction similarity are only poorly correlated; instead, sequence similarity merely defines an upper limit for interaction similarity. For instance, cluster 3 contains 7 effectors sharing >90% sequence similarity, yet their interaction profile similarities range from identical to complementary (Fig. 3b,c and Supplementary Data 16). Conversely, clustering effectors unrelated in sequence and structure by their pairwise interaction similarity in HuMMIMAIN identified substantial overlap outside homology clusters (Extended Data Fig. 3), suggesting that dissimilar effectors can have similar functions in the host. Thus, host effector function as measured by protein-interaction profiles is largely independent of overall sequence similarity.
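Interaction-profile similarity between effectors is measured by the Jaccard index of their sets of human interactors, as in Fig. 3b. The minimal sketch below uses hypothetical effector and protein identifiers. It shows that identical profiles score 1 and complementary (disjoint) profiles score 0, regardless of how similar the effector sequences are.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_interaction_similarity(profiles: dict) -> dict:
    """Jaccard similarity of interactor sets for every effector pair in one cluster."""
    names = sorted(profiles)
    return {
        (x, y): jaccard(profiles[x], profiles[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Hypothetical homology cluster: similar sequences, divergent interaction profiles
cluster = {
    "E1": {"H1", "H2"},
    "E2": {"H1", "H2"},
    "E3": {"H3", "H4"},
}
similarity = pairwise_interaction_similarity(cluster)
```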
a, Schematic representation of the systematic interaction profiling of homologous effectors. b, Scatterplot of mutual sequence similarity and Jaccard interaction similarity for all effector pairs in the indicated homology groups. The union of human proteins targeted by each effector pair is indicated by node size as denoted in the legend. Individual data in Supplementary Data 16. c, Yeast growth in one representative of four repeats of all effector–human interactions tested for homology cluster 3. d, Left, proportions of human-protein targets interacting with the same bacterial effector, grouped by interface similarity on the basis of the Jaccard index (JI) of target-contacting residues, categorized as: distinct (JI ≤ 0.1), overlapping (0.1 < JI < 0.6) and same (JI ≥ 0.6). Right, Mmo_5 interacts with example human proteins via the same interface (top, 66% overlap), whereas Pfa_4 uses distinct interfaces. e, Proportions of different effectors interacting with the same human protein, grouped by interface similarity as in d. Example interface models (right) show effector binding to the same (Pfa_9, Pse_2) or distinct (Pfa_9, Yen_2) interfaces on human TCF4. f, Count of domain–motif interfaces identified in HuMMIMAIN matching at least one stringency criterion (arrow) compared to random expectation (one-sided permutation test, P = 0.0137; n = 10,000). All data related to c–f are available in Supplementary Data 17. g, Results of holdup assay and comparison with Y2H results. Indicated PDZ domains of human proteins shown on y axis were tested against 10-amino-acid C-terminal peptides of the effectors indicated on top. Calculated dissociation constant (Kd) values as indicated. Overlap between holdup (HU) and Y2H on protein level is indicated by coloured frames. Precise P values and n for each test are shown in Supplementary Data 19.
To gain structural insights and potential functional leads, we modelled effector–host protein interactions using AlphaFold-Multimer, obtaining predictions for 123 pairs (10%). For proteins with multiple interactors, we classified interfaces by the overlap of contacting residues as ‘same’ (≥60% shared), ‘distinct’ (≤10% shared) or ‘overlapping’ (in between) (Fig. 3d,e and Extended Data Fig. 3). For instance, Mmo_5 binds to TPD52L1 and BORCS6 via the same interface, whereas Pfa_4 interacts with NOTO and LBX1 via distinct interfaces, possibly enabling simultaneous interactions with both (Fig. 3d and Supplementary Data 17).
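The interface categories can be sketched in two steps. First, collect the residues of a protein that contact a partner in a predicted complex. Second, compare two such residue sets with the Jaccard index (JI) and bin the result using the thresholds in the Fig. 3 legend: same (JI ≥ 0.6), distinct (JI ≤ 0.1) and overlapping otherwise. The 5 Å contact cutoff, the coordinate format and the function names are assumptions, and parsing of AlphaFold-Multimer models is left out.

```python
from math import dist


def interface_residues(protein, partner, cutoff=5.0):
    """Residue numbers of `protein` with any atom within `cutoff` Å of any partner atom.

    protein, partner: dict mapping residue number -> list of (x, y, z) atom coordinates.
    """
    partner_atoms = [xyz for atoms in partner.values() for xyz in atoms]
    return {
        res
        for res, atoms in protein.items()
        if any(dist(a, b) <= cutoff for a in atoms for b in partner_atoms)
    }


def classify_interfaces(iface1, iface2):
    """Bin two interfaces on the same protein by the Jaccard index of contacting residues."""
    union = iface1 | iface2
    ji = len(iface1 & iface2) / len(union) if union else 0.0
    if ji >= 0.6:
        return "same", ji
    if ji <= 0.1:
        return "distinct", ji
    return "overlapping", ji
```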
Analogously, Pse_2 and Pfa_9 bind to the same interface of TCF4, whereas Yen_2 targets a different part of the protein (Fig. 3e). Identical interface binding was more frequent on human proteins than on effectors, suggesting the importance of targeting functions linked to specific domains. Mapping the binding interfaces to domain annotations strengthened this hypothesis, as even effectors binding via different interfaces may target the same domain (for example, the DNA-binding domain of LBX1). More commonly, however, effectors with different interfaces bind to distinguishable parts of the host protein. Efe_11 and Kpn_9 bind the same interface in the TRAF2 E3 ubiquitin ligase domain, whereas Pem_8 targets the C-terminal MATH domain of TRAF2, which mediates trimerization and receptor binding. Similarly, on REL, Pma_4 binds the DNA-binding and Yen_11 the dimerization domain.
Beyond large interfaces, many interactions are mediated by short linear motifs (SLiMs) in intrinsically disordered regions that bind to specific pocket-forming protein domains41. As AlphaFold often misses such interactions42, we used the orthogonal mimicINT approach to identify SLiM–domain interactions; this approach matches interaction pairs to known SLiM–domain templates43 (Fig. 3f). It identified putative interfaces for 54 HuMMIMAIN interactions involving bacterial host-like SLiMs binding to human domains (Supplementary Data 18). Of these, 51 passed at least one stringency criterion (Fig. 3f; empirical P = 0.0137) and 22 passed two (Extended Data Fig. 3; empirical P = 0.0005). Some of the matched motifs encompass phosphorylation sites recognized by kinases or by phosphorylation-dependent binding domains such as SH2 domains. Conversely, several commensal effectors encode predicted enzymatic domains (Supplementary Data 5). An analogous approach found no case in which these domains engage cognate substrate motifs on host proteins. It found only a single effector-domain–SLiM match consistent with known docking specificity: the calcineurin-like phosphoesterase domain (PF00149) of Efe_1 and the canonical LxVP docking motif in VAC14. The largest group, comprising 23 interactions, involved PDZ domains in human proteins binding PDZ-binding motifs (PBMs) at the C terminus of the bacterial interaction partners. PDZ domain-containing proteins commonly mediate functions important for microorganism–host interactions, including cell–cell adhesion, protein trafficking and immune signalling44. To experimentally validate these interfaces, we used the quantitative in vitro holdup interaction assay45 to test individual and tandem PDZ domains from 13 human proteins against C-terminal peptides from 16 interacting bacterial effectors. Of 23 Y2H pairs, 16 (70%) showed at least one PDZ–peptide interaction, validating the mode of interaction (Fig. 3g and Supplementary Data 19).
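PDZ-binding motifs sit at the very C terminus of a protein, so they can be screened with end-anchored regular expressions. The patterns below are simplified, ELM-style approximations of the three classical PBM classes: class I (S/T-x-Φ), class II (Φ-x-Φ) and class III (D/E-x-Φ), where Φ is a hydrophobic residue. They are not the mimicINT templates, and the residue sets and function names are assumptions.

```python
import re

# Simplified C-terminal PDZ-binding motif (PBM) classes; "$" anchors at the C terminus
PBM_PATTERNS = {
    "class I": re.compile(r"[ST].[ACVILF]$"),
    "class II": re.compile(r"[VLIFY].[ACVILF]$"),
    "class III": re.compile(r"[DE].[ACVILF]$"),
}


def c_terminal_peptide(seq: str, length: int = 10) -> str:
    """Last `length` residues, e.g. the 10-residue peptides used in the holdup assay."""
    return seq[-length:]


def pbm_classes(seq: str) -> list:
    """Names of PBM classes whose pattern matches the protein's C terminus."""
    tail = c_terminal_peptide(seq.upper().rstrip("*"))  # drop a trailing stop symbol
    return [name for name, pattern in PBM_PATTERNS.items() if pattern.search(tail)]
```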
In three instances, two PDZ domains arranged in tandem were required for the interaction, suggesting that some Y2H pairs might have been missed by the holdup method due to untested combinations. As for the predicted globular interfaces, for human proteins with multiple PDZ domains, different effectors often target different domains, demonstrating specificity and functional specialization (Fig. 3g). Thus, while overall effector sequence similarity does not correlate with interaction profiles, structural modelling showed that some effectors target similar interfaces and domains, suggesting shared functions, whereas others bind distinct domains pointing to functional specialization.
Effector-targeted functions and disease modules
We explored effector target functions using Gene Ontology (GO) enrichment analysis (Fig. 4a, Extended Data Fig. 3 and Supplementary Data 20). Among the most enriched functions was ‘response to muramyl dipeptide (MDP)’, a bacterial cell wall-derived peptide. Intriguingly, the MDP receptor NOD2 is a major susceptibility gene for Crohn’s disease46, an immune-mediated inflammatory bowel disease with a strong aetiological microbiome contribution46. Central immune signalling pathways are also enriched, namely the NF-κB and the stress-activated protein kinase/Jun N-terminal kinase (SAPK/JNK) pathways. Remarkably, five significantly targeted convergence proteins belong to the NF-κB module (Extended Data Fig. 3), one of the evolutionarily oldest immune pathways in animals47. Using the Recon3D human genome-scale metabolic model48, we further found significant enrichment for metabolic enzymes among the human interactors (P = 0.0001, Fisher’s exact test). However, beyond glycerophospholipid metabolism, no metabolic subsystem stood out (Supplementary Data 20). Finally, we compared commensal-targeted functions with those targeted by pathogens (Supplementary Data 20). Some pathways were common to targets of both groups, such as ‘NF-κB signalling’. Others were specific to commensals, including ‘collagen biosynthesis’ and ‘response to muramyl dipeptide’. These findings reinforce the notion of lifestyle-dependent specificity and functional overlap in the molecular interactions of commensals and pathogens with the human host.
a, Odds ratios of representative functional annotations enriched among effector-targeted human proteins (FDR < 0.05, Fisher’s exact test with Bonferroni FDR correction). Terms (#) show the number of represented terms. The lowest and highest odds ratios observed for the represented group are indicated by light shaded areas of bars. Black line indicates odds ratios for shown representative terms. White triangles indicate functions also enriched in pathogen targets. b, Genetic predisposition for traits and diseases enriched among human genes encoding effector-interacting proteins in HuRI (α = 0.05, Fisher’s exact test; n = 349). The odds ratio in a and b estimates the effect size of a significant function or trait (two-sided Fisher’s exact test, FDR < 0.05). It is calculated as the odds of function-annotated/trait-associated human genes encoding effector targets to function-unannotated/trait-unassociated human genes encoding effector targets in the target set, divided by the same ratio in the HuRI set (see Methods). c, Disease groups for which genetic predisposition proteins are enriched in network neighbourhoods of effectors of the indicated strains. Trait node size corresponds to the number of significantly targeted traits in that group, as indicated in the legend. Thickness of strain–group edges reflects the number of underlying significant effector–trait links (P < 0.01 and odds ratio > 3, two-sided Fisher’s exact test). d, Specific diseases underlying the ‘immunological’ group in c. Node size reflects the number of underlying effector–trait associations, as indicated in the legend. Precise P values and n for all tests are provided in Supplementary Data 23.
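The odds ratio defined in the Fig. 4 legend compares the odds of an annotation among effector targets with the same odds in the HuRI background. The minimal sketch below computes it alongside a Bonferroni adjustment of a list of P values. It assumes, as the legend describes, that the whole HuRI set serves as the background. The function names are hypothetical.

```python
def enrichment_odds_ratio(target_annot, target_total, background_annot, background_total):
    """Odds of annotation among effector targets divided by the odds in the background set.

    Equivalent to (a * d) / (b * c) for the 2x2 counts below; returns inf if b * c == 0.
    """
    a = target_annot                           # annotated targets
    b = target_total - target_annot            # unannotated targets
    c = background_annot                       # annotated background (e.g. HuRI) genes
    d = background_total - background_annot    # unannotated background genes
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```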
We wondered whether perturbations by commensal effectors could influence non-infectious human diseases, starting our analysis at the network level. Genetic variants, as well as viruses, contribute to complex diseases by altering intracellular networks and disease-relevant functions, often subtly. We first explored whether commensal effectors target proteins that are genetically relevant for diseases and other traits. We used ‘causal genes’ identified from genome-wide association studies (GWAS) by the Open Targets initiative49 to identify the encoded ‘disease proteins’, and unified traits by the Experimental Factor Ontology (EFO)50 (Fig. 4b and Supplementary Data 21). The strong enrichment for ‘immunoglobulin isotype switching’ is intriguing as the evolutionarily older IgA antibodies have important roles in shaping the gut microbiome51. Effector targets are also associated with cancers and immune diseases, such as psoriasis, asthma, allergies and systemic lupus erythematosus, although none of these predominantly affects the gut. Given the abundance of immune-related measurement traits, it is possible that the effectors systemically perturb immune signalling and thereby contribute to lung and skin diseases. Alternatively, convergence proteins such as REL or TCF4 (Fig. 2f,h) may also be targeted by local microbiota in skin or lung tissues. Supporting this, 26% of HuMMI effectors are detectable in skin microbiome samples, suggesting that commensal effectors are shared across ecological niches (Supplementary Data 22).
In addition to disease proteins being direct targets, we previously discovered that relevant genetic variation often resides in their protein interaction neighbourhood13,35. To explore these neighbourhoods, we performed short random walks in the binary human reference interactome (HuRI)36 and defined ‘neighbourhood’ as proteins visited significantly more often in HuRI than in degree-preserving randomly rewired control networks. In these neighbourhoods, we assessed disease-protein enrichment using Open Targets, aggregated nominally significant associations at the strain level, and summarized them by disease group (Fig. 4c and Supplementary Data 23). Most disease groups we found are known to be affected by the gut microbiome3. Among immune diseases, inflammatory bowel disease (IBD) was enriched (nominal P = 0.0008, Fisher’s exact test), particularly Crohn’s disease (nominal P = 8.5 × 10−5, Fisher’s exact test) but not ulcerative colitis (Fig. 4d and Supplementary Data 23). As for direct targets, neighbourhoods also harboured genetic susceptibility to skin and lung diseases such as asthma and psoriasis. Considering the microbiota’s relevance for metabolic disorders, effector targeting of neighbourhoods affecting high- and low-density lipoprotein (HDL and LDL, respectively) cholesterol levels (nominal P = 0.006 and P = 0.008, respectively, Fisher’s exact test) and several diabetes-related traits is notable (Supplementary Data 23). Together, these findings suggest that commensal effectors modulate host immune signalling and the local metabolic and structural microenvironment. As the targeted proteins and neighbourhoods are genetically associated with several diseases, modulation of their functions by effectors may contribute to disease aetiology.
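The neighbourhood definition can be illustrated with a simplified sketch. It rests on several assumptions:

* walks start at an effector's direct human targets (the seeds);
* each seed launches a fixed number of fixed-length walks;
* null networks come from degree-preserving double-edge swaps;
* a protein joins the neighbourhood when rewired networks rarely reach its observed visit count (empirical P < 0.05).

Walk length, walk counts and the number of null networks are illustrative values, not the study's parameters. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def adjacency(edges):
    """Undirected adjacency lists from an edge list."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def visit_counts(adj, seeds, walk_length, n_walks, rng):
    """Count node visits over n_walks short random walks started from each seed."""
    counts = Counter()
    for s in seeds:
        if s not in adj:  # seed absent from the network
            continue
        for _ in range(n_walks):
            node = s
            for _ in range(walk_length):
                node = rng.choice(adj[node])
                counts[node] += 1
    return counts


def rewire(edges, n_swaps, rng):
    """Degree-preserving randomization by double-edge swaps (no self-loops or multi-edges)."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (u, v), (x, y) = edges[i], edges[j]
        if rng.random() < 0.5:
            x, y = y, x
        if len({u, v, x, y}) < 4:  # swap would create a self-loop
            continue
        new_a, new_b = frozenset((u, y)), frozenset((x, v))
        if new_a in present or new_b in present:  # swap would create a multi-edge
            continue
        present -= {frozenset((u, v)), frozenset((x, y))}
        present |= {new_a, new_b}
        edges[i], edges[j] = (u, y), (x, v)
        done += 1
    return edges


def neighbourhood(edges, seeds, walk_length=3, n_walks=200, n_null=50, alpha=0.05, seed=0):
    """Nodes visited significantly more often than in degree-preserving rewired networks."""
    rng = random.Random(seed)
    observed = visit_counts(adjacency(edges), seeds, walk_length, n_walks, rng)
    as_extreme = Counter()
    for _ in range(n_null):
        null_edges = rewire(edges, 2 * len(edges), rng)
        null = visit_counts(adjacency(null_edges), seeds, walk_length, n_walks, rng)
        for node, count in observed.items():
            if null[node] >= count:
                as_extreme[node] += 1
    return {node for node in observed if (as_extreme[node] + 1) / (n_null + 1) < alpha}
```

A hypothetical call would be `neighbourhood(huri_edges, seeds=effector_targets)`, where both inputs stand in for HuRI binary interactions and one effector's HuMMI partners. Disease-protein enrichment within the returned set can then be tested as in the preceding sketch.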
Effector function in human cells and disease
We sought to experimentally verify that commensal effectors perturb some of the identified pathways and functions. We focused on NF-κB signalling, which is central to many diseases and emerged repeatedly in our study. Using a dual-luciferase assay35 in HEK293 cells (RRID: CVCL_0045, DSMZ), 5 out of 26 commensal effectors significantly increased NF-κB reporter activity in the absence of stimulation (Fig. 5a,b and Supplementary Data 24), while 3 effectors reduced NF-κB activity under strong TNF stimulation (Fig. 5b, Extended Data Fig. 4 and Supplementary Data 24). Next, we assessed whether the effectors modulate NF-κB signalling in unstimulated Caco-2 cells and after pro-inflammatory stimulation (Extended Data Fig. 4). Consistently, Cpa_12, an ABC domain-containing effector, reduced secretion of several cytokines with and without stimulation (Fig. 5c and Supplementary Data 25). Other effectors enhanced cytokine responses, particularly IL-6 and IL-8, after Pam3CSK4 stimulation but not after TNF or flagellin stimulation (Fig. 5d, Extended Data Fig. 4 and Supplementary Data 25). Pam3CSK4 mimics TLR1/2 activation by triacylated lipopeptides abundant in Gram-positive Bacteroidetes, while flagellin mimics TLR5 activation by Gram-negative Pseudomonadota52. Thus, commensal effectors exert complex effects on intracellular immune signalling.
a, Relative NF-κB transcription reporter activity in HEK293 cells expressing the indicated effectors at baseline conditions (no TNF) (Kruskal–Wallis test with Dunn’s post hoc comparisons, *P < 0.05, **P < 0.01; n = 4 biological replicates). Boxes represent IQR, black line indicates the mean, whiskers indicate highest and lowest data point within 1.5× IQR. b, Summary of significant effects of effectors on normalized NF-κB transcriptional reporter activity at baseline conditions (−TNF) and after TNF stimulation (+TNF) (Kruskal–Wallis test with Dunn’s post hoc comparisons, *P < 0.05, **P < 0.01; n = 4 biological replicates). c,d, Concentration of cytokines secreted by Caco-2 cells transfected with indicated effectors at basal conditions (unstim) or after stimulation with a proinflammatory cocktail (stim) (c) or with Pam3CSK4 (d). EV, empty vector mock control. Numbers above brackets indicate P values calculated by Kruskal–Wallis test with Dunn’s post hoc comparisons; n = 3 (c) and n = 5 (d) biological replicates. Boxes represent IQR, black line indicates the mean, circles indicate individual data points. e, Radial barplot showing fold change in prevalence of 122 bacterial effectors in metagenomes of individuals with Crohn’s disease (CD; n = 504 patient samples, orange) or ulcerative colitis (UC; n = 302 patient samples, purple) relative to healthy controls (n = 334 samples). Fold changes were calculated using pseudo-counts from healthy controls (Supplementary Data 26). Labels indicate effectors with significantly altered prevalence in either Crohn’s disease or ulcerative colitis (FDR < 0.01, Fisher’s exact test with BH correction). Black asterisks mark statistical significance for individual bars. f, Prevalence of indicated effectors in metagenomes of individuals with Crohn’s disease (n = 504) and ulcerative colitis (n = 302) compared to healthy controls (FDR < 0.01, two-sided Fisher’s exact test, BH correction).
g, HuMMI subnetworks showing human proteins (grey) associated with Crohn’s disease (orange border) or ulcerative colitis (purple border) interacting with effectors (coloured nodes) from strains enriched and depleted in patient metagenomes. Effector colours indicated in legend. Edges represent protein–protein interactions in HuMMI.
Given the genetic and functional links between commensal effectors and IBD, we wondered whether clinical data support a potential role of effectors in the disease. Hypothesizing that causal involvement in IBD aetiology may be reflected in altered effector prevalence, we analysed a metagenome study with over 800 individuals with IBD (504 Crohn’s disease, 302 ulcerative colitis) and 334 healthy controls53. Among effectors with physical interactions in HuMMI, 64 were significantly more prevalent in individuals with Crohn’s disease compared with healthy controls, whereas effectors were less common in individuals with ulcerative colitis (Fig. 5e,f and Supplementary Data 26). These opposing trends were unexpected as Pseudomonadota abundance reportedly increases in both IBDs46,54. HuMMI suggests some hypotheses for the mechanisms underlying this observation. Effectors from K. pneumoniae, E. coli and E. fergusonii, all highly prevalent in Crohn’s disease, interact with the Crohn’s disease susceptibility protein COG6, which directly interacts with the ulcerative colitis susceptibility protein RTP5 (Fig. 5g). Similarly, Efe_13 of E. fergusonii binds the Crohn’s disease susceptibility protein TNIP1, which functions in NF-κB signalling and interacts with the products of two genes associated with ulcerative colitis. Other enriched effectors show indirect links to Crohn’s disease and ulcerative colitis proteins via shared interaction partners (Fig. 5g). While the mechanistic relevance of these interactions requires future studies, these direct and indirect connections to IBD disease proteins invite speculation that they may cause a homeostatic shift that increases the risk of Crohn’s disease while decreasing that of ulcerative colitis.
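The prevalence comparison tests each effector separately with Fisher's exact test and corrects for multiple testing with the Benjamini–Hochberg (BH) procedure at FDR < 0.01. The BH adjustment is standard and is shown here in stdlib Python. The fold-change function is hypothetical. The legend of Fig. 5 states only that pseudo-counts from healthy controls were used, so adding one pseudo-count to the control carrier count is an assumption.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value downwards
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def prevalence_fold_change(k_case, n_case, k_ctrl, n_ctrl, pseudo=1):
    """Effector prevalence in patients over prevalence in controls.

    Assumption: a pseudo-count is added to the control carrier count only,
    to avoid division by zero for effectors absent from all controls.
    """
    return (k_case / n_case) / ((k_ctrl + pseudo) / n_ctrl)
```

Effectors would then be retained when their adjusted value satisfies `q < 0.01`, for example `[e for e, q in zip(effector_ids, benjamini_hochberg(pvals)) if q < 0.01]` with hypothetical `effector_ids` and `pvals`.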
Discussion
T3SS are traditionally viewed as hallmarks of pathogenicity, yet in plants and insects they mediate a wider range of functions including beneficial interactions55. Our findings extend this observation to the human gut, revealing that T3SS are unexpectedly common among commensal Pseudomonadota. In particular, E. coli, which resides close to the intestinal epithelium56, frequently encodes complete systems. Although T3SS were not detected in commensal beta- or delta-Pseudomonadota, divergent systems may have escaped current detection tools, so their true distribution may be underestimated. Functional assays validated our predictions and revealed regulatory complexity: C. pasteurii and P. massiliensis showed inconsistent T3SS activation, whereas E. tarda reliably injected effectors into human cells. Using S. Typhimurium as a heterologous host that robustly activates its T3SS, we confirmed translocation of 32 effectors from 11 species, indicating that many commensals harbour host-directed secretion capability. The regulatory diversity of T3SS activation is consistent with the idea that, in contrast to pathogens such as S. Typhimurium, commensals may require highly specific host or environmental cues to activate secretion. Whether human epithelial or immune cells, akin to plant hosts57, can actively signal to commensals to induce T3SS, or whether secretion primarily reflects stress responses of potential pathobionts, remains an important area for future investigation.
Interpreting T3SS functionality in the human gut requires moving beyond species-level labels such as ‘commensal’ or ‘pathogenic’, which often obscure substantial within-species diversity. As observed in other host kingdoms58, these categories are fluid: E. coli includes both highly pathogenic lineages (for example, EPEC or EHEC) and harmless or beneficial ones, such as the probiotic E. coli Nissle 191759. In our analyses, strains isolated from apparently healthy individuals were considered commensals, whereas strains encoding known virulence effectors28, including P. aeruginosa and Salmonella spp., were designated pathogens. Importantly, between these poles lie opportunistic pathogens whose infectious potential emerges only in specific environmental or host-related conditions. A key question is therefore whether commensal T3SS primarily support opportunistic pathogenicity, or whether they have adaptive functions in the non-pathogenic lifestyle. Multiple lines of evidence from our study support the latter.
Comparative sequence, structure and host-target analyses revealed that commensal and pathogen effector repertoires are largely distinct, supporting a model in which commensal T3SS are adapted for cooperative rather than pathogenic interactions. Homotypic clustering of effector structures and depletion of mixed commensal–pathogen clusters suggest that commensal effectors follow separate selective trajectories. The domain analysis supports this, revealing many domains found only in commensal effectors that may support a non-pathogenic lifestyle. Notably, numerous effectors involved in cyclic diguanylate synthesis or degradation were identified, often paired with PAS sensor domains, suggesting environmentally responsive functions. Intriguingly, several effectors from Gram-negative commensal Pseudomonadota potentiated Pam3CSK4-induced TLR1/2 signalling, suggesting that T3SS may modulate host responses to Gram-positive Bacteroidetes and thereby influence interphyla competition within the gut ecosystem.
Despite substantial divergence in effector structures, commensal and pathogen effectors exhibited both shared and unique host interactions within the meta-interactome. Although these comparisons are limited by the availability of hypothesis-driven interaction datasets for pathogen effectors, the observed patterns parallel findings in plant systems, where effectors from mutualists and pathogens converge on some common host targets while also interacting with proteins critical for distinct outcomes40. Across systems, convergence on a subset of host proteins emerges as a signature of biological importance13,40. These convergence proteins are therefore key nodes in host–microorganism interactions. Understanding their roles in commensal versus pathogenic contexts is a promising entry point for understanding how pathogenicity emerges and how balanced immune responses are ensured.
The interaction–structure models provide leads for dissecting effector mechanisms. Targeted host protein domains can indicate which processes an effector may perturb. When the interaction is mediated by a corresponding effector motif, they can also indicate whether the effector may itself be post-translationally modified, as seen for H. pylori CagA60. Conversely, post-translational modification of host proteins is a common mechanism of pathogenicity61. A manual analysis matching the mimicINT workflow, however, revealed no clear cases in which these engage cognate host substrate motifs and only a single example consistent with known docking specificity. Whether this reflects differences between commensal and pathogen effectors, the prevalence of functional mimicry without sequence similarity61, or merely limitations of our approach remains to be clarified. Post-translational modification of effectors by host enzymes may either enhance effector function or act in host defence, such as by targeting foreign proteins for degradation. The latter would, however, be expected to select against motif retention. Thus, while biochemical directionality from host domain to effector SLiM is plausible, the available evidence suggests that such modifications predominantly support the lifestyle of the injecting bacterium. When commensals act as pathobionts and contribute to non-communicable diseases, such interactions may become intervention targets.
Analysis of host pathways targeted by commensal effectors indeed revealed enrichment for proteins and genetic variation implicated in immune disorders, cancers and metabolic traits. Notably, commensal effectors target network neighbourhoods associated with Crohn’s disease but not ulcerative colitis, and physically and functionally interact with key members of the TNF–NF-κB signalling axis. Consistent with these molecular data, T3SS effectors were enriched in the microbiomes of individuals with Crohn’s disease while being depleted in ulcerative colitis. This pattern mirrors the differential clinical response to anti-TNF therapy, which is highly effective in Crohn’s disease but not in ulcerative colitis. Understanding whether and how commensal effectors directly contribute to Crohn’s disease risk or flares, and whether they may even confer protection in ulcerative colitis, are compelling questions for future mechanistic studies with potential therapeutic implications.
Together, our data position host-directed secretion as an underappreciated mode of communication between the human microbiota and its host. By integrating genomic, structural, functional and systems-level analyses, we provide an initial map of the commensal T3SS meta-interactome and establish a framework for exploring its roles across microbial niches, host genotypes and disease states. These findings broaden the conceptual boundaries of T3SS biology and highlight the need to examine secretion systems not only as virulence factors but also as potential modulators of mutualism, competition and host physiology within the human gut.
Methods
Identification of T3SS+ strains and candidate effectors in culture collections and MAGs
Reference genomes were identified for Pseudomonadota strains isolated from human guts by the Human Microbiome Project and available from DSMZ (via BacDive), ATCC (atcc.org) or BEI (beiresources.org). Cross-referencing these genomes with GenBank (release 229) yielded 77 matches. These genomes were subjected to T3SS identification, along with 92,143 and 9,367 MAGs, respectively, from two different meta-studies21,22 that were at least 50% complete and less than 5% contaminated. Prediction performance of EffectiveDB16 was evaluated by fivefold cross-validation with five repeats. The evaluation used simulated MAGs of 0–100% completeness and 0–50% contamination (in 5% steps), generated by randomly sampling genes from the test set. A performance-improved re-implementation of the EffectiveDB classifier (https://github.com/univieCUBE/phenotrex, trained on EggNOG 4.0 annotations62) was used with a positive prediction threshold of >0.7.
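The simulated MAGs used for cross-validation can be sketched in stdlib Python. The sketch assumes that completeness is the fraction of a genome's gene annotations kept. It also assumes that contamination is the number of foreign genes added, expressed relative to the source genome's gene count and drawn from other genomes. The classifier itself (phenotrex) is not reproduced. `simulate_mag`, `GRID` and `is_t3ss_positive` are hypothetical names.

```python
import random

# Completeness 0-100 % and contamination 0-50 %, both in 5 % steps (231 combinations).
GRID = [(c / 100, x / 100) for c in range(0, 101, 5) for x in range(0, 51, 5)]


def simulate_mag(genome_genes, foreign_genes, completeness, contamination, rng):
    """Return a simulated MAG as a set of gene (e.g. orthologous group) identifiers.

    Keeps a random `completeness` fraction of the genome's genes and adds
    `contamination` x (genome size) genes drawn from other genomes.
    """
    own = sorted(set(genome_genes))
    pool = sorted(set(foreign_genes) - set(own))
    kept = rng.sample(own, round(completeness * len(own)))
    n_foreign = min(round(contamination * len(own)), len(pool))
    return set(kept) | set(rng.sample(pool, n_foreign))


def is_t3ss_positive(classifier_score, threshold=0.7):
    """Positive prediction rule stated in the Methods (score > 0.7)."""
    return classifier_score > threshold
```

Each `(completeness, contamination)` pair in `GRID` would be applied to every held-out genome in a fold before scoring it with the trained classifier.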
For 770 T3SS+ MAGs, protein-coding sequences for 474,871 representative proteins were identified using prodigal (v.2.6.3)63 and CD-HIT (v.4.8.1, parameters: ‘-c 1.0’)64. A total of 61,115 proteins were encoded by 44 T3SS+ culture collection genomes. Three machine-learning tools (EffectiveT3 v.2.0.1, DeepT3 v.2.025 and pEffect27) were used to predict T3SS signals or effector homology. Predictions were integrated using a 0–2 scoring scheme per tool:

* 2 for a perfect score (pEffect >90, EffectiveT3 >0.9999, DeepT3: both classifiers positive);
* 1 for a positive prediction at default thresholds (pEffect >50, EffectiveT3 >0.95, DeepT3: one classifier positive);
* 0 for negatives.

Sequences with a summed score above 4 (that is, at least 5 of a possible 6) were regarded as potential effectors. Sequences lacking start/stop codons or containing transmembrane regions (TMHMM 2.0) were excluded. Proteins were clustered at 90% sequence identity (CD-HIT parameters: ‘-c 0.9 -s 0.9’) to reduce redundancy. Effector clusters with diverse effector-prediction scores were removed (full data in Supplementary Data 1 and 2).
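The score integration and exclusion filters can be sketched as follows. The thresholds come from the text above. The function names are hypothetical, and the inputs are assumed to be already parsed from each tool's output, together with the number of TMHMM-predicted transmembrane regions.

```python
def score_peffect(score):
    """pEffect: 2 if > 90, 1 if > 50, otherwise 0."""
    return 2 if score > 90 else 1 if score > 50 else 0


def score_effectivet3(score):
    """EffectiveT3: 2 if > 0.9999, 1 if > 0.95, otherwise 0."""
    return 2 if score > 0.9999 else 1 if score > 0.95 else 0


def score_deept3(n_positive_classifiers):
    """DeepT3: 2 if both classifiers are positive, 1 if one is, otherwise 0."""
    return {2: 2, 1: 1}.get(n_positive_classifiers, 0)


def is_candidate_effector(peffect, effectivet3, deept3_positives,
                          has_start, has_stop, n_tm_regions):
    """Summed score above 4 (i.e. >= 5 of 6), complete ORF and no transmembrane regions."""
    total = (score_peffect(peffect)
             + score_effectivet3(effectivet3)
             + score_deept3(deept3_positives))
    return total > 4 and has_start and has_stop and n_tm_regions == 0
```

Under this rule a protein must reach the top score in at least two tools and a positive call in the third. For example, a protein with pEffect 95, EffectiveT3 0.96 and two positive DeepT3 classifiers scores 2 + 1 + 2 = 5 and passes.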
Cohort analyses
T3SS were analogously predicted for 4,753 strains from the human gastrointestinal bacteria genome collection (HBC)17, Broad Institute–Open Biome Microbiome Library (BIO–ML) and Global Microbiome Conservancy (GMC)18,19. To obtain phylogenetic relationships for T3SS+ strains, concatenated bac120 marker proteins from GTDB-Tk (v.2.1)65 were used. T3SS+ genomes were matched to Weizmann Institute of Science representative genomes of the human gut24 with FastANI v.1.0 using average nucleotide identity (ANI) values >95% (ref. 66). The relative abundance of the 10 matching representatives was determined across 3,096 Israeli and 1,528 Dutch individuals24.
Identification of effector similarities and homology groups
Effectors were aligned using the Needleman–Wunsch algorithm and were considered ‘homologous’ in HuMMIHOM if they shared a mutual sequence identity of ≥30% over 90% of the common sequence length (Supplementary Data 11).
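A compact illustration of the homology criterion uses a plain Needleman–Wunsch global alignment. This sketch simplifies in several ways, and each is an assumption:

* it uses unit scores (match +1, mismatch −1, linear gap −1) instead of a substitution matrix with affine gaps;
* identity is the fraction of identical residues among aligned (gap-free) columns;
* "90% of the common sequence length" is read as aligned columns covering at least 90% of the shorter sequence.

`needleman_wunsch` and `homologous` are hypothetical names.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with linear gap penalties; returns (score, aligned residue pairs)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, collecting gap-free aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return score[n][m], pairs[::-1]


def homologous(a, b, min_identity=0.30, min_coverage=0.90):
    """Identity >= 30 % over aligned columns covering >= 90 % of the shorter sequence."""
    _, pairs = needleman_wunsch(a, b)
    if not pairs:
        return False
    identity = sum(x == y for x, y in pairs) / len(pairs)
    coverage = len(pairs) / min(len(a), len(b))
    return identity >= min_identity and coverage >= min_coverage
```

The quadratic dynamic-programming table is fine for effector-length proteins. Production tools such as EMBOSS `needle` add substitution matrices and affine gap costs.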
Commensal vs pathogen effector similarity
Sequences of 1,195 known pathogenic T3 effectors were obtained from BastionHub28 (29 August 2022), and sequence similarity between commensal and pathogenic effector sequences was assessed using BLAST (v.2.10)67. For each commensal effector, the pathogen effector with the highest sequence similarity was considered the best match and used to calculate alignment coverage. Additional significant similarities were identified using iterative sequence searches against ~124 million non-redundant bacterial sequences from UniRef90 (January 2024) with Jackhmmer29. For each commensal effector, we ran five iterations using inclusion and comparison E-value thresholds of <10−5 (Supplementary Data 5).
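Best-match selection and coverage can be computed directly from BLAST+ tabular output. This sketch makes three assumptions:

* the output uses the custom format `-outfmt "6 qseqid sseqid pident length qstart qend sstart send evalue bitscore qlen slen"`;
* "highest sequence similarity" means the highest bitscore;
* coverage is the aligned query span divided by the query length.

`best_pathogen_hits` is a hypothetical name.

```python
# Assumed BLAST+ tabular layout (custom -outfmt 6 field list).
FIELDS = ["qseqid", "sseqid", "pident", "length", "qstart", "qend",
          "sstart", "send", "evalue", "bitscore", "qlen", "slen"]


def best_pathogen_hits(lines):
    """Map each commensal query to (best pathogen hit, query coverage of that alignment)."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        rec = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        bits = float(rec["bitscore"])
        query = rec["qseqid"]
        if query not in best or bits > best[query][0]:
            span = int(rec["qend"]) - int(rec["qstart"]) + 1
            best[query] = (bits, rec["sseqid"], span / int(rec["qlen"]))
    return {q: (hit, cov) for q, (_, hit, cov) in best.items()}
```

Passing an open results file, as in `best_pathogen_hits(open("commensal_vs_bastionhub.tsv"))` with a hypothetical file name, returns one entry per commensal effector that had any hit.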
Domain identification and analysis
Protein domain annotation for effectors was carried out using the standalone version of InterProScan (v.5.75-106.0), using Pfam v.37.4 as reference. Amino acid sequences in FASTA format were used as input across three datasets: effector proteins from commensal bacterial strains (n = 3,002) and MAGs (n = 186), human and vertebrate pathogen effectors obtained from BastionHub, and all reviewed human proteins from the UniProtKB/Swiss-Prot reference proteome. Pathogen effectors were classified on the basis of the host annotation (human, vertebrate) of the corresponding species or strain, as provided by PHI-base68 and BV-BRC69. InterProScan used (translated) protein sequences (Supplementary Data 4) with default parameters. Domain hits with an E-value < 10−5 were considered significant.
Domains identified as significant in commensal effectors were used as reference for comparative analysis and evaluated for their presence in pathogen effectors and human proteins, applying the same annotation criteria and significance threshold. All domain annotation results, including individual hits across datasets and the comparative summary, are provided in Supplementary Data 5.
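The filtering and comparison steps can be sketched on InterProScan's tab-separated output. In that format:

* column 4 holds the analysis name (e.g. `Pfam`);
* column 5 holds the signature accession;
* column 9 holds the match score, which is the E-value for Pfam and `-` when absent.

The function names and the example rows in the usage check are hypothetical.

```python
from collections import defaultdict


def significant_pfam(tsv_lines, max_evalue=1e-5):
    """Pfam accessions per protein from InterProScan TSV output with E-value < max_evalue.

    Column layout (0-based): 0 protein accession, 3 analysis,
    4 signature accession, 8 score (E-value for Pfam, '-' when absent).
    """
    hits = defaultdict(set)
    for line in tsv_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[3] != "Pfam":
            continue
        try:
            evalue = float(cols[8])
        except ValueError:  # '-' or other non-numeric score
            continue
        if evalue < max_evalue:
            hits[cols[0]].add(cols[4])
    return dict(hits)


def compare_domains(commensal, pathogen, human):
    """For each commensal domain: (also in pathogen effectors?, also in human proteins?)."""
    pathogen_domains = set().union(*pathogen.values())
    human_domains = set().union(*human.values())
    commensal_domains = set().union(*commensal.values())
    return {d: (d in pathogen_domains, d in human_domains)
            for d in sorted(commensal_domains)}
```

Domains mapping to `(False, False)` would correspond to the commensal-only domains discussed above.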
Structural effector similarity
Structures of pathogen and commensal effectors were compared using FoldSeek70. Effector structures were downloaded from the AlphaFold DB when available; otherwise, a model with >95% sequence identity and >90% sequence coverage was selected as representative. Clustering was performed by setting bidirectional query coverages (qcov) at 0.5, 0.7 and 0.9, and E-value thresholds at 0.001, 0.01 and 1 using FoldSeek Cluster’s greedy set cover algorithm. To assess the statistical significance of the obtained cluster distributions, we performed label permutation tests (n = 10,000) while keeping the graph’s topology intact. The clustering analysis was run for all commensals against three sets of pathogen effectors: all pathogens (895 structures from human, vertebrate and plant pathogens), human and vertebrate pathogens (536 structures) and human pathogen effectors only (488 structures) (Supplementary Data 6).
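The label permutation test can be illustrated with fixed clusters. In this sketch, "keeping the graph's topology intact" is read as leaving cluster membership unchanged and shuffling only the commensal/pathogen labels across structures. The test statistic is assumed to be the number of mixed clusters, tested one-sided for depletion. Function names are hypothetical.

```python
import random


def count_mixed(clusters, labels):
    """Number of clusters containing both commensal and pathogen structures."""
    return sum(1 for members in clusters if len({labels[m] for m in members}) > 1)


def mixed_cluster_permutation_test(clusters, labels, n_perm=10_000, seed=0):
    """One-sided permutation test for depletion of mixed clusters.

    Cluster membership is fixed; only labels are shuffled across structures.
    Returns (observed number of mixed clusters, empirical P value).
    """
    rng = random.Random(seed)
    members = [m for c in clusters for m in c]
    shuffled = [labels[m] for m in members]
    observed = count_mixed(clusters, labels)
    as_low = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_mixed(clusters, dict(zip(members, shuffled))) <= observed:
            as_low += 1
    return observed, (as_low + 1) / (n_perm + 1)
```

The `+ 1` terms in the P value keep the estimate away from zero with a finite number of permutations. The same function can be run once per Foldseek clustering setting and pathogen reference set.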
Effector cloning
For PCR cloning, genomic DNA or bacterial stocks were obtained from culture collections: ATCC (via LGC Standards, Wesel, Germany, or ATCC, Manassas, VA, USA), DSMZ (Leibniz Institute DSMZ, Braunschweig, Germany) and BEI resources (Manassas, VA, USA) (Supplementary Data 9). Live strains were cultured according to supplier protocols and DNA was extracted using the NucleoSpin Plasmid mini kit. Effectors were cloned into pENTR223.1 by nested PCR to add SfiI sites and by restriction enzyme-based cloning using standard protocols, and verified by Sanger sequencing. Effectors identified from MAGs and effectors for the PRS were synthesized by Twist Bioscience. For experiments, effectors were transferred by Gateway LR reactions into pDEST-DB (pPC97, Cen origin), pDEST-N2H-N1 and -N2, and pMH-Flag-HA.
For bacterial injection assays, effector ORFs were cloned into a modified bacterial expression plasmid based on the pEYFP backbone (BD Biosciences, 6004-1). The EYFP sequence (positions 217–1,407) was removed and replaced with (1) SfiI and XbaI restriction sites for directional cloning of effector ORFs, (2) a 3× Flag epitope tag, (3) the sequence encoding the HiBiT tag (VSGWRLFKKIS; Promega), and (4) the E. coli rrnB transcriptional terminator (pLac_FL_HiBiT). PCR-amplified effectors were ligated into pLac_FL_HiBiT at SfiI and XbaI restriction sites (primers in Supplementary Data 7). The positive control SipA was amplified from a pT10-based plasmid (pMIB6433)34. Cloning was verified by analytical PCR.
Electroporation of plasmids into bacterial strains for injection experiments
Electrocompetent S. enterica sv. Typhimurium (wild-type SB300 and ΔsctV mutant SB1751)34, E. tarda (ATCC 23685), C. pasteurii (DSM 28879) and P. massiliensis (DSM 26120) were generated in-house and electroporated with effector encoding plasmids using a Gene Pulser Xcell Electroporation System (Bio-Rad) at 2.5 kV and 200 Ω for ~5 ms. Transformed strains were cultured overnight in LB medium with ampicillin for subsequent use in injection assays.
Injection assay
The injection assay was adapted from ref. 34. HeLa cells stably expressing LgBiT (HeLa-LgBiT) were grown using standard conditions (DMEM, 10% fetal bovine serum (FBS), 37 °C, 5% CO2) for 24 h before infection. S. Typhimurium strains carrying pLac_FL_HiBiT effector constructs were cultured overnight in LB supplemented with 0.3 M NaCl and ampicillin. Edwardsiella tarda, C. pasteurii and P. massiliensis strains were cultured with 200 μM IPTG to induce effector expression. Overnight bacterial cultures were added to the HeLa-LgBiT cells at a multiplicity of infection of 50 and jointly incubated for 1 h (Salmonella) or 1.5 h (all other strains). After media replacement, extracellular luminescence was quenched by addition of 1× DarkBiT peptide (Promega, CS3002A02) for 50 min. Luminescence was measured after addition of 25 μl fresh Nano-Glo reagent (Promega, N2012) using a SpectraMax ID3 microplate reader (1,000 ms). Each strain was tested with four technical replicates and five biological replicates performed on separate days. Luminescence values from technical replicates were averaged to obtain a single value for each biological replicate. Luminescence fold-change was calculated by dividing the average signal from the effector-expressing strain by that of the mock control, separately for wild-type and ΔsctV strains. To assess effector translocation, fold-change values were statistically compared using the Wilcoxon rank-sum test (Supplementary Data 10). For Salmonella, wild-type and mutant strains were compared; for E. tarda, values were compared against the negative control (SipA in ΔsctV).
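The fold-change calculation and the rank-sum comparison can be sketched without external libraries. Pairing effector and mock runs by day is an assumption. With five biological replicates per group, the two-sided Wilcoxon rank-sum P value can be obtained exactly by enumerating all 252 ways of splitting ten values into two groups of five. Ties receive average ranks. `scipy.stats.mannwhitneyu` offers an equivalent test. All function names here are hypothetical.

```python
from itertools import combinations
from statistics import mean


def fold_changes(effector_days, mock_days):
    """Per biological replicate: mean of technical replicates (effector) / mean (mock)."""
    return [mean(e) / mean(m) for e, m in zip(effector_days, mock_days)]


def average_ranks(values):
    """1-based ranks with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def rank_sum_exact(x, y):
    """Two-sided exact Wilcoxon rank-sum P value by enumerating all group assignments."""
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    n = len(x)
    expected = n * (len(pooled) + 1) / 2
    observed_dev = abs(sum(ranks[:n]) - expected)
    splits = list(combinations(range(len(pooled)), n))
    extreme = sum(1 for idx in splits
                  if abs(sum(ranks[i] for i in idx) - expected) >= observed_dev - 1e-9)
    return extreme / len(splits)
```

With five versus five replicates, the smallest attainable two-sided P value is 2/252 (about 0.008), reached only when the two groups do not overlap at all.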
Western blot analysis for injection assay and co-immunoprecipitations
Proteins were separated by 10% or 15% SDS–PAGE, transferred to PVDF membranes (Bio-Rad, 1620177) and blocked in blocking solution (5% non-fat dry milk in 1× PBS) for 1 h at room temperature or overnight at 4 °C. Blots were probed with mouse anti-Flag M2 monoclonal antibody (Sigma-Aldrich, F1804, 1:5,000) or rabbit anti-Myc (Abcam, ab9106, 1:1,000), followed by HRP-conjugated secondary antibody (Santa Cruz Biotechnology; anti-mouse: sc-516102; anti-rabbit: sc-2357, both 1:5,000). Each antibody incubation lasted 1 h and was followed by three washes in blocking solution or PBST, respectively. Signal was detected with SuperSignal West Femto Substrate (Thermo Scientific, 34094) according to manufacturer instructions. Blots were imaged using the Intas ChemoStar imaging system.
Meta-interactome mapping
A multi-assay interactome mapping pipeline was used37 (Extended Data Fig. 2). In the initial screening by Y2H, candidate effectors fused to the Gal4 DNA-binding domain (DB-X) were screened against 17,472 human proteins fused to the Gal4 activation domain (AD-Y). Before screening, DB-X ORFs were tested for autoactivation by mating against AD-empty plasmids. Autoactivators were excluded. In the primary screen, DB-X strains in Y8930 (MATα) were mated on yeast extract peptone dextrose (YEPD) agar (1%) plates against minipools of ~188 AD-Y in Y8800 (MATa) representing the human ORFeome collection (v.9.1)36. After 24 h, yeasts were replica-plated onto selective media lacking leucine, tryptophan and histidine (SC-Leu-Trp-His), containing 1 mM 3-AT (3-amino-1,2,4-triazole) (3-AT plates), and replica-cleaned after 24 h. After 48 h, colonies were picked and grown for 72 h in SC-Leu-Trp liquid medium for secondary phenotyping. This was done on the same selective 3-AT plates and on SC-Leu-His + 1 mM 3-AT + 1 mg l−1 cycloheximide plates to identify spontaneous autoactivators. Clones growing on 3-AT plates but not on cycloheximide plates were processed for sequence identification using a modified Kilo-seq procedure35. ORFs were amplified and tagged by PCR using a universal ‘term’ reverse primer (5’-GGAGACTTGACCAAACCTCTGGC) and Gal4-AD- and -DB-specific forward primers with position barcodes (Supplementary Table 11) and a TruSeq P7 sequence (0.2 µl DreamTaq DNA polymerase (ThermoFisher, EP0702), 3 µl 2 µM term primer, 3 µl forward primer, 2 µl yeast lysate). For every 96-well plate, 5 µl from each well were pooled, purified with 24 µl magnetic beads (magtivio, MDKT00010075) and eluted in 25 µl TE buffer. The DNA concentration of each pool was quantified by the Quant-iT PicoGreen dsDNA Assay kit (ThermoFisher, P7589) using a lambda DNA dilution series (50–0.390625 ng μl−1). Pools were then diluted to 1–2 ng μl−1 and tagmented with 0.25 µl TDE enzyme (Illumina Tagment DNA TDE1 Enzyme and Buffer kit, 20034197).
A second PCR added plate-specific Nextera i5/i7 indices (Supplementary Table 11) (8 µl tagmented DNA, 0.2 µl DreamTaq (ThermoFisher, EP0702), 1 µl 10 µM i5/i7 primers), followed by bead cleanup (80 µl beads per 100 µl PCR, eluted in 30 µl). Libraries were sequenced on a MiSeq v.2 kit (Illumina, MS-102-2002) and demultiplexed with bcl2fastq2 (v.2.20.0.422) by Illumina.
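The lambda DNA standard range of 50–0.390625 ng μl−1 corresponds to an eight-point, twofold serial dilution (50/2^7 = 0.390625). The sketch below generates that series, fits a linear standard curve by ordinary least squares and computes the dilution needed to bring a pool into the 1–2 ng μl−1 window. A linear fit and a target of 1.5 ng μl−1 (the midpoint) are assumptions. Function names are hypothetical.

```python
from statistics import mean


def dilution_series(top=50.0, n_points=8, factor=2.0):
    """Serial dilution standards: 50, 25, ..., 0.390625 ng/ul with the defaults."""
    return [top / factor ** k for k in range(n_points)]


def fit_standard_curve(conc, signal):
    """Ordinary least-squares line: signal = slope * conc + intercept."""
    mx, my = mean(conc), mean(signal)
    sxx = sum((x - mx) ** 2 for x in conc)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc, signal))
    slope = sxy / sxx
    return slope, my - slope * mx


def pool_concentration(signal, slope, intercept):
    """Invert the standard curve to estimate pool DNA concentration (ng/ul)."""
    return (signal - intercept) / slope


def dilution_fold(conc, target=1.5):
    """Fold dilution to reach the target concentration; never concentrates (>= 1)."""
    return max(conc / target, 1.0)
```

For example, a pool whose fluorescence maps to 6 ng μl−1 would be diluted fourfold to reach the assumed 1.5 ng μl−1 target.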
Finally, haploid yeasts of the DB-X and AD-Y candidate interaction pairs were mated individually and tested four times on selective plates using empty AD and DB plasmids as negative controls. Growth scoring was performed using a custom dilated convolutional neural network35. Pairs scoring positive in at least three out of four repeats qualified as bona fide Y2H interactors. The AD-Y and DB-X constructs were again identified by Illumina sequencing. All interaction data are provided in Supplementary Data 11.
Assembling reference sets for quality control
To identify reliably documented interactions between bacterial effectors and human proteins for our control set, we queried the IMEx consortium protein interaction databases71 for pairs supported by multiple evidence and at least one experiment detecting direct interactions. We manually recurated the corresponding publications and identified 67 well-documented direct interactions between 29 T3 effectors and 64 human proteins, described in 38 distinct publications constituting bhLit_BM-v1. To assemble bhRRS-v1, we randomly paired T3 effectors from bhLit_BM-v1 with human proteins in HuRI (Supplementary Data 12). Effector ORFs were cloned into Entry and experimental plasmids as described above. Human hsPRS/RRS-v2 ORFs were taken from hORFeome9.1 (ref. 36) and verified by end-read Sanger sequencing.
Interactome validation by yN2H
yN2H was used to independently validate the quality of the HuMMI dataset38. A total of 200 interaction pairs were randomly picked from HuMMI. All ORFs (Supplementary Data 14) were transferred by Gateway LR reactions into pDEST-N2H-N1 and pDEST-N2H-N2 and transformed into haploid Saccharomyces cerevisiae Y8800 (MATa) and Y8930 (MATα) strains. Protein pairs from all datasets were randomly distributed across matching 96-well plates. Luminescence from reconstituted NanoLuc for each sample was measured on a SpectraMax ID3 (Molecular Devices) with a 2-s integration time. The normalized luminescence ratio (NLR) was calculated by dividing the raw luminescence of each pair (N1-X N2-Y) by the higher of its two background measurements. All obtained NLR values were log2 transformed, and the positive fraction for each dataset was determined at log2 NLR thresholds between –2 and 2, in 0.01 increments. Statistical results were robust across a wide range of stringency thresholds. Supplementary Data 14 reports the results at log2 NLR = 0. Reported P values were calculated by Fisher’s exact test.
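The NLR calculation and threshold sweep can be sketched as follows. Counting a pair as positive only when its log2 NLR strictly exceeds the threshold is an assumption. Comparing positive fractions between datasets would then use Fisher's exact test, which is not repeated here. Function names are hypothetical.

```python
from math import log2


def nlr(pair_signal, background_signals):
    """Raw luminescence of the N1-X/N2-Y pair divided by its higher background measurement."""
    return pair_signal / max(background_signals)


def positive_fraction(log2_nlrs, threshold):
    """Fraction of pairs with log2 NLR above the threshold (strict '>' assumed)."""
    return sum(v > threshold for v in log2_nlrs) / len(log2_nlrs)


# log2 NLR thresholds from -2 to 2 in 0.01 increments (401 values).
THRESHOLDS = [round(-2 + 0.01 * k, 2) for k in range(401)]


def sweep(log2_nlrs):
    """Positive fraction of one dataset at every threshold."""
    return {t: positive_fraction(log2_nlrs, t) for t in THRESHOLDS}
```

For a pair measured at 800 luminescence units with backgrounds of 100 and 200, `nlr(800, [100, 200])` is 4.0, so its log2 NLR is 2.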
Co-immunoprecipitation of selected effector–host interactions
We evaluated whether N-terminally Flag-HA-tagged effector, or negative control Flag-GFP, co-immunoprecipitated the human proteins carrying an N-terminal MYC tag. Transfections for test and control pairs were always processed in parallel. HEK293 cells (RRID: CVCL_0045, DSMZ) were seeded in 10-cm dishes at a density yielding 60–70% confluency on the day of transfection. Plasmid DNA and X-tremeGENE transfection reagent (Roche) were mixed at a ratio of 1:2 (µg DNA:µl reagent) in serum-free DMEM. Per dish, 10 µg plasmid DNA (consisting of 3 µg effector- or GFP-encoding plasmid, 3 µg plasmid encoding the human protein and 4 µg empty vector) was diluted in 500 µl serum-free medium, followed by the addition of 20 µl X-tremeGENE reagent. The mixture was inverted twice, incubated for 15 min at room temperature and then added dropwise to the culture dish containing cells in complete growth medium. Cells were incubated under standard culture conditions (37 °C, 5% CO2) for 24 h before downstream analysis.
For cell lysate preparation, all steps were performed on ice. Culture medium was aspirated, and cells were washed three times with ice-cold 1× PBS by sequential rinsing and aspiration. Cells were lysed directly on the plate by adding 1 ml NP-40 lysis buffer per plate (50 mM Tris-HCl, pH 8.0, 150 mM NaCl, 1% (v/v) NP-40 and 2.5 mM EDTA, with Roche complete protease inhibitor). Cells were detached using a rubber policeman and transferred to a 1-ml centrifuge tube. Lysates were incubated on ice for 30 min and cleared by centrifugation at 30,000g for 15 min at 4 °C. The supernatant was collected, its protein concentration was measured using the Bradford assay (Bio-Rad), and the lysate was used immediately.
For immunoprecipitation, 1 mg of cleared lysate from each sample was diluted to a final volume of 750 µl, and 50 µl of a 20% slurry of anti-Flag M2 affinity gel (Sigma-Aldrich, A2220) equilibrated in NP-40 buffer was added. Samples were rotated at 4 °C for 1 h. For each wash, the tube was centrifuged at maximum speed for 30 s, the supernatant was aspirated and 1 ml NP-40 wash buffer was added, followed by brief inversion. After three washes, the beads were resuspended in 50 µl NP-40 buffer, 50 µl Laemmli loading buffer was added, and the samples were heated at 98 °C for 10 min and briefly centrifuged before analysis. For analysis, 10 µl of cleared lysates and 15 µl of all immunoprecipitates were loaded on SDS–PAGE gels and analysed by western blotting as described above.
Interactome framework parameter calculation
The completeness of an interactome map is an important parameter: it enables assessment of overlap between datasets and of how completely a given biological system is covered by the map. The framework incorporates assay sensitivity (that is, the proportion of interactions the assay can detect), sampling sensitivity (that is, saturation of the screen) and search space (that is, all pairwise protein combinations). For the meta-interactome studied here, the search space cannot reasonably be estimated, because the set of T3SS-containing microorganisms across all human guts is unknown, so this dimension of the problem cannot be defined.
Assay sensitivity ($S_a$) was assessed using the effector bh_LitBM-v1 (54 pairs) and bhRRS-v1 (72 pairs) reference sets as well as the human hsPRS/RRS-v2 (60 and 78 pairs, respectively) for benchmarking. All reference sets were tested four times using the Y2H screening pipeline (Supplementary Data 13). To assess sampling sensitivity ($S_s$), a repeat screen was conducted in which 288 bacterial effectors were screened four times against five pools comprising 1,475 human proteins. A saturation curve was calculated as described37. In brief, the numbers of interactions detected in all combinations of the four repeats were assembled and their reciprocal values calculated. From these, a linear regression was determined to obtain the slope and intercept. The reciprocal parameters were then calculated and used in the Michaelis–Menten equation with modified variables: analogous to increasing substrate concentrations in enzyme reactions, repeat screens progressively drive the screen to saturation91. Hence, the saturation curve was predicted as $N_i(R) = N_i^{\max} \times R/(K_m + R)$, with $N_i$ representing the interaction count after $R$ repeats, $N_i^{\max}$ the saturation limit and $K_m$ the Michaelis constant. Overall sensitivity reflects both sampling and assay limitations and was calculated as $S_o = S_a \times S_s$.
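The saturation fit can be sketched as below, assuming each repeat screen is stored as a set of detected interaction pairs. Because N(R) = Nmax × R/(Km + R) becomes 1/N = (Km/Nmax)(1/R) + 1/Nmax in reciprocal form, a straight-line fit of 1/N against 1/R gives 1/Nmax as the intercept and Km/Nmax as the slope. Ordinary least squares and all function names are our assumptions, not details taken from ref. 37.

```python
from itertools import combinations


def union_counts(repeat_hits):
    """(R, N) points: interactions in the union of every combination of R repeats."""
    points = []
    for r in range(1, len(repeat_hits) + 1):
        for combo in combinations(repeat_hits, r):
            points.append((r, len(set().union(*combo))))
    return points


def fit_saturation(points):
    """Least-squares line through (1/R, 1/N); returns (n_max, k_m).

    Reciprocal form of N = n_max * R / (k_m + R):
    1/N = (k_m / n_max) * (1/R) + 1 / n_max
    """
    xs = [1 / r for r, _ in points]
    ys = [1 / n for _, n in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    n_max = 1 / intercept  # intercept = 1 / n_max
    k_m = slope * n_max  # slope = k_m / n_max
    return n_max, k_m


def predicted_hits(r, n_max, k_m):
    """Saturation curve N(R) = n_max * R / (k_m + R)."""
    return n_max * r / (k_m + r)


# hypothetical repeat screens, each a set of (effector, human protein) pairs
repeats = [{("E1", "H1"), ("E2", "H3")}, {("E1", "H1"), ("E3", "H2")}, {("E1", "H1")}]
n_max, k_m = fit_saturation(union_counts(repeats))
```

Dividing `predicted_hits(R, n_max, k_m)` by `n_max` gives the fraction of the saturation limit reached after R repeats, which is one possible way to express sampling sensitivity.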
Intra- and interspecies effector convergence
To estimate the significance of effector convergence, we performed a permutation test by randomly sampling, with replacement, 979 target nodes from HuRI36 (n = 8,274). In each iteration, we counted the number of unique targets, and the distribution from 10,000 random permutations was used to compute the z-score for the observed 349 targets. A P value was obtained from the z-score using the ‘pnorm()’ R function and multiplied by 2 for a two-tailed test. To avoid overestimation and increase stringency, we restricted the analysis to Y2H-positive proteins in HuMMIMAIN and HuRI. To assess interspecies convergence, we used a conditional permutation test that preserves each strain’s contribution. Each iteration generated 18 random samples, one per strain, each matching that strain’s observed number of targets (Supplementary Data 11). For every protein, its frequency of selection across all strains was recorded as its convergence value. On the basis of 10,000 iterations, we derived the convergence value distribution, calculated z-scores and obtained P values using the pnorm() R function. Significance was observed from four strains onward (P < 0.004), and proteins targeted by at least four strains were considered to show interspecies convergence.
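A minimal Python version of the intraspecies test, using the numbers given above, is sketched below. The background list is a stand-in for the Y2H-positive HuRI proteins, the function name is hypothetical, and `math.erfc(|z|/√2)` is used as the equivalent of R's `2 * pnorm(-abs(z))`.

```python
import math
import random
from statistics import mean, stdev


def convergence_test(background, n_draws, observed_unique, n_iter=10_000, seed=0):
    """Two-tailed z-test of an observed unique-target count against random draws.

    Each iteration samples n_draws nodes with replacement from the background
    and records how many distinct nodes were hit.
    """
    rng = random.Random(seed)
    null = [len(set(rng.choices(background, k=n_draws))) for _ in range(n_iter)]
    z = (observed_unique - mean(null)) / stdev(null)
    # erfc(|z| / sqrt(2)) equals 2 * pnorm(-|z|) in R, i.e. the two-tailed P
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# numbers from the text; fewer iterations than the study, to keep the demo fast
z, p = convergence_test(list(range(8274)), n_draws=979, observed_unique=349, n_iter=1_000)
```

A strongly negative z indicates fewer distinct targets than expected by chance, that is, effectors converging on shared host proteins.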
Sequence similarity and interaction profile
To investigate the relationship between effector sequence similarity and interaction profile similarity, we calculated pairwise Jaccard indices for all effector pairs within each homology cluster. The index was defined as the number of human targets shared by both effectors divided by the total number of unique human targets of the pair. Pairs with fewer than three targets were excluded.
AlphaFold-based interaction modelling
To analyse the interfaces of effector–host interaction pairs, all identified pairs were subjected to structural prediction using AlphaFold v.2.3.1 with the following options: --model_preset=multimer, --db_preset=full_dbs, --max_template_date=2023-12-19, --num_multimer_predictions_per_model=1, --enable_cpu_relax and --use_precomputed_msas. Predictions were not generated for pairs whose combined length exceeded 2,500 residues. The predicted aligned error (PAE) matrix was extracted from the AlphaFold pickle output using alphapickle v.1.4.1 (https://github.com/mattarnoldbio/alphapickle, https://doi.org/10.5281/zenodo.5752375). To assess confidence, we used the confident contacts count (CCC), which is the number of residue–residue contacts72 with PAE < 4 Å. Each putative interface residue was assigned a PAE value. When a residue was in contact with multiple residues on the partner protein, the minimum PAE value among those contacts was used. Structure predictions were considered confident when the CCC was ≥5.
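The CCC can be sketched from per-residue coordinates and the PAE matrix as follows. Two details are assumptions rather than statements from the text: a contact is taken to be an inter-chain residue pair whose representative atoms lie within 8 Å (the study's contact definition follows ref. 72), and each contact's PAE is taken as the lower of the two directional values. Reading the AlphaFold pickle is left out, and the function name is hypothetical.

```python
import math


def confident_contacts(coords, chain_a_len, pae, dist_cutoff=8.0, pae_cutoff=4.0):
    """Confident contacts count (CCC) and per-residue interface PAE.

    coords: one (x, y, z) per residue, chain A first then chain B, in the same
    order as the rows and columns of the PAE matrix (a list of lists).
    Returns (ccc, residue_pae), where residue_pae maps each interface residue
    index to the minimum PAE over all of its inter-chain contacts.
    """
    ccc = 0
    residue_pae = {}
    for i in range(chain_a_len):
        for j in range(chain_a_len, len(coords)):
            if math.dist(coords[i], coords[j]) >= dist_cutoff:
                continue  # not in contact
            # assumption: use the lower of the two directional PAE values
            contact_pae = min(pae[i][j], pae[j][i])
            for res in (i, j):
                residue_pae[res] = min(residue_pae.get(res, math.inf), contact_pae)
            if contact_pae < pae_cutoff:
                ccc += 1
    return ccc, residue_pae
```

A prediction would then be called confident when `ccc >= 5`.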
Interface similarity analysis using PAE thresholding
Protein sequences (Supplementary Data 17) were converted from one-letter amino acid codes to three-letter residue names, and residue identifiers were assigned to match their positions in the AlphaFold PAE matrix. Only human proteins targeted by at least two bacterial effectors were retained. Residue contacts were extracted and matched to PAE coordinates, and the pooled PAE values were used to define the 25th, 50th, 75th and 95th percentile thresholds. For each threshold, contacts with PAE values equal to or below the threshold were retained, and the corresponding human and bacterial residues and the total number of retained contacts were recorded. The resulting subsets for all four percentiles were merged into the main dataset.
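Percentile thresholding and contact filtering might look like this in Python, using the standard-library `statistics.quantiles` with linear interpolation between data points. The residue labels (three-letter code plus position) and the function names are hypothetical.

```python
from statistics import quantiles


def percentile_thresholds(pooled_pae, percentiles=(25, 50, 75, 95)):
    """PAE cut-offs at the requested percentiles of all pooled contact PAE values."""
    # n=100 gives the 99 cut points for percentiles 1..99, linearly interpolated
    cuts = quantiles(pooled_pae, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in percentiles}


def filter_contacts(contacts, threshold):
    """Keep (human_residue, bacterial_residue, pae) contacts with PAE <= threshold.

    Returns the retained human residues, bacterial residues and contact count.
    """
    kept = [c for c in contacts if c[2] <= threshold]
    human = {c[0] for c in kept}
    bacterial = {c[1] for c in kept}
    return human, bacterial, len(kept)
```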
Interface similarities
Interface similarity between bacterial effectors targeting the same human protein was assessed using the Jaccard index across all PAE thresholds. For each targeted human protein, all interacting bacterial effectors were identified, and all possible effector–effector combinations were generated. At each threshold, the Jaccard index was calculated as the number of overlapping human interface residues divided by the total number of unique residues in both interfaces. Indices were classified as distinct (Jaccard index ≤ 0.1), overlapping (0.1 < Jaccard index < 0.6) or same (Jaccard index ≥ 0.6). Analogous calculations were performed to analyse interfaces of human proteins targeted by the same bacterial effector.
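The classification rule can be written compactly, as in the sketch below. The effector names and residue sets in the check are made up for illustration, and returning 0 for two empty interfaces is our assumption.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared residues divided by all unique residues in both interfaces."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(index):
    """Map a Jaccard index onto the three interface categories."""
    if index <= 0.1:
        return "distinct"
    if index < 0.6:
        return "overlapping"
    return "same"


def compare_interfaces(interfaces):
    """interfaces: effector -> set of human interface residues on one target protein."""
    return {
        (e1, e2): classify(jaccard(interfaces[e1], interfaces[e2]))
        for e1, e2 in combinations(sorted(interfaces), 2)
    }
```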
Interface domain annotations
Domains were assigned to the interacting human proteins using InterProScan v.5.75 with InterPro release 106.0, run through the EBI web server. Domain coordinates, descriptions and confidence scores were retrieved. The number of interface residues within each domain boundary (n_interface_residues_in_domain) was then counted, along with the total residues in the predicted interface (n_residues_in_interface), the percentage of interface residues in the domain (IF%), the number of residues in the domain (Domain_length) and the proportion of the domain length relative to the full protein length (Domain%).
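The per-domain columns listed above could be derived as follows, assuming 1-based, inclusive domain boundaries as reported by InterProScan and interface residues given as sequence positions. The function name is hypothetical; the dictionary keys reuse the column names from the text.

```python
def domain_interface_stats(interface_residues, domain_start, domain_end, protein_length):
    """Interface/domain overlap metrics for one domain (1-based, inclusive bounds)."""
    in_domain = [r for r in interface_residues if domain_start <= r <= domain_end]
    domain_length = domain_end - domain_start + 1
    return {
        "n_interface_residues_in_domain": len(in_domain),
        "n_residues_in_interface": len(interface_residues),
        "IF%": 100 * len(in_domain) / len(interface_residues),
        "Domain_length": domain_length,
        "Domain%": 100 * domain_length / protein_length,
    }
```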
SLiM–domain interface predictions
As mimicINT43 input, we used a representative set of effectors identified in isolated strains (2,300 sequences clustered at 90% identity) and all 186 effectors identified in MAGs. mimicINT detects domains in effector sequences using signatures from the InterPro v.81.0 database73, retaining matches with an E-value < 10⁻⁵. For motif detection, mimicINT uses the definitions available in the ELM database74. The IUPred 1.0 algorithm75 was used to detect motifs in disordered regions with both the short and long models (motif disorder propensity = 0.2 (ref. 76), minimum size = 5). The interface inference step used the 3did database77 for domain–domain templates and the ELM database (2022 release) for motif–domain templates. Two scoring strategies were applied. First, domain binding specificity within the same family was accounted for by computing a profile HMM-based domain score41 (stringency threshold = 0.3). Second, given the degenerate nature of motifs41, mimicINT uses Monte Carlo simulations to estimate the probability of a SLiM occurring by chance, shuffling disordered regions of the input sequences to generate N randomized proteins. Effectors were first grouped by strain, with MAG-derived effectors assigned to the closest strain. Disordered regions were shuffled 100,000 times using two backgrounds: same-strain effectors (within-strain shuffling) and the full effector set (interstrain shuffling). Motif occurrences in each effector were compared with those in the shuffled sequences, and only motifs with an empirical P < 0.1 in both backgrounds were retained. To assess whether the number of inferred interface-resolved interactions exceeded random expectation, the results were compared with 10,000 degree-controlled random networks generated from the human interaction search space (Supplementary Data 18).
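To illustrate the idea behind the empirical motif P value, the sketch below shuffles the residues of a single disordered region and asks how often a regular-expression motif occurs at least as often as in the real sequence. This is a simplification: mimicINT draws its shuffled regions from same-strain or all-effector backgrounds, and the motif pattern, sequence, add-one correction and function names here are illustrative assumptions, not ELM entries or mimicINT code.

```python
import random
import re


def count_motif(sequence, pattern):
    """Number of (possibly overlapping) matches of a regular-expression motif."""
    return len(re.findall(f"(?=({pattern}))", sequence))


def empirical_motif_p(disordered_region, pattern, n_shuffles=1000, seed=0):
    """Add-one empirical P: share of shuffles with at least the observed motif count."""
    rng = random.Random(seed)
    observed = count_motif(disordered_region, pattern)
    residues = list(disordered_region)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # preserves composition, destroys order
        if count_motif("".join(residues), pattern) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)
```

In the study's setting a motif would be kept only if this kind of P value stayed below 0.1 against both shuffling backgrounds.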
For the reverse analysis of bacterial domains interacting with SLiMs in the human proteins, the annotated bacterial domains were matched to domains in the ELM templates. For interactors of the so-identified effectors (Efe_1, Pfa_18, Pre_16, Pst_8, Vfu_32), we identified disordered regions as above and screened these for motifs matching the templates in the ELM database, yielding the reported example.
Holdup assay
Holdup is a biochemical assay used here to validate interface predictions involving PDZ domains. A total of 54 human PDZ domains and 11 tandem constructs were recombinantly expressed as His6-MBP-PDZ constructs in E. coli BL21(DE3) pLysS and purified on Ni2+-affinity columns using 800 µl of beads (Chelating Sepharose Fast Flow immobilized metal affinity chromatography, Cytiva) per target. After elution, purified proteins were desalted using PD-10 columns (GE Healthcare, 17085101) into 3.5 ml of buffer containing 50 mM Tris (pH 8.0), 300 mM NaCl and 10 mM imidazole. Protein concentrations were determined from absorbance at 280 nm on a PHERAstar FSX plate reader (BMG LABTECH), and purity was assessed by SDS–PAGE and capillary electrophoresis; 4 µM stocks were stored at −20 °C. Biotinylated 10-mer peptides corresponding to the C-terminal sequences of effectors were synthesized by GenicBio Limited; the N-terminal biotin was attached via a 6-aminohexanoic acid linker, and all peptides were >95% pure (HPLC and MS). Peptides were solubilized in dH2O, 1.4% ammonia or 5% acetic acid, aliquoted at 10 mM and stored at −20 °C.
For the assay, 2.5 µl of streptavidin resin (Cytiva, 17511301) was incubated in a 384-well filter plate (Millipore, MZHVN0W10) for 15 min with 20 µl of a 42 µM peptide solution. The resin was washed with 10 resin volumes (resvol) of holdup buffer (50 mM Tris-HCl, 300 mM NaCl, pH 8.0, 10 mM imidazole, 5 mM dithiothreitol), incubated for 15 min with 5 resvol of 1 mM biotin and washed three times with 10 resvol of holdup buffer. Individual PDZ domains were added to the wells and incubated for 15 min, and unbound PDZ was recovered by centrifugation into 384-well black assay plates for fluorescence readout. Concentrations were quantified by intrinsic Trp fluorescence, with fluorescein/mCherry used for peak normalization. Binding affinities and equilibrium dissociation constants were calculated as previously described45, using the mean PDZ-binding motif (PBM) concentration. Raw values and statistical analysis are provided in Supplementary Data 19.
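For orientation, a single holdup point can be converted into a dissociation constant under a simple 1:1 binding model, as sketched below. We assume this is the kind of depletion-based calculation ref. 45 describes; the binding-intensity definition (depletion relative to a biotin-saturated control resin), the concentrations and the function names are illustrative.

```python
def binding_intensity(unbound_with_peptide, unbound_control):
    """Fraction of the PDZ domain depleted by the peptide resin (0 = no binding)."""
    return 1 - unbound_with_peptide / unbound_control


def dissociation_constant(bi, pdz_total, peptide_total):
    """Kd from one holdup point, assuming simple 1:1 binding at equilibrium.

    Kd = [PDZ]free * [peptide]free / [complex], with [complex] = BI * [PDZ]total.
    Concentrations must share one unit (e.g. µM); the result is in that unit.
    """
    if not 0 < bi < 1:
        raise ValueError("Kd is only defined for 0 < BI < 1")
    complex_conc = bi * pdz_total
    free_pdz = pdz_total - complex_conc
    free_peptide = peptide_total - complex_conc
    return free_pdz * free_peptide / complex_conc


# hypothetical readout: 75% depletion of 4 µM PDZ by 20 µM resin-bound peptide
kd = dissociation_constant(binding_intensity(1.0, 4.0), pdz_total=4.0, peptide_total=20.0)
```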
Function enrichment analysis
Functional enrichment of effector targets was assessed using the ‘gost()’ function in the ‘gprofiler2’ R package (v.0.2.1)78 with HuRI as the background (custom_bg), excluding electronic annotations (exclude_iea = TRUE), with Benjamini–Hochberg correction (correction_method = ‘fdr’). Functional categories were drawn from Gene Ontology biological process terms (GO:BP), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and the Reactome pathways database (sources = c(‘GO:BP’, ‘KEGG’, ‘REAC’)). Odds ratios and fold enrichments were calculated as effect sizes: the odds ratio was the ratio of the odds of annotation in the target set to the odds in the HuRI background, and fold enrichment compared observed with expected annotated targets. Expected values were based on random sampling from the HuRI background (GO:BP = 6,988; KEGG = 3,250; Reactome = 4,592) (Supplementary Data 20). Similar analyses were performed for the functional enrichment of human proteins targeted by pathogens (Supplementary Data 20).
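The two effect sizes can be computed directly from counts, as sketched below. The expected count is taken here as the background annotation rate multiplied by the target-set size, an analytical stand-in for the random-sampling expectation used in the study; the counts and the function name are hypothetical.

```python
def effect_sizes(target_annotated, n_targets, background_annotated, n_background):
    """Odds ratio and fold enrichment of one term in the targets versus the background."""
    target_odds = target_annotated / (n_targets - target_annotated)
    background_odds = background_annotated / (n_background - background_annotated)
    # assumption: expectation = background annotation rate x number of targets
    expected = n_targets * background_annotated / n_background
    return target_odds / background_odds, target_annotated / expected
```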
Metabolic subsystem analysis
We assessed enrichment of targeted enzymes across metabolic subsystems using the human genome-scale model Recon3D48. Recon3D is a curated static model of human metabolism that lacks post-translational and allosteric regulation. Ligases and kinases were excluded to focus on metabolic enzymes. For each of the 95 Recon3D subsystems, enrichment was tested with a hypergeometric test using the ‘phyper()’ R function, with the numbers of targeted enzymes annotated and not annotated to the subsystem as inputs, followed by Benjamini–Hochberg (BH) false discovery rate (FDR) correction. Odds ratios (ORs) and fold enrichments were calculated as described for the functional analyses (Supplementary Data 20).
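A standard-library sketch of the hypergeometric test and the BH correction follows. The function names are hypothetical; the docstring gives the matching `phyper()` call, and `benjamini_hochberg` should agree with R's `p.adjust(p, method = "BH")`.

```python
from math import comb


def hypergeom_upper_tail(k_hits, n_total, n_annotated, n_drawn):
    """P(X >= k_hits) for a hypergeometric X.

    n_total: all enzymes considered; n_annotated: enzymes in the subsystem;
    n_drawn: targeted enzymes; k_hits: targeted enzymes in the subsystem.
    Same as phyper(k_hits - 1, n_annotated, n_total - n_annotated, n_drawn,
    lower.tail = FALSE) in R.
    """
    favourable = sum(
        comb(n_annotated, i) * comb(n_total - n_annotated, n_drawn - i)
        for i in range(k_hits, min(n_annotated, n_drawn) + 1)
    )
    return favourable / comb(n_total, n_drawn)


def benjamini_hochberg(pvalues):
    """BH-adjusted P values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for position, i in enumerate(order):
        rank = m - position  # rank in ascending order of P
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```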
Disease enrichment analysis
Associations of effector targets and convergence proteins with human disease genetics were tested using a two-sided Fisher’s exact test. Disease-causal genes were obtained from the Open Targets Genetics portal (accessed 23 August 2022), which integrates variant-to-gene distance, quantitative trait locus co-localization, chromatin interactions and variant pathogenicity79. The portal’s machine-learning model assigns a locus-to-gene (L2G) score to each gene in loci identified by GWAS to prioritize the most probable causal gene. Genes with L2G ≥ 0.5 were considered causal, as recommended80. Ensembl identifiers were converted to gene symbols using the biomaRt R package (v.2.60.1, Bioconductor 3.19), and Fisher’s exact test was run with the R function fisher.test (stats v.4.2.2) using default parameters on 2 × 2 contingency tables comparing causal gene presence in the query and background sets. HuRI protein-encoding genes were used as the background, and targets or convergence proteins as the query sets. FDR correction, ORs and fold enrichments were calculated as for the functional enrichment analysis (Supplementary Data 21).
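The L2G filtering and the 2 × 2 test can be sketched as below. We assume that the query genes are removed from the background when filling the second row of the table, which the text does not specify. The two-sided P value sums the probabilities of all tables no more likely than the observed one, the rule R's `fisher.test` applies. Gene identifiers and function names are hypothetical.

```python
from math import comb


def causal_genes(l2g_scores, cutoff=0.5):
    """Genes whose locus-to-gene (L2G) score reaches the cutoff."""
    return {gene for gene, score in l2g_scores.items() if score >= cutoff}


def contingency(query, background, causal):
    """(a, b, c, d): query causal/non-causal, rest-of-background causal/non-causal."""
    rest = background - query  # assumption: query genes excluded from row two
    return len(query & causal), len(query - causal), len(rest & causal), len(rest - causal)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test on the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    probs = {x: prob(x) for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # small relative tolerance, as in R, so ties with the observed table count
    p = sum(q for q in probs.values() if q <= p_obs * (1 + 1e-7))
    return min(p, 1.0)
```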
Random walk-based determination of commensal effector network neighbourhoods
We implemented a random walk with restart (RWR) algorithm, RWR-MH81, to explore the HuRI36 network neighbourhood of the 338 human proteins targeted by 243 commensal effectors (HuMMIMAIN). Human targets were used as seeds, and a restart probability of 0.7 was used to generate a ranked list of proteins. Statistical significance was assessed by random walks on degree-preserving randomized networks: we generated 1,000 random networks from HuRI, computed RWR scores for each protein and retained as network neighbours only proteins with empirical P < 0.01.
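RWR-MH supports multiplex and heterogeneous networks; the sketch below is only a plain single-network random walk with restart solved by power iteration, plus an empirical P value against scores from randomized networks. Function names are hypothetical, the +1 correction in the empirical P is our assumption, and isolated nodes simply leak probability mass in this simplified version.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a random walk with restart.

    adjacency: node -> set of neighbours (undirected network).
    At each step a fraction `restart` of the probability mass jumps back to
    the seeds; the rest moves to uniformly chosen neighbours.
    """
    nodes = list(adjacency)
    seed_prob = {v: (1 / len(seeds) if v in seeds else 0.0) for v in nodes}
    scores = dict(seed_prob)
    for _ in range(max_iter):
        nxt = {v: restart * seed_prob[v] for v in nodes}
        for v in nodes:
            neighbours = adjacency[v]
            if neighbours:
                share = (1 - restart) * scores[v] / len(neighbours)
                for u in neighbours:
                    nxt[u] += share
        converged = sum(abs(nxt[v] - scores[v]) for v in nodes) < tol
        scores = nxt
        if converged:
            break
    return scores


def empirical_p(observed_score, random_scores):
    """Share of randomized-network scores at least as high as the observed one."""
    return (1 + sum(s >= observed_score for s in random_scores)) / (1 + len(random_scores))
```

With 1,000 randomized networks, a protein would pass the P < 0.01 filter only if fewer than about ten randomized scores matched or exceeded its observed score.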
For each set of significant neighbourhood proteins, we tested for enrichment of Open Targets causal genes (L2G ≥ 0.5) linked to traits supported by at least three causal genes. Enrichment in each strain neighbourhood was assessed using a two-sided Fisher’s exact test with BH correction. No associations were significant (FDR < 0.05), so we focused on 400 associations with nominal P < 0.01 and odds ratio > 3. Disease categorizations were refined to reflect aetiology: Sjögren syndrome, eczema and psoriasis were grouped as immunological rather than eye or skin traits, and osteoarthritis as a musculoskeletal/connective tissue trait rather than a metabolic trait. For Fig. 4d, related asthma and psoriasis terms were merged (Supplementary Data 23).
NF-κB activation assay
HEK293 cells (RRID: CVCL_0045, DSMZ) were maintained in DMEM with 10% FBS and 100 U ml−1 penicillin–streptomycin at 37 °C and 5% CO2. IKKβ (pRK5-Flag) and A20 (pEF4-Flag) served as positive and negative controls, respectively. Cells (1 × 10⁶ per 60-mm dish) were transfected using the calcium phosphate method with 10 ng NF-κB reporter plasmid (6× NF-κB firefly luciferase pGL2), 50 ng pTK reporter (Renilla luciferase) and 2 µg bacterial ORF in pMH-Flag-HA. The medium was replaced after 6 h. To assess NF-κB inhibition, cells were treated for 4 h with 20 ng ml−1 TNF (Sigma-Aldrich, SRP3177) at 24 h post-transfection. Lysates were analysed using the dual-luciferase reporter kit (Promega, E1980) on a luminometer (Berthold Centro LB960 microplate reader; software: MikroWin 2010). NF-κB induction was determined as the ratio of firefly to Renilla luminescence. P values were calculated using the Kruskal–Wallis test with Dunn’s post hoc comparisons followed by FDR correction. Raw values and statistical analysis are provided in Supplementary Data 24.
Protein expression was analysed by western blotting as described above, with the following modifications: the blocking solution contained 0.1% Tween-20, and membranes were incubated overnight at 4 °C with primary antibodies in 2.5% BSA in PBST, washed and then probed with anti-mouse secondary antibody (1:10,000; Jackson ImmunoResearch Labs, RRID:AB_2340770) in PBST for 1 h at room temperature. Primary antibodies used were anti-β-actin (1:10,000; Santa Cruz Biotechnology, RRID:AB_626632), anti-Flag M2 (1:500; Sigma-Aldrich, RRID:AB_259529) and anti-HA (1:1,000; Sigma-Aldrich, RRID:AB_514505). Signals were detected using LumiGlo reagent (CST, 7003S) and chemiluminescence film (Sigma-Aldrich, GE28-9068-36).
Cytokine assays
Caco-2 cells (RRID: CVCL_0025) were maintained in DMEM GlutaMAX medium (Gibco) with 10% FBS and 1% penicillin–streptomycin at 37 °C and 5% CO2. For the experiments in Fig. 5c, Caco-2 cells were transfected using 40,000 MW linear polyethylenimine (PEI MAX; Polysciences) at a pDNA:PEI ratio of 1:5. Cells were exposed to the transfection mixture for 16 h, washed, allowed to recover for 6 h and then sorted (BD FACSAria III cell sorter, BD Biosciences). After 24 h of recovery, cells were activated for 48 h with a stimulation mix containing 200 ng ml−1 phorbol-12-myristate-13-acetate (P8139, Sigma-Aldrich), 100 ng ml−1 lipopolysaccharide (L6529, Sigma-Aldrich) and 100 ng ml−1 TNF (130-094-014, Miltenyi Biotec). During activation, proliferation was monitored in the Incucyte S3 Live Cell Analysis system (Essen BioScience). Cytokine levels were determined using the human inflammation panel 1 LEGENDplex kit (BioLegend). We performed three biological repeats, each with three or four technical repeats. Statistical significance was tested on the average of the technical replicates using the Kruskal–Wallis test with Dunn’s post hoc comparisons. For the experiments in Fig. 5d and Extended Data Fig. 4d, cells were transfected using the 4D-Nucleofector system (Lonza). Collected cells were resuspended in SF Nucleofector solution, mixed with 0.6 µg plasmid, pulsed (code DG-113) and plated in DMEM + 5% FBS. Cells were allowed to recover overnight and then rested in culture medium for 24 h. Cells were stimulated with 10 µg ml−1 Pam3CSK4 (tlrl-pms, InvivoGen), 1 µg ml−1 flagellin (tlrl-stfla, InvivoGen) or 100 ng ml−1 TNF (130-094-014, Miltenyi Biotec) for 24 h. We performed five biological repeat experiments with three technical repeats each. For each experiment, pooled supernatants were analysed using the Human Anti-virus Response Panel V02 (BioLegend). The data were analysed using the Kruskal–Wallis test with Dunn’s post hoc comparisons.
Raw and statistical summary data are available in Supplementary Data 25.
Protein ecology on IBD metagenomes
Metagenomic assemblies from the Inflammatory Bowel Disease Multi’omics Database (IBDMDB)53 and from the skin metagenome82 were downloaded, and protein repertoires were predicted using Prodigal (option: -p meta)83. Effectors were compared with the metagenomic protein repertoires using DIAMOND 0.9.24 (thresholds: >90% query length, >80% identity). For the analyses in Fig. 5, samples were grouped into individuals with ulcerative colitis (n = 304), individuals with Crohn’s disease (n = 508) and controls without IBD (n = 334). Binary presence–absence vectors were generated for each effector across samples, and the prevalence of each effector in patients was compared with that in controls using Fisher’s exact test, implemented in SciPy 1.9.3 (Python 3.10.12), with BH FDR correction. Differences in prevalence distributions between the healthy cohort and either patient cohort were assessed using the Wilcoxon rank-sum test, implemented in the ‘wilcox.test()’ R function. We used fold change as the measure of effect size in Fig. 5e, calculated as prevalence in the test group divided by prevalence in the healthy group. To avoid division by zero, we applied a small pseudo-count to the healthy cohort data for effectors with 0% prevalence. The pseudo-count was equivalent to half a case in the healthy cohort (n = 334 individuals), which ensured minimal influence on the results while enabling calculation of fold change. Statistical details are provided in Supplementary Data 26.
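The fold-change calculation with the half-case pseudo-count can be expressed as below. Presence vectors are assumed to be 0/1 lists with one entry per individual; the function names and example counts are hypothetical.

```python
def prevalence(presence):
    """Fraction of individuals in which an effector was detected (0/1 list)."""
    return sum(presence) / len(presence)


def prevalence_fold_change(test_presence, healthy_presence):
    """Patient-group prevalence divided by healthy-group prevalence.

    A zero healthy prevalence is replaced by a pseudo-count of half a case,
    that is, 0.5 / number of healthy individuals.
    """
    healthy = prevalence(healthy_presence)
    if healthy == 0:
        healthy = 0.5 / len(healthy_presence)
    return prevalence(test_presence) / healthy


# hypothetical effector: absent from all 334 controls, found in 10 of 508 Crohn's samples
fold_change = prevalence_fold_change([1] * 10 + [0] * 498, [0] * 334)
```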
Statistics and reproducibility
Data were statistically analysed and plotted using Microsoft Excel 2010 or Python or R scripts. For comparison of normally distributed values, we used one-way analysis of variance (ANOVA). For values that did not pass normality tests, we used either the Kruskal–Wallis test with Dunn’s correction for multiple-group comparisons or the Wilcoxon rank-sum test for two-group comparisons, as indicated. Enrichments were calculated using Fisher’s exact test with Benjamini–Hochberg FDR correction. All statistical tests were two-sided. Generally, a corrected P < 0.05 was considered significant. GO, KEGG and Reactome functional enrichments were calculated using the gprofiler2 R package with the indicated background sets. For the disease target enrichments and neighbourhood associations, no associations were significant after multiple hypothesis correction; nominally significant associations from Fisher’s exact tests were therefore used for Fig. 4c,d. All raw values, n and statistical details are presented in Supplementary Data as indicated in figure legends and in Methods.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All sequence, interaction and functional data generated in this study are available as supplementary information. The effectors identified and cloned for interactome mapping are presented in Supplementary Data 7. All protein–protein interaction data acquired in this study can be found in Supplementary Data 11. The data for functional validation assays can be found in Supplementary Data 24–26. All protein interaction data have been deposited to the IMEx consortium (http://www.imexconsortium.org) through IntAct and assigned the identifier IM-29849. New effector sequences have been submitted to GenBank: BankIt2727690: OR372873–OR373035 and OR509516–OR509528. Source data are provided with this paper.
Code availability
Data and scripts related to the prediction of T3SS-positive reference strains and metagenomes are available on Zenodo at https://doi.org/10.5281/zenodo.17825584 (ref. 84). All data and scripts generated to perform the structural comparison between commensal and pathogen effectors are available on Zenodo at https://doi.org/10.5281/zenodo.11951539 (ref. 85). The full set of inferred SLiM–domain and domain–domain interactions and the randomly generated networks are available on Zenodo at https://doi.org/10.5281/zenodo.11400863 (ref. 86). The mimicINT43 code can be found on GitHub at https://github.com/TAGC-NetworkBiology/mimicINT/releases/tag/v1 (ref. 87). The 1,000 randomized control networks for the random walk analysis are available on Zenodo at https://doi.org/10.5281/zenodo.12743976 (ref. 88). The AlphaFold predictions of effector–host interaction pairs along with all confidently predicted structures are available on Zenodo at https://doi.org/10.5281/zenodo.16816224 (ref. 89). The datasets and analysis scripts for convergence analysis, functional and disease enrichment analysis, and AlphaFold human–effector interface similarity analysis are available on Zenodo at https://doi.org/10.5281/zenodo.16883544 (ref. 90).
References
Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).
Oren, A. & Garrity, G. M. Valid publication of the names of forty-two phyla of prokaryotes. Int. J. Syst. Evol. Microbiol. https://doi.org/10.1099/ijsem.0.005056 (2021).
Joos, R. et al. Examining the healthy human microbiome concept. Nat. Rev. Microbiol. 23, 192–205 (2025).
Thomas, S. et al. The host microbiome regulates and maintains human health: a primer and perspective for non-microbiologists. Cancer Res. 77, 1783–1812 (2017).
Lietzen, N. et al. Coxsackievirus B persistence modifies the proteome and the secretome of pancreatic ductal cells. iScience 19, 340–357 (2019).
Cadwell, K. et al. Virus-plus-susceptibility gene interaction determines Crohn’s disease gene Atg16L1 phenotypes in intestine. Cell 141, 1135–1145 (2010).
Hatano, Y. et al. Virus-driven carcinogenesis. Cancers 13, 2625 (2021).
Rozenblatt-Rosen, O. et al. Interpreting cancer genomes using systematic host network perturbations by tumour virus proteins. Nature 487, 491–495 (2012).
Deng, W. Y. et al. Assembly, structure, function and regulation of type III secretion systems. Nat. Rev. Microbiol. 15, 323–337 (2017).
Pallen, M. J., Beatson, S. A. & Bailey, C. M. Bioinformatics, genomics and evolution of non-flagellar type-III secretion systems: a Darwinian perspective. FEMS Microbiol. Rev. 29, 201–229 (2005).
Miwa, H. & Okazaki, S. How effectors promote beneficial interactions. Curr. Opin. Plant Biol. 38, 148–154 (2017).
Osborne, R. et al. The evolution of effectome-based strategies to establish beneficial symbiosis. Mol. Plant Microbe Interact. 32, 41 (2019).
Wessling, R. et al. Convergent targeting of a common host protein-network by pathogen effectors from three kingdoms of life. Cell Host Microbe 16, 364–375 (2014).
Mukhtar, M. S. et al. Independently evolved virulence effectors converge onto hubs in a plant immune system network. Science 333, 596–601 (2011).
Nelson, K. E. et al. A catalog of reference genomes from the human microbiome. Science 328, 994–999 (2010).
Eichinger, V. et al. EffectiveDB—updates and novel features for a better annotation of bacterial secreted proteins and Type III, IV, VI secretion systems. Nucleic Acids Res. 44, D669–D674 (2016).
Hitch, T. C. A. et al. HiBC: a publicly available collection of bacterial strains isolated from the human gut. Nat. Commun. 16, 4203 (2025).
Poyet, M. et al. A library of human gut bacterial isolates paired with longitudinal multiomics data enables mechanistic microbiome research. Nat. Med. 25, 1442–1452 (2019).
Groussin, M. et al. Elevated rates of horizontal gene transfer in the industrialized human microbiome. Cell 184, 2053–2067.e18 (2021).
Yang, X. B., Pan, J. F., Wang, Y. & Shen, X. H. Type VI secretion systems present new insights on pathogenic Yersinia. Front. Cell. Infect. Microbiol. 8, 260 (2018).
Almeida, A. et al. A new genomic blueprint of the human gut microbiota. Nature 568, 499–504 (2019).
Pasolli, E. et al. Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle. Cell 176, 649–662.e20 (2019).
Bowers, R. M. et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat. Biotechnol. 35, 725–731 (2017).
Leviatan, S., Shoer, S., Rothschild, D., Gorodetski, M. & Segal, E. An expanded reference map of the human gut microbiome reveals hundreds of previously unknown species. Nat. Commun. 13, 3863 (2022).
Jing, R. et al. DeepT3 2.0: improving type III secreted effector predictions by an integrative deep learning framework. NAR Genom. Bioinform. 3, lqab086 (2021).
Arnold, R. et al. Sequence-based prediction of type III secreted proteins. PLoS Pathog. 5, e1000376 (2009).
Goldberg, T., Rost, B. & Bromberg, Y. Computational prediction shines light on type III secretion origins. Sci. Rep. 6, 34516 (2016).
Wang, J. W. et al. BastionHub: a universal platform for integrating and analyzing substrates secreted by Gram-negative bacteria. Nucleic Acids Res. 49, D651–D659 (2021).
Eddy, S. R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).
Varadi, M. et al. AlphaFold Protein Structure Database in 2024: providing structure coverage for over 214 million protein sequences. Nucleic Acids Res. 52, D368–D375 (2024).
Barrio-Hernandez, I. et al. Clustering predicted structures at the scale of the known protein universe. Nature 622, 637–645 (2023).
Galperin, M. Y., Nikolskaya, A. N. & Koonin, E. V. Novel domains of the prokaryotic two-component signal transduction systems. FEMS Microbiol. Lett. 203, 11–21 (2001).
Zaver, S. A. & Woodward, J. J. Cyclic dinucleotides at the forefront of innate immunity. Curr. Opin. Cell Biol. 63, 49–56 (2020).
Westerhausen, S. et al. A NanoLuc luciferase-based assay enabling the real-time analysis of protein secretion and injection by bacterial type III secretion systems. Mol. Microbiol. 113, 1240–1254 (2020).
Kim, D. K. et al. A proteome-scale map of the SARS-CoV-2-human contactome. Nat. Biotechnol. 41, 140–149 (2023).
Luck, K. et al. A reference map of the human binary protein interactome. Nature 580, 402–408 (2020).
Altmann, M. et al. Extensive signal integration by the phytohormone protein network. Nature 583, 271–276 (2020).
Choi, S. G. et al. Maximizing binary interactome mapping with a minimal number of assays. Nat. Commun. 10, 3907 (2019).
Del Toro, N. et al. The IntAct database: efficient access to fine-grained molecular interaction data. Nucleic Acids Res. 50, D648–D653 (2022).
Osborne, R. et al. Symbiont–host interactome mapping reveals effector-targeted modulation of hormone networks and activation of growth promotion. Nat. Commun. 14, 4065 (2023).
Davey, N. E. et al. Attributes of short linear motifs. Mol. Biosyst. 8, 268–281 (2012).
Burke, D. F. et al. Towards a structurally resolved human protein interaction network. Nat. Struct. Mol. Biol. 30, 216–225 (2023).
Choteau, S. A. et al. mimicINT: a workflow for microbe–host protein interaction inference. F1000Research 14, 128 (2025).
Gutierrez-Gonzalez, L. H. et al. Peptide targeting of PDZ-dependent interactions as pharmacological intervention in immune-related diseases. Molecules 26, 6367 (2021).
Gogl, G. et al. Quantitative fragmentomics allow affinity mapping of interactomes. Nat. Commun. 13, 5472 (2022).
Shan, Y., Lee, M. & Chang, E. B. The gut microbiome and inflammatory bowel diseases. Annu. Rev. Med. 73, 455–468 (2022).
Brennan, J. J. & Gilmore, T. D. Evolutionary origins of Toll-like receptor signaling. Mol. Biol. Evol. 35, 1576–1587 (2018).
Brunk, E. et al. Recon3D enables a three-dimensional view of gene variation in human metabolism. Nat. Biotechnol. 36, 272–281 (2018).
Ochoa, D. et al. Open Targets Platform: supporting systematic drug-target identification and prioritisation. Nucleic Acids Res. 49, D1302–D1310 (2021).
Malone, J. et al. Modeling sample variables with an Experimental Factor Ontology. Bioinformatics 26, 1112–1118 (2010).
Bunker, J. J. et al. Natural polyreactive IgA antibodies coat the intestinal microbiota. Science 358, eaan6619 (2017).
Voogdt, C. G. P., Bouwman, L. I., Kik, M. J. L., Wagenaar, J. A. & van Putten, J. P. M. Reptile Toll-like receptor 5 unveils adaptive evolution of bacterial flagellin recognition. Sci. Rep. 6, 19046 (2016).
Lloyd-Price, J. et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 569, 655–662 (2019).
Franzosa, E. A. et al. Gut microbiome structure and metabolic activity in inflammatory bowel disease. Nat. Microbiol. 4, 293–305 (2019).
Egan, F., Barret, M. & O’Gara, F. The SPI-1-like type III secretion system: more roles than you think. Front. Plant Sci. 5, 34 (2014).
Shalon, D. et al. Profiling the human intestinal environment under physiological conditions. Nature 617, 581–591 (2023).
Zboralski, A., Biessy, A. & Filion, M. Bridging the gap: type III secretion systems in plant-beneficial bacteria. Microorganisms https://doi.org/10.3390/microorganisms10010187 (2022).
Rodriguez, P. A. et al. Systems biology of plant–microbiome interactions. Mol. Plant 12, 804–821 (2019).
Secher, T. et al. Oral administration of the probiotic strain Escherichia coli Nissle 1917 reduces susceptibility to neuroinflammation and repairs experimental autoimmune encephalomyelitis-induced intestinal barrier dysfunction. Front. Immunol. 8, 1096 (2017).
Backert, S., Tegtmeyer, N. & Selbach, M. The versatility of Helicobacter pylori CagA effector protein functions: the master key hypothesis. Helicobacter 15, 163–176 (2010).
Scott, N. E. & Hartland, E. L. Post-translational mechanisms of host subversion by bacterial effectors. Trends Mol. Med. 23, 1088–1102 (2017).
Huerta-Cepas, J. et al. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 44, D286–D293 (2016).
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
Chaumeil, P. A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36, 1925–1927 (2019).
Jain, C., Rodriguez, R. L., Phillippy, A. M., Konstantinidis, K. T. & Aluru, S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nat. Commun. 9, 5114 (2018).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
Urban, M. et al. PHI-base in 2022: a multi-species phenotype database for pathogen–host interactions. Nucleic Acids Res. 50, D837–D847 (2022).
Olson, R. D. et al. Introducing the Bacterial and Viral Bioinformatics Resource Center (BV-BRC): a resource combining PATRIC, IRD and ViPR. Nucleic Acids Res. 51, D678–D689 (2023).
van Kempen, M. et al. Fast and accurate protein structure search with Foldseek. Nat. Biotechnol. 42, 243–246 (2024).
Orchard, S. et al. Protein interaction data curation: the International Molecular Exchange (IMEx) consortium. Nat. Methods 9, 345–350 (2012).
Mosca, R., Ceol, A. & Aloy, P. Interactome3D: adding structural details to protein networks. Nat. Methods 10, 47–53 (2013).
Blum, M. et al. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res. 49, D344–D354 (2021).
Kumar, M. et al. ELM—the eukaryotic linear motif resource in 2020. Nucleic Acids Res. 48, D296–D306 (2020).
Dosztanyi, Z. Prediction of protein disorder based on IUPred. Protein Sci. 27, 331–340 (2018).
Edwards, R. J., Paulsen, K., Aguilar Gomez, C. M. & Perez-Bercoff, A. Computational prediction of disordered protein motifs using SLiMSuite. Methods Mol. Biol. 2141, 37–72 (2020).
Mosca, R., Ceol, A., Stein, A., Olivella, R. & Aloy, P. 3did: a catalog of domain-based interactions of known three-dimensional structure. Nucleic Acids Res. 42, D374–D379 (2014).
Kolberg, L., Raudvere, U., Kuzmin, I., Vilo, J. & Peterson, H. gprofiler2—an R package for gene list functional enrichment analysis and namespace conversion toolset g:Profiler. F1000Research 9, ELIXIR-709 (2020).
Mountjoy, E. et al. An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci. Nat. Genet. 53, 1527–1533 (2021).
Barrio-Hernandez, I. et al. Network expansion of genetic associations defines a pleiotropy map of human cell biology. Nat. Genet. 55, 389–398 (2023).
Valdeolivas, A. et al. Random walk with restart on multiplex and heterogeneous biological networks. Bioinformatics 35, 497–505 (2019).
Oh, J. et al. Biogeography and individuality shape function in the human skin metagenome. Nature 514, 59–64 (2014).
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).
Hyden, P. Prediction of T3SS-positive reference strains and metagenomes. Zenodo https://doi.org/10.5281/zenodo.17825584 (2025).
Fernandez MacGregor, J., Brun, C. & Zanzoni, A. Structural analysis and comparison of known and candidate type 3 effectors. Zenodo https://doi.org/10.5281/zenodo.11951539 (2025).
Choteau, S., Brun, C. & Zanzoni, A. Inferred protein interactions between candidate type 3 effectors and human proteins. Zenodo https://doi.org/10.5281/zenodo.11400863 (2024).
Choteau, S. et al. mimicINT: a computational workflow to infer protein–protein interactions. GitHub https://github.com/TAGC-NetworkBiology/mimicINT/releases/tag/v1 (2024).
Saha, D., Perrin, J., Brun, C. & Zanzoni, A. RWR neighborhood analysis on HuRI. Zenodo https://doi.org/10.5281/zenodo.12743976 (2024).
Lambourne, L. AlphaFold-multimer predictions for meta-interactome PPIs. Zenodo https://doi.org/10.5281/zenodo.16816224 (2025).
Dohai, B. Metainteractome functional and disease enrichment analysis. Zenodo https://doi.org/10.5281/zenodo.16883544 (2025).
Arabidopsis Interactome Mapping Consortium. Evidence for network evolution in an Arabidopsis interactome map. Science 333, 601–607 (2011).
Acknowledgements
This work was supported by HDHL-INTIMIC ‘Interrelation of the Intestinal Microbiome, Diet and Health’ (BMBF 01EA1803 to P.F.-B., ANR ANR-17-HDIM-0001 to C.B. and FFG 11819559 to T.R.), the European Union’s Horizon 2020 Research and Innovation Programme (Project ID 101003633, RiPCoN; P.F.-B., C.B.), the Free State of Bavaria’s AI for Therapy (AI4T) Initiative through the Institute of AI for Drug Discovery (AID) (P.F.-B.), the French government under the France 2030 investment plan, as part of the Initiative d’Excellence d’Aix-Marseille Université – A*MIDEX (AMX-21-PEP-043, to A.Z.), and the FRS-FNRS (J.-C.T. and S.B.M.). The computational results presented were achieved in part using the Vienna Scientific Cluster (VSC). The Centre de Calcul Intensif d’Aix-Marseille is acknowledged for granting access to its high-performance computing resources. S.A.C. received funding from the ‘Espoirs de la Recherche’ programme managed by the French Fondation pour la Recherche Médicale (FRM, FDT202106013072). The project leading to this publication also received funding from France 2030, the French Government programme managed by the French National Research Agency (ANR-16-CONV-0001), and from the Excellence Initiative of Aix-Marseille University – A*MIDEX. J.F.-M. was funded by the Consejo Nacional de Humanidades, Ciencias y Tecnologías (CONAHCYT) Becas al Extranjero Convenios GOBIERNO FRANCES 2021 – 1 grant 795494, and received support from the ‘Espoirs de la Recherche’ programme managed by the French Fondation pour la Recherche Médicale (FRM, FDT202404018637). The AFMB contribution was supported by the French Infrastructure for Integrated Structural Biology (FRISBI) ANR-10-INSB-05-01. D. Krappmann was supported by the Deutsche Forschungsgemeinschaft (ID 210592381 – SFB 1054 A04). T.C. was supported by the Deutsche Forschungsgemeinschaft (project 403224013 – SFB 1382 Q02). S.R. was supported by an ERS Long-Term Research Fellowship (LTRF2024-01131).
Funding
Open access funding provided by Helmholtz Zentrum München - Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH).
Author information
Contributions
P.F.-B. conceived the project. P.H., T.R., T.C.A.H., S.A., C.B., A.Z. and P.F.-B. performed T3SS and effector identification. V.Y., H.H., B.W., S.T.R., M.R., M.A., A.S. and P.F.-B. performed ORF cloning. J.F.-M., L.B. and A.Z. conducted the structural pathogen–commensal comparison. H.H., B.W., S.T.R., Y.M.T., L.P., B.D. and P.F.-B. conducted injection assays. V.Y., S.T.R., B.W., P.S., A.S. and P.F.-B. performed Y2H analyses. V.Y., M.A., S.A., M. Boujeant, A.Z., C.F. and P.F.-B. curated bhLit-BM data. B.D., D.S., V.Y., J.F.-M., L.L., L.B., J.P., C.-W.L., M.H., C.B., A.Z. and P.F.-B. performed data analyses. B.W., S.T.R. and V.Y. conducted yN2H validation assays. S.A.C., L.B., J.P. and A.Z. performed interface-SLiMs analyses. L.L., B.D., M.A.C., M.V. and P.F.-B. conducted interface-AF analyses. J.F.-M., S.B.M., J.-C.T. and R.V. performed the holdup and peptide assays. T.C.A.H. and T.C. analysed effector ecology. V.Y., N.S.v.H., D.K.J.P., S.R., F.O., M.T., J.B., D. Kotlarz, D. Krappmann, M. Boes and P.F.-B. performed cell-based assays. V.Y., B.D., H.H., J.F.-M. and P.F.-B. generated visualizations and developed the figures. P.F.-B., C.F., A.Z., C.B., T.R. and D. Krappmann acquired funding. P.F.-B., B.D., A.Z., V.Y., B.W., T.H. and C.F. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Microbiology thanks Kenichi Tsuda, Tamas Korcsmaros and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 T3SS in strains of the commensal gut microbiome.
a, Comparison of the effector complements of the 44 T3SS+ Pseudomonadota reference strains. Numbers indicate the count of effectors shared among the indicated strains at >90% mutual sequence similarity across 90% common sequence length (Supplementary Data 1). b, Abundance of secretion systems in Pseudomonadota genomes among the 77 reference strains from human intestinal and stool samples, in a collection of 4,475 strains isolated from normal human guts (HBC/BIO-ML/GMC strains) and in metagenome-assembled genomes (MAGs) of normal human guts. c, Similarity of the 182 candidate effectors identified from the 770 T3SS+ MAGs to 1,195 effectors from pathogenic microbes across the range of alignment coverages. Full data for all panels in Supplementary Data 4. d, Cloning success: success rates of effector open reading frame (ORF) cloning for the indicated reference strains, with the number of obtained and sequence-verified ORFs shown above the bars (Supplementary Data 7). e, Luminescence from injection assays with Salmonella Typhimurium wild-type (wt) and ΔsctV strains expressing SipA, and Citrobacter pasteurii and Phytobacter massiliensis expressing the indicated effectors. Each data point represents a single technical replicate. f, Western blots showing expression of FLAG-tagged effectors in wt and ΔsctV S. Typhimurium in the indicated strains.
Extended Data Fig. 2 Discovery and validation of HuMMI.
a, Schematic of the multi-assay screening pipeline based on initial screening of bacterial ORFs against the human ORFeome 9.1. The primary screen involved screening against pools of human proteins, followed by retesting of positives, identification of candidate pairs by sequencing and a final, independent four-fold verification. b, Detection rates of protein pairs in different sets across varying yN2H thresholds: fraction of pairs scoring positive in the HuMMI dataset and the benchmarking datasets (hsPRS-v2, bhLit_BM-v1, hsRRS-v2, bhRRS-v1) as a function of the normalized luminescence ratio (NLR) threshold. Full data in Supplementary Data 14. c, Co-immunoprecipitation of the indicated Myc-tagged human proteins by the indicated FLAG-tagged effectors or by FLAG-GFP as a negative control. Input: cell lysates. Molecular weight markers are given in kDa. Dark green dots: successful co-immunoprecipitation. Light green dots: successful co-immunoprecipitation, but weak or no effector detection in lysate. Blot lanes were partially rearranged.
Extended Data Fig. 3 Patterns of bacterial effector-human protein interactions.
a, Jaccard interaction similarity of all interacting effector pairs with at least three shared human interactors. Color intensity correlates with the Jaccard index. Effector pairs marked with “H” share the same homology cluster. Clusters are color-coded according to the legend. b, Actual count of motif–domain pairs matching at least two stringency criteria identified in HuMMIMAIN (arrow) compared with n = 10,000 randomized control networks (empirical P = 0.0003). c, Top: distribution of residue–residue contacts across predicted aligned error (PAE) scores for interfaces between bacterial effectors and their human targets. Bottom: proportions of human–human protein pairs targeted by the same bacterial effector, grouped by interface similarity at different PAE thresholds (Jaccard index (JI) categories: Distinct ≤ 0.1, Overlapping 0.1–0.6, Same ≥ 0.6). Pie charts show similarity distributions for contacts with PAE ≤ 9 Å (50th percentile), PAE ≤ 21 Å (75th percentile) and PAE ≤ 30 Å (95th percentile). d, Top: distribution of residue–residue contacts across PAE scores for interfaces between human proteins and their bacterial effectors. Bottom: proportions of bacterial effector–effector pairs targeting the same human protein, grouped by interface similarity at different PAE thresholds (JI categories: Distinct ≤ 0.1, Overlapping 0.1–0.6, Same ≥ 0.6). Pie charts show similarity distributions for contacts with PAE ≤ 9 Å (50th percentile), PAE ≤ 21 Å (75th percentile) and PAE ≤ 30 Å (95th percentile). e, GO enrichment for convergence proteins: odds ratios (OR) of functional annotations enriched among effector-targeted human proteins that are subject to convergence (FDR < 0.05, Fisher’s exact test with Bonferroni FDR correction). Full data and precise FDR and OR values in Supplementary Data 20.
Extended Data Fig. 4 Effector impact on human cell function.
a, Relative NF-κB transcriptional reporter activity of HEK293 cells expressing the indicated effectors under TNF-stimulated conditions (Kruskal–Wallis test with Dunn’s post hoc comparisons, * P < 0.05, ** P = 0.01, n = 4 biological replicates). Boxes represent the IQR, with the bold black line representing the mean; whiskers indicate the highest and lowest data points within 1.5 × IQR. b, Representative anti-hemagglutinin (HA) and anti-FLAG (FLAG) Western blots showing expression of transfected effector proteins relative to the actin control (ACT), which was run on a different blot. Empty pMH-Flag-HA (pMH), empty pEF4 (pEF). c, Representative proliferation curves of Caco-2 cells transfected with empty vector (EV) or Cpa_12 under basal conditions (unstim) or following pro-inflammatory stimulation (stim) over 72 h after sorting. Error bars indicate one standard deviation above and below the mean. d, Concentration of cytokines secreted by Caco-2 cells transfected with the indicated effectors under basal conditions (US) or following stimulation with the indicated elicitors. EV indicates empty vector mock control. P values were calculated by Kruskal–Wallis test with Dunn’s post hoc comparisons (n = 5). Boxes represent the IQR, with the bold black line representing the mean; whiskers indicate the highest and lowest data points. Raw measurements, n and precise P values for all panels are in Supplementary Data 24 and 25.
Supplementary information
Supplementary Information
Guide to Supplementary Data providing an overview of 26 thematically organized datasets, each introduced by a summary sheet describing the contents of every worksheet.
Supplementary Data 1
T3SS identification in reference strains.
Supplementary Data 2
TxSS identification in HBC/UHGG collections.
Supplementary Data 3
T3SS identification in metagenome assemblies.
Supplementary Data 4
Prediction of T3SS effectors.
Supplementary Data 5
Commensal versus pathogen effectors.
Supplementary Data 6
Structural effector analysis and Foldseek clustering.
Supplementary Data 7
Effector cloning.
Supplementary Data 8
Phylogenetic assignments of strains used in this study.
Supplementary Data 9
Effector identifiers and abbreviations used in this study.
Supplementary Data 10
Injection assays raw data and statistics.
Supplementary Data 11
Human–microbiome meta-interactome.
Supplementary Data 12
Reference interaction sets.
Supplementary Data 13
Assay sensitivity.
Supplementary Data 14
Validation rate.
Supplementary Data 15
Interactions between pathogen effectors and human proteins downloaded from IMEx.
Supplementary Data 16
Relationship between effector interaction profile and effector sequence similarity.
Supplementary Data 17
AlphaFold2 interaction interface in HuMMI.
Supplementary Data 18
Domain–motif interface predictions.
Supplementary Data 19
Holdup assay and validation of PBM–PDZ interface predictions.
Supplementary Data 20
Functional enrichment analysis of targets.
Supplementary Data 21
Genetic predisposition enrichment of effector targets.
Supplementary Data 22
Prevalence of HuMMI effector proteins across the OhJ_2014 skin cohort.
Supplementary Data 23
Genetic predisposition trait enrichment in effector target neighbourhoods identified by RWR.
Supplementary Data 24
NF-κB assays.
Supplementary Data 25
Cytokine assays.
Supplementary Data 26
Prevalence of HuMMI effector proteins in IBD versus healthy patients.
Source data
Source Data Fig. 1
Unprocessed western blots.
Source Data Fig. 2
Unprocessed western blots.
Source Data Fig. 3
Unprocessed western blots.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Young, V., Dohai, B., Halder, H. et al. Effector–host interactome map links type III secretion systems in healthy gut microbiomes to immune modulation. Nat Microbiol (2026). https://doi.org/10.1038/s41564-025-02241-y
DOI: https://doi.org/10.1038/s41564-025-02241-y