Introduction

More than 80% of all known bacterial species are motile at some stage of their life cycle1. These bacteria navigate toward nutrients or away from unfavorable environments through chemotaxis2,3,4, using movements such as swimming, swarming, and twitching, among others5. The flagellum, an extracellular machine, drives swimming and swarming and comprises three main components: the basal body complex, which functions as the motor; the connecting rod and hook, which act as a universal joint; and the flagellar filament, which serves as a mechanical propeller6,7. Canonical flagellar filaments are long, supercoiled structures composed of around 20,000 flagellin subunits. As the filament rotates, it generates thrust, much like an Archimedean screw. To date, all known bacterial flagellins share a conserved D0/D1 domain architecture, which is arranged uniformly across bacterial species. These domains follow a helical symmetry with a rise of ~5 Å and a twist of ~65.4° per subunit, so that every 11th flagellin aligns nearly vertically. This configuration produces 11 distinct protofilaments (the 11-start) on the filament’s surface and in its inner core. Recent research has revealed the molecular basis of flagellar supercoiling, identifying 11 different flagellin D0/D1 states that result in 11 unique protofilament conformations. Interestingly, similar supercoiling mechanisms have also been documented in archaeal flagella, even though the structural components of archaeal and bacterial flagella are not homologous8.

For bacteria that colonize or infect other organisms, motility plays a crucial role in interactions between a bacterium and its host5. Beyond enabling movement, flagella are thought to possess several other functions, including adhesion to surfaces9,10, colonization11, biofilm formation12, and potentially modulating host immune response13, among others. Many of these properties are attributed to the flagellar outer domains, which are part of the central region of the flagellin, named D2–D4, and so forth. In several bacterial species, these outer domains are not essential for motility14,15. In fact, about half of the bacterial flagellins annotated in the UniProt database contain only the D0/D1 domain, lacking outer domains.

Previous studies have reported outer domain structures from a diverse range of species10,16,17,18,19,20,21,22,23,24,25,26, including soil-borne bacteria such as Sinorhizobium meliloti, opportunistic human pathobionts such as Salmonella enterica and Pseudomonas aeruginosa, and life-threatening primary human pathogens such as Burkholderia pseudomallei, a high-priority biological agent responsible for melioidosis27. These include flagellin structures solved by X-ray crystallography and filament structures determined by cryo-electron microscopy (cryo-EM). Remarkably, the fold and architecture of the outer domains are highly variable. The lack of clear homology among flagellin outer domains across species has historically made structural analysis of these domains challenging. Recent advances in protein structure prediction methods, notably AlphaFold28, have shown great promise in accurately predicting protein structures at the fold level, even when no analogous structure exists. With most UniProt sequences now predicted and available in the AlphaFold database29, it is feasible to undertake large-scale structural analyses of flagellin outer domains.

In this study, we report the near-atomic resolution cryo-EM structures of supercoiled bacterial flagellar filaments from three diverse bacterial species. The first is from Cupriavidus gilardii, a Gram-negative, aerobic, opportunistic pathogen that has been increasingly associated with human infection and holds potential for bioremediation30,31. The second is from Stenotrophomonas maltophilia, a Gram-negative, aerobic bacterium known for its multidrug resistance and its ability to infect the lungs of individuals with cystic fibrosis32,33. The third is from Geovibrio thiophilus, a Gram-negative, non-sporulating bacterium residing and thriving in water sediments under anaerobic and microaerophilic conditions and that reduces sulfur and nitrate34. We discovered that the flagellar surface of G. thiophilus carries a significantly higher negative charge, suggesting a potential mechanism for its adhesion to positively charged minerals. Moreover, we observed remarkable diversity in the outer domains of the three elucidated structures in both size and fold. The only similarity was that the D2 domain of S. maltophilia and the D3 domain of G. thiophilus both exhibit an immunoglobulin-like (Ig-like) fold topology35.

This discovery prompted us to conduct a comprehensive survey of annotated bacterial flagellins using AlphaFold predictions, classifying the predicted flagellin outer domains into 682 structural clusters. Our results indicated that nearly half of the flagellin sequences with outer domains contain at least one Ig-like domain, suggesting that the Ig-like domain is ubiquitous in flagellin proteins. Additionally, our analysis provided a profile of the most frequently observed outer domains and determined whether their structures have been experimentally resolved. Ranked by the abundance of outer domain architectures, prior cryo-EM studies have documented outer domain structures in clusters #2, 3, 5, 20, and 28, while X-ray studies have explored outer domains in clusters #1, 2, 5, 6, 10, and 17. The structures we report belong to clusters #1, 27, and 41, addressing a significant gap in knowledge and highlighting the existence of numerous other fascinating clusters yet to be explored in structural studies.

Results

Cryo-EM structures of three flagellar filaments with outer domains

Cryo-EM was used to determine the structures of three flagellar filaments: one peritrichous from C. gilardii36, one lophotrichous from S. maltophilia32, and one monotrichous from G. thiophilus34. The established helical symmetry of canonical flagellar D0/D1 domains, with a helical rise of ~5.0 Å and a twist of ~65.45°, was confirmed by examining the power spectra. When this symmetry was applied to 320-pixel box particles, it led to high-resolution 3D reconstructions: C. gilardii D0/D1 domains at 3.1 Å, S. maltophilia D0–D2 domains at 3.2 Å, and G. thiophilus D0–D3 domains at 3.4 Å. This indicates that the D2 domain of C. gilardii and the D4/D5 domains of G. thiophilus diverge from canonical flagellar symmetry, featuring additional interfaces within the outer domain region (Fig. 1). To generate 3D reconstructions of the supercoiled flagellar filaments, particles were re-extracted using a larger 640-pixel box, and the power spectra were then inspected for a layer-line with meridional intensity that does not exist in the canonical flagellar symmetry. A weak meridional layer-line near 1/(110 Å) was observed in G. thiophilus flagella, suggesting a repeating feature every 22 flagellin molecules, similar to the flagellar filament in P. aeruginosa PAO116,18. Additionally, previously described intermediate layer-lines18,37,38,39 were seen in the power spectra of both C. gilardii and G. thiophilus flagella, indicating that non-helical perturbations were present in the structure. As described in Supplementary Fig. 1, the supercoiled filaments were reconstructed using helical refinement with the respective outer domain symmetry applied, followed by homogeneous and local refinements to relax the helical constraints. The 3D reconstructions of the supercoiled filaments for C. gilardii, S. maltophilia, and G. thiophilus achieved final, near-atomic resolutions of 3.4, 3.3, and 4.1 Å, respectively, as determined by map:map FSC (Supplementary Fig. 2).
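The link between the 1/(110 Å) meridional layer-line and a 22-subunit repeat follows from the ~5.0 Å rise per flagellin. The short Python sketch below works through that arithmetic; the function name is hypothetical and the calculation is an illustration, not part of the processing pipeline used in this study.

```python
# Illustrative arithmetic only: relate a meridional layer-line spacing to the
# number of subunits per axial repeat, assuming the ~5.0 A rise quoted in the text.
RISE_A = 5.0  # helical rise per flagellin subunit, in angstroms


def subunits_per_repeat(repeat_distance_a: float, rise_a: float = RISE_A) -> float:
    """Number of subunits spanned by one axial repeat of the given length."""
    return repeat_distance_a / rise_a


# A meridional layer-line at 1/(110 A) corresponds to an axial repeat of 110 A.
print(subunits_per_repeat(110.0))  # 22.0
```

Because 22 is twice 11, such a repeat is what would be expected if pairs of subunits along each 11-start protofilament form the repeating unit.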

Fig. 1: Cryo-EM of flagellar filaments from C. gilardii, S. maltophilia, and G. thiophilus.

Representative cryo-electron micrographs of the C. gilardii flagella (A), S. maltophilia flagella (B), and G. thiophilus flagella (C). Scale bar, 50 nm in (A)–(C). Orange arrowhead points to the C. gilardii flagellum; green arrowheads point to the S. maltophilia flagellum; blue arrowhead points to the G. thiophilus flagellum. 2D averages of all three flagella are shown in the top right corner. Cryo-EM reconstructions of the C. gilardii flagellum at 3.4 Å resolution (D), the S. maltophilia flagellum at 3.3 Å resolution (E), and the G. thiophilus flagellum at 4.1 Å resolution (F). Thin sections parallel to the helical axis of the flagella are shown on the right of the 3D reconstruction, colored by the radius. G–I The single flagellin structures modeled into the cryo-EM densities are displayed on the right, with their respective domain names indicated. The conserved D0/D1 domains are shown in gray. The outer domains of C. gilardii, S. maltophilia, and G. thiophilus are consistently colored in accordance with the 3D reconstructions.

The diameters of C. gilardii and S. maltophilia flagellar filaments were similar, ~165–170 Å, in contrast to the notably wider G. thiophilus filament, which measured around 240 Å (Fig. 1A–C). All three flagellar filaments exhibited a similar D0/D1 domain architecture, which is expected since all bacterial flagella share this homology. Notably, both C. gilardii and S. maltophilia have a single D2 domain. In C. gilardii, the D2 domain dimerizes, creating a screw-like feature on its surface (Fig. 1D). Conversely, in S. maltophilia, the D2 domain does not form an extra interface with adjacent domains, maintaining the same helical symmetry as the D0/D1 domain. The outer domains D2–D5 in G. thiophilus are the largest flagellin outer domains determined by cryo-EM (Fig. 1F, I), responsible for the diameter increase to 240 Å. Interestingly, three flagellin sequences are present in the G. thiophilus genome, quite similar in size and overall folds predicted by AlphaFold. Upon full-length modeling of all three flagellin sequences, the correct flagellin protein sequence was identified, as particular regions in the reconstruction were found that could not be explained by the other two sequences.

Outer domains in three flagellins share β-strand rich architecture

As expected, the D0/D1 domains among the three flagella filaments share an identical fold, with sequence identities ranging from 42% to 59%. However, the fold of their outer domains significantly diverges (Fig. 1G–I). Remarkably, the fold of the C. gilardii D2 domain (Fig. 2A) was not observed in known, experimentally determined structures, as indicated by Foldseek40 and the DALI41 server. Further investigation into sequence-level and AlphaFold prediction similarities revealed that similar outer domains are present in certain species of the Betaproteobacteria class, including those in the Burkholderiales order, and in species of the Gammaproteobacteria class, including those in the orders Enterobacterales, Oceanospirillales, and Pseudomonadales, many of which are opportunistic human and plant pathogens. It is probable that the species belonging to those groups have been sparsely sampled in the past for structural studies. The S. maltophilia D2 domain exhibits an Ig-like fold (Fig. 2B), a common two-layer β-sandwich domain found in numerous proteins with diverse functions35, including the globular domain in archaeal type IV pili42,43,44. Unsurprisingly and likely due to the ubiquitous nature of the Ig-like domain, similar outer domains were found in many AlphaFold-predicted flagellins across a diverse array of bacterial species, from anaerobes to aerobes and marine algae to human microbiota. A similar D2 domain was also experimentally captured in a partial flagellin crystal structure from Sphingomonas sp. A119, sharing approximately 28% sequence identity between the D2 domains. The G. thiophilus flagellum, notably the first flagellar structure from bacteria growing anaerobically, presents a large outer domain architecture with four domains D2–D5 (Fig. 2C). 
All four outer domains are rich in β-strands: the fold of D2 domain is mainly found in flagellar hook-associated proteins; the D3 domain is a variant of the Ig-like fold; D4 and D5 are β-sandwich domains similar to the Pfam45 DUF992 (domain of unknown function). Similar flagellar outer domains are mostly identified in bacteria from environments like lake or ocean sediment, soil, and wastewater, across the classes Chrysiogenetes, Clostridia, Deferribacteres, and Synergistia.

Fig. 2: The fold of flagellar outer domains from C. gilardii, S. maltophilia, and G. thiophilus.

Cryo-EM structures of the flagellar outer domains from C. gilardii (A), S. maltophilia (B), and G. thiophilus (C) are shown, with all α-helices colored brown and β-sheets colored green. These domains are also drawn in schematic representation, with β-strands shown as green arrows, α-helices as brown cylinders, and loops as dark gray lines. When multiple domains are present, they are connected by blue dashed lines near the N-terminus and red dashed lines near the C-terminus.

Dimeric outer domain interactions in C. gilardii and G. thiophilus flagella

Next, we explored how the outer domains interact along the flagellar filaments. Interestingly, the outer domain of S. maltophilia maintains the same symmetry as D0/D1 and does not form additional polymeric contacts with adjacent subunits. In contrast, the C. gilardii and G. thiophilus flagella display partial or entire outer domain dimerization, thereby disrupting the D0/D1 symmetry. The dimeric interface occurs at a radius of ~80 Å in C. gilardii and ~120 Å in G. thiophilus (Fig. 3A). Given the canonical D0/D1 symmetry, adjacent flagellins along the 11-start have minimal net twist (11 × ~65.45° ≈ 720°, that is, almost exactly two full turns), rendering the protofilament nearly parallel to the helical axis. In both C. gilardii and G. thiophilus, the outer domains (D2 in C. gilardii and D4/D5 in G. thiophilus) protrude from the 11-start and adopt either an “up” or “down” conformation.
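The geometry of the n-start helices can be checked with a few lines of Python. The sketch below assumes the canonical rise (~5.0 Å) and twist (~65.45°) quoted in the text and adopts, as an assumption, the convention that a positive wrapped twist denotes a right-handed helix; `n_start` is a hypothetical helper name.

```python
# Minimal sketch: axial rise and net twist between subunit i and subunit i+n
# for a one-start helix with the canonical D0/D1 parameters from the text.
RISE_A = 5.0      # angstroms per subunit
TWIST_DEG = 65.45  # degrees per subunit


def n_start(n: int, rise: float = RISE_A, twist: float = TWIST_DEG):
    """Return (axial rise in A, net twist in degrees wrapped to [-180, 180))."""
    total_twist = n * twist
    wrapped = (total_twist + 180.0) % 360.0 - 180.0
    return n * rise, wrapped


for n in (5, 6, 11):
    rise, twist = n_start(n)
    print(f"{n}-start: rise = {rise:.1f} A, net twist = {twist:+.2f} deg")
```

Under these assumptions the 11-start has a rise of 55 Å and a net twist of about −0.05°, which is why the protofilaments run nearly parallel to the helical axis. The 5-start and 6-start come out with twists of opposite sign (about −32.75° and +32.70°), consistent with their description as left-handed and right-handed, respectively.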

Fig. 3: Outer domain arrangements on flagellar surface.

A The top view of three flagellar filaments with the conserved D0/D1 domains in gray, the D2 domain of C. gilardii flagellum in orange, the D2 domain of S. maltophilia flagellum in green, the D2/D3 domains of G. thiophilus flagellum in light blue, and the D4/D5 domains of G. thiophilus flagellum in dark blue. B The D2 domain dimerization in C. gilardii is presented within the cryo-EM map density. C The D4/D5 domain dimerization in G. thiophilus, with the model shown within cryo-EM map density. D The helical net of the flagellar filament is illustrated using the convention that the surface is unrolled, providing a view from the outside. Gray dots indicate the approximate symmetry present in the D0/D1 domains of all flagellar filaments. The conventional 11-start and left-handed 5-start, originating from the D0/D1 domain, are indicated with black lines. The dimer interface that occurs in the flagellar outer domains of both C. gilardii and G. thiophilus is illustrated with transparent orange stadium shapes. The seam arising from this dimerization packed along the D0/D1 5-start is indicated by a red dashed line.

This dimeric interface, formed between the “up” and “down” conformations of the outer domains, generates a screw-like feature in the C. gilardii flagellum, where the surface area buried between subunits is ~270 Å2 (Fig. 3B) as calculated by PDB PISA46. Conversely, the G. thiophilus outer domain lacks a distinct screw-like feature, with the dimeric interface between the D5 domains of subunit S0 and subunit S+11 enclosing a buried interface of about 220 Å2. The interaction area between D4 domains along the 11-start protofilament is relatively small, at around 80 Å2 (Fig. 3C). Notably, the D2/D3 domains in G. thiophilus, located at a smaller cylinder radius of about 80 Å, do not dimerize, preserving the approximate symmetry seen in D0/D1.

The dimerization interface occurs between subunits on the same 11-start protofilament, with dimer packing extending along the left-handed 5-start protofilament. Thus, a seam occurs between two 11-start protofilaments (Fig. 3D), similar to flagella previously studied in P. aeruginosa PAO116,18. This seam, reminiscent of those observed in microtubule filaments, interrupts the continuity along the dimer extension, which is the left-handed 5-start of D0/D1 domains (Fig. 3D). Although such structures with a seam were termed “non-helical” in the past, they can be viewed as helical polymers with a large asymmetric unit comprising 22 flagellin subunits while ignoring flagellar supercoiling. This larger ASU reconstruction strategy failed to reach high resolution seven years ago in the reconstruction of P. aeruginosa PAO1 flagella data18 recorded using a Falcon II camera in integrating mode and processed with legacy software SPIDER47. However, advancements in cryo-EM now make this strategy possible (Supplementary Fig. 1) using a Gatan K3 camera in counting mode, processed in CryoSPARC48.

The flagellar surface of G. thiophilus is negatively charged

The flagellar outer domains of C. gilardii and S. maltophilia are significantly smaller compared to the outer domains in G. thiophilus, thereby failing to completely shield the inner D0/D1 domain from solvent exposure. In contrast, G. thiophilus possesses a two-layered outer domain, comprised of the D2/D3 middle layer and the D4/D5 outer layer, which effectively encapsulates the D0/D1 domain from solvent access. This observation prompted us to investigate whether such coverage impacts the surface charge by estimating the coulombic electrostatic potential for the flagellar surfaces of these three species. Remarkably, the surface of the G. thiophilus flagellum is significantly more negatively charged than those of the other two species, attributed to a higher presence of aspartic acid and glutamic acid as surface residues (Supplementary Fig. 3A); analysis of amino acid composition suggested the same conclusion. When examining the D0/D1 domains, the amino acid distribution among the three flagella was nearly identical. However, a comparison of the outer domains revealed that the C. gilardii flagellum, while having an overall amino acid composition similar to S. maltophilia’s, possesses slightly more negatively charged residues, resulting in a mildly negative surface charge. On the other hand, the outer domain of the G. thiophilus flagellum displays a notable difference, containing three times as many negatively charged residues compared to positively charged ones. Even though the G. thiophilus outer domain is ~3.5–4 times larger than those of the other two flagella, it has 6–7 times more negatively charged residues, resulting in a predominantly negatively charged surface that fully covers the D0/D1 domain (Supplementary Fig. 3B, C).
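A composition-based comparison of this kind can be sketched in Python as follows. This is only a simple residue count and not the Coulombic surface calculation described above; the sequence is a made-up toy string rather than a real flagellin, histidine is ignored for simplicity, and `charge_profile` is a hypothetical helper name.

```python
from collections import Counter

# Simplifying assumption: Asp/Glu count as negative, Lys/Arg as positive;
# histidine and terminal charges are ignored.
NEGATIVE = "DE"
POSITIVE = "KR"


def charge_profile(sequence: str) -> dict:
    """Count negatively and positively charged residues in a one-letter sequence."""
    counts = Counter(sequence.upper())
    negative = sum(counts[aa] for aa in NEGATIVE)
    positive = sum(counts[aa] for aa in POSITIVE)
    return {"negative": negative, "positive": positive, "net": positive - negative}


toy_outer_domain = "MDEEDKATDERGSDELD"  # hypothetical sequence for illustration
print(charge_profile(toy_outer_domain))
# {'negative': 9, 'positive': 2, 'net': -7}
```

Applying such a count separately to the D0/D1 region and to the outer domains of each flagellin gives the kind of per-region comparison summarized in this section.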

The Ig-like fold is ubiquitous in the bacterial flagellin outer domain universe

Flagellar outer domains are known to vary greatly in protein sequence and length49, which is consistent with our findings. We observed that two of the three flagellins we report contain an Ig-like domain, which has been noted in several other previously reported flagellin structures16,18,19,20,22. This prompted us to question whether flagellin outer domains possess certain folds/domains more commonly than others. The advent of AlphaFold28 has enabled this type of computational analysis. To date, all AlphaFold predictions have been clustered based on fold similarity50. These results, unfortunately, were biased by D0/D1 alignment and could not be directly applied to our study, which seeks domain clustering results independent of D0/D1 domains. Therefore, we downloaded all bacterial flagellin AlphaFold predictions for analysis, trimming the D0/D1 domains (263 residues) based on prior knowledge of experimental flagellin structures. Interestingly, about 60% of flagellins are shorter than 350 residues (Fig. 4A, B), containing either only the D0/D1 domain, such as the Bacillus subtilis flagellum18, or the D0/D1 domain of Borreliella burgdorferi flaA51, which contains an extra small domain/disordered loop (Fig. 4B). Thus, we set a cutoff at 350 residues for substantial outer domains analysis, creating a library of 16,948 bacterial flagellin outer domain predictions.
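The length-based selection step can be illustrated with a short Python sketch. The entry names and lengths below are invented for illustration, and `select_by_length` is a hypothetical helper; only the 350-residue cutoff comes from the text.

```python
# Minimal sketch of the length cutoff: keep entries with at least 350 residues
# and report the fraction of entries that fall below the cutoff.
MIN_LENGTH = 350  # cutoff from the text


def select_by_length(lengths: dict, min_length: int = MIN_LENGTH):
    """Return (sorted IDs with length >= min_length, fraction of shorter entries)."""
    kept = sorted(name for name, n in lengths.items() if n >= min_length)
    fraction_short = 1.0 - len(kept) / len(lengths) if lengths else 0.0
    return kept, fraction_short


# Hypothetical entries: names and lengths are placeholders, not real UniProt records.
toy_lengths = {"entry_a": 275, "entry_b": 304, "entry_c": 349, "entry_d": 350, "entry_e": 793}
kept, fraction_short = select_by_length(toy_lengths)
print(kept, round(fraction_short, 2))  # ['entry_d', 'entry_e'] 0.6
```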

Fig. 4: Analysis of AlphaFold predictions of bacterial flagellin outer domains.

A The typical architecture of a flagellin that includes outer domains. The D0 and D1 domains, which together comprise 263 residues or slightly more, are colored in red and purple, respectively. The outer domains, extending from inner to outer diameter, are sequentially named D2, D3, D4, and so on. B This plot represents the number of UniProt entries corresponding to various lengths of flagellin proteins. Flagellin AlphaFold predictions equal to or longer than 350 residues were selected for subsequent multistep analysis, as described in the Methods section, which generated 682 structural clusters. C The matrix visualizes a DALI all-to-all analysis of the 682 structural cluster representatives. It is constructed using pairwise DALI Z-scores, with the corresponding Z-score color scale presented on the right. The most prominent cluster, located at the bottom right of the matrix, contains representatives with Ig-like domains.

For initial clustering, we employed a protein sequence-based approach, similar to, yet more aggressive than, AFDB clustering, by grouping similar protein sequences with MMseqs252 at a 30% sequence identity and 80% overlap cutoff. This method categorized the 16,948 predictions into 1281 sequence-based groups. We noticed many flagellin outer domains had a variable number of domains present. We initially considered analyzing individual domains within multi-domain outer regions, but this proved impractical. The variability in protein domain folds makes it difficult to define clear matching criteria, and many domains are unique to bacterial flagellin, complicating their classification. Additionally, analyzing small domains is challenging, particularly given the limitations of current protein fold libraries. Therefore, we focused our analysis on entire flagellin outer regions. Using this strategy, we next applied a more conserved rigid-body alignment strategy by utilizing USalign53 combined with graph-based community detection clustering via the walktrap54 algorithm to further group entries with similar lengths and structures. This step organized 1281 sequence-based groups into 682 structural clusters. Each cluster contained a unique “representative” flagellin, distinct from others at the sequence or rigid-structure level (Fig. 4B). Lastly, we examined whether prevalent domain folds were maintained within this library of 682 structural clusters (Supplementary data 1). We used DALI, a tool for analyzing deep phylogenetic relationships and protein homology, known for its sensitivity to fold similarities, to perform an all-to-all analysis41. Strikingly, this analysis revealed significant clustering, as shown in the Z-score heatmap (Fig. 4C), with about 40% of the representatives carrying at least one Ig-like domain, comprising nearly half of the 16,948 AlphaFold predictions.

Diverse outer domains remain to be studied in bacterial flagella

Lastly, we asked what the most abundant folds/domains are in known bacterial flagellar outer domains and how they are related. To accomplish this, using the same community-based detection strategy for the USalign clustering discussed above, we grouped the remaining representatives using a DALI Z-score >10 as the cutoff for significance. We listed the most populated structural clusters, the number of predictions within those clusters, AlphaFold predictions of the representatives, and the known experimental structures belonging to a given cluster (Fig. 5A). The phylum with the highest abundance in each cluster is highlighted accordingly (Supplementary Fig. 4). Not surprisingly, the largest cluster in the center is comprised of representatives that have at least one Ig-like domain within the outer domains. Among 682 structural clusters, five out of the six most populated clusters contain one or more Ig-like domains: cluster #1 with 2304 entries, including S. maltophilia flagellin; cluster #2 with 1838 entries, featuring two Ig-like domains as observed in P. aeruginosa PAO116,18; cluster #3 with 1094 entries, including another variant of two Ig-like domains, as in Campylobacter jejuni flagellin20; cluster #4 with 1094 entries, possessing one Ig-like and one β-barrel domain; and cluster #6, showing a single Ig-like domain variant previously identified in a different strain of P. aeruginosa22 (Fig. 5B).
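To make the thresholding idea concrete, the Python sketch below links representatives whose pairwise score exceeds 10 and reports the resulting groups. Note that it uses plain connected components (via union-find) as a simplified stand-in, not the walktrap community detection used in this analysis; the representative names and scores are invented, and `threshold_components` is a hypothetical helper.

```python
from collections import defaultdict


def threshold_components(pairs, cutoff=10.0):
    """Group nodes connected by edges with score > cutoff (connected components).

    `pairs` is an iterable of (node_a, node_b, score). This is a simplification:
    walktrap would further split or merge groups based on random-walk distances.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, score in pairs:
        root_a, root_b = find(a), find(b)  # registers both nodes
        if score > cutoff and root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    # Largest groups first, ties broken alphabetically
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Hypothetical pairwise Z-scores between cluster representatives
toy_scores = [
    ("rep1", "rep2", 14.2),
    ("rep2", "rep3", 11.0),
    ("rep3", "rep4", 6.5),
    ("rep4", "rep5", 3.1),
]
print(threshold_components(toy_scores))
# [['rep1', 'rep2', 'rep3'], ['rep4'], ['rep5']]
```

A community detection method such as walktrap operates on the same thresholded graph but can separate loosely connected subgroups that a connected-components pass would merge.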

Fig. 5: Populated flagellin outer domains in the bacterial domain.

A The relationships among the 682 structural clusters are depicted using the Fruchterman-Reingold algorithm, a force-directed layout algorithm from the R/igraph package. The communities, indicated with transparent colored circles, were detected using the walktrap algorithm with a step size of 6. Additionally, some of the most populated clusters, as illustrated in (B), are highlighted with black circles. B AlphaFold predictions or experimentally determined structures from populous structural clusters are organized according to the population size of the first representative in a given community. Within these models, all α-helices are colored yellow, β-sheets blue, and loops gray. Known domain folds are labeled. For clusters with available cryo-EM or X-ray structures, the corresponding species and PDB IDs are provided.

Apart from the Ig-like domain, we observed various other widespread folds in flagellin outer domains. For instance, several flagella structures in similar clusters like #5, #21, and #28 have been reported, including Salmonella enterica21, Salmonella enterica serovar typhimurium10, and Escherichia coli K1217. Intriguingly, reported flagellin structures FliC25 and FljB10 of S. Typhimurium, while sharing similar individual domains, are located in clusters #5 and #28, respectively, due to significant differences in the orientation between their two domains. Structural clusters mainly comprising α-helices, such as #7, #29, #115, etc., were also identified. Interestingly, no experimental structure has been documented in these α-helix rich clusters yet. Other clusters having experimental structures include cluster #8, featuring E. coli O157:H7, E. coli O127:H6, and Achromobacter spp. flagella17, and cluster #10, containing Salmonella dublin23 flagellum. These outer domains typically exhibit “skinny” architectures containing domain folds similar to the de novo-designed domain named foldit355. Other less abundant, but intriguing, clusters were also observed; for instance, the outer domain in cluster #15 resembles the peptidase-M61 catalytic domain56, referred to as flagella_HEXXH domain57; the fold in cluster #35 is similar to the pectate trisaccharide-lyase domain58; and the WD40-repeats in cluster #55, more commonly found in eukaryotes, is often thought to serve as a rigid scaffold for protein interactions59. Two of the three flagellar structures reported in this study belong to distinct outer domain clusters: C. gilardii (cluster #27) and G. thiophilus (cluster #41). Although these discovered families are not the most abundant in our analysis, there may be a bias introduced by commonly studied bacterial species that are more redundantly sampled in the database. 
Overall, this analysis suggests that many more folds exist in flagellar filaments that have yet to be sampled, and their functions remain to be discovered.

Discussion

The field of structural biology has rapidly advanced due to recent developments in cryo-EM60,61 and structure prediction28 techniques. These developments, when combined with experimental data, now enable large-scale29 and robust structural analyses at the protein fold level. In this study, we determined the cryo-EM structures of three bacterial flagellar filaments at near-atomic resolution. These structures encompass both the conserved D0/D1 regions and the outer domains, providing a general 3D reconstruction strategy for wild-type supercoiled bacterial flagella. Among our findings, we identified distinct folds that, to the best of our knowledge, have not previously been observed in other bacterial flagella. Of particular interest was the prevalence of the Ig-like fold within the known flagellar outer domains. Our in-depth analysis of bacterial flagellar outer domains confirmed the widespread presence of the Ig-like domain; it is found in nearly half of all known flagellin proteins that are longer than 350 residues. The Ig superfamily is ubiquitous, present in various kingdoms, including eukaryotes, prokaryotes, bacteria, viruses, fungi, and plants, and encompasses hundreds of protein families with diverse functions62. These include antibodies63, receptor tyrosine kinases64, archaeal type IV pili43, and many others. Given its widespread distribution, the Ig-fold was likely selected as the most abundant outer domain due to its structural stability, its ability to adapt functionally within the loop regions while maintaining the same fold, and its potential for duplication into multiple linear Ig-folds, enabling complex architectures62. Additionally, we discovered other interesting flagellar outer domains, which could spark future research on flagellar filaments in unexplored species.

From the perspective of bacterial “economics”, flagellin emerges as one of the most energetically demanding proteins for bacteria to produce, requiring an estimated 20,000 copies per flagellum. This high energy investment in flagella production was clearly demonstrated over 30 years ago in studies in which bacteria were grown in stirred liquid cultures, where motility provides no advantage but incurs additional energy expenditure. Bacterial cells lost their flagellar filaments as a result of spontaneous mutations in flagellar genes within just 10 days, highlighting an evolutionary advantage for bacteria that conserve energy by not producing flagella65. Recent estimates suggest that the energy cost of synthesizing flagellar filaments accounts for 0.5–40% of the total energy budget across different species66. The presence of outer domains in flagella, especially in species like G. thiophilus, where the flagellin reaches 793 residues (approximately triple the size of a flagellin composed solely of the D0/D1 domain), supports the hypothesis that these elaborate structures must confer substantial benefits to justify their high energy costs.

While the conserved D0/D1 domain is crucial for swimming motility, the necessity of the outer domains for this function has been questioned. In S. typhimurium, it has been shown that flagella lacking an outer domain remain motile, suggesting that this domain is not required for swimming motility15. Yet, studies suggest that the outer domain may enhance motility in several ways. For instance, in E. coli, the outer domains have been shown to extend tumbling time, thereby improving navigation efficiency17. Similarly, in P. aeruginosa PAO1, mutations in the outer domain disrupt the formation of supercoiled filaments, adversely affecting motility16. Conceptualizing the flagellar filament as a propeller, the outer domain could enhance the motility by increasing the flagellar diameter and adjusting the angle of attack, potentially aiding movement in high-viscosity environments37. Beyond motility, outer domains may serve diverse functions across different bacterial species. For example, the outer domains in cluster #15 resemble glycyl aminopeptidase, suggesting a role in digesting extracellular proteins to facilitate the uptake of essential amino acids67. In the case of G. thiophilus (cluster #41), which thrives in anoxic or microaerophilic environments, and has the largest flagellin structure identified, its flagellum features an unusually negatively charged surface. This characteristic might regulate motility modes in various environments. For instance, in the presence of positively charged surfaces like minerals, iron oxides, and most metal oxides, the flagellum may adhere more readily, potentially leading bacteria to interact with the surface. Conversely, in environments that are more negatively charged and devoid of minerals, the flagellum could experience reduced resistance, allowing for smoother rotation and enhanced motility.

Several studies have shown that flagellar outer domains can undergo additional surface structure organization, disrupting the D0/D1 symmetry and resulting in the formation of structures such as dimers, dimers with seams, tetramers, and tetramers with seams16,17,18. Notably, despite the similarity in flagellar outer domain architectures between E. coli O157:H7 and E. coli O127:H6, which both belong to cluster #8, they exhibit distinct surface organization patterns: O157:H7 forms dimers without a seam, whereas O127:H6 assembles into tetramers with a seam. On the other hand, dimers with seams are commonly observed in flagellar outer domains, with known cryo-EM structures identified in bacteria like P. aeruginosa PAO1 (cluster #2), C. gilardii (cluster #27), and G. thiophilus (cluster #41). This suggests that such surface organization reflects a strategy for optimized spatial packing, influenced more by the size and surface properties of the outer domains than by the protein fold or specific function. From the helical net (Fig. 3C), it becomes apparent that the linker connecting the D1 and D2 domains in subunit S0 is positioned close to S5, S6, and S11. Should dimerization occur between S0 and S11, with the resulting protofilament extending along the left-handed 5-start (all numbered starts refer to the classic D0/D1 symmetry), a seam will form (Fig. 3C). Conversely, when dimerization occurs between S0 and S5, extending along the right-handed 6-start, as seen in S. meliloti17, no seam is formed. Thus, it is reasonable to anticipate the discovery of other packing arrangements, such as trimerization or pentamerization, within the flagellar outer domain in the future. Such diversity in packing could further reveal the adaptability and complexity of flagellar structure and function.
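The positions of S5, S6, and S11 relative to S0 on the helical net follow directly from the canonical D0/D1 symmetry. The sketch below is illustrative only: it assumes the approximate rise (~5 Å) and twist (65.4°) quoted in the Introduction, and the function names and the sign convention (positive azimuth taken as right-handed) are our own assumptions rather than part of the published analysis.

```python
# Minimal helical-net sketch for the canonical D0/D1 lattice.
# Assumed parameters (approximate values quoted in the Introduction).
RISE_A = 5.0      # axial rise per subunit, in angstroms
TWIST_DEG = 65.4  # rotation per subunit, in degrees


def helical_net_position(n, rise=RISE_A, twist=TWIST_DEG):
    """Return (azimuth, height) of subunit Sn relative to S0.

    The azimuth is wrapped to [-180, 180) degrees so that near neighbours
    on the unrolled helical net have small absolute azimuths.
    """
    azimuth = (n * twist + 180.0) % 360.0 - 180.0
    return azimuth, n * rise


def start_handedness(n):
    """Handedness of the n-start helix under the assumed sign convention:
    positive azimuth is called right-handed, negative left-handed."""
    azimuth, _ = helical_net_position(n)
    return "right-handed" if azimuth > 0 else "left-handed"


if __name__ == "__main__":
    for n in (5, 6, 11):
        az, z = helical_net_position(n)
        print(f"S{n}: azimuth {az:+.1f} deg, height {z:.1f} A")
```

Under these assumptions S5 and S6 flank S0 at roughly ∓33°, while S11 sits almost directly above it, which is why the 11-start protofilaments run nearly vertically.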

The diversity seen in flagellin outer domains hints that these structural folds have been co-opted randomly or through horizontal gene transfer from other proteins. A similar phenomenon has been observed in bacterial type IV pili (T4P). One well-studied T4P from Pseudomonas aeruginosa PAO11 has a long N-terminal partially melted helix and a C-terminal globular domain68. Later, T4Ps with very different outer domains but conserved N-terminal helices were found in other bacteria, like Thermus thermophilus69. Additionally, a bacterial T4P was found with two chains (two genes) for the N-helix and C-domain70, and another with only an N-terminal helix71. Returning to bacterial flagella, all share a conserved D0/D1 domain, sufficient for supercoiled motility in many species, similar to the T4P N-terminal helix. The diverse D2/D3, etc., domains, akin to T4P outer domains, were likely acquired independently during evolution. Identifying their origins can be challenging. For instance, we could not pinpoint the D2 domain’s origin in C. gilardii, possibly due to its small size. S. maltophilia’s D2 and G. thiophilus’s D2/D3 yielded many hits due to their Ig-domain nature, found in various protein families, including archaeal type IV pilin. The origin of G. thiophilus’s D4 and D5 is clearer. D5 is a duplication of D4, inserted within one of its loops. D4 is related to the D2 domain of FliD cap proteins, found at the flagellum’s tip (Supplementary Fig. 5A). This suggests that the augmentation and diversification of this organism’s flagellum’s outer domains occurred through the gradual addition of preexisting domains encoded by flagellar genes via recombination. Finally, we examined additional clusters that could imply the origin of outer domains. For three clusters (peptidase, protease, WD40), we found cellular homologs with annotations and solved structures not labeled as flagellin (Supplementary Fig. 5B–D), suggesting they may be co-opted or horizontally transferred from other proteins.

Bacteria often harbor multiple flagellin genes, as seen in G. thiophilus, suggesting that different flagellin variants, produced in response to environmental needs, may fulfill diverse functions. The flagellin genes in G. thiophilus are situated in close proximity, with two of them being particularly near each other but not necessarily in the same operon: EP073_12645 (which has a reported cryo-EM structure) and EP073_12655. The third flagellin gene, EP073_12695, is clearly in a different operon. The conditions under which these flagellin genes are expressed and the timing of their production cannot be determined from protein fold analysis of AlphaFold predictions and remain unclear. All three flagellins have similar lengths (ranging from 793 to 801 residues), and all their outer domains belong to cluster #41. The most notable difference lies in the surface of the D3/D4 domains in EP073_12695, which has considerably fewer negatively charged residues (Asp and Glu combined) than the other two flagellins, with a decrease from 44 to 23 residues. This suggests that EP073_12695 might be produced under specific environmental conditions, such as varying pH levels. However, further investigation using mutagenesis will be necessary to confirm this hypothesis once genetic tools become available for this species.
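A comparison of this kind can be approximated at the sequence level by counting Asp and Glu within a chosen residue range. The sketch below is a simplified illustration: the function name, the toy sequence, and the residue range are hypothetical, and it counts all acidic residues in the range rather than only surface-exposed ones, which would require the structure.

```python
def count_acidic(sequence, start=None, end=None):
    """Count Asp (D) and Glu (E) in a one-letter protein sequence.

    start and end are optional 1-based, inclusive residue numbers that
    restrict the count to one region (for example, a pair of domains).
    """
    seq = sequence.upper()
    first = (start - 1) if start is not None else 0
    last = end if end is not None else len(seq)
    return sum(1 for aa in seq[first:last] if aa in "DE")


if __name__ == "__main__":
    toy = "MDEKLDAEGD"  # hypothetical sequence, not a real flagellin
    print(count_acidic(toy))        # whole sequence
    print(count_acidic(toy, 1, 3))  # residues 1-3 only
```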

Such complexity in flagellin expression and function is not unique to G. thiophilus and has been observed in various bacteria. For instance, bacteria like Azospirillum lipoferum and Shewanella piezotolerans display sophisticated adaptive behavior, producing polar flagella for swimming in liquid environments and multiple lateral flagella for swarming on surfaces. In some bacteria, including Shewanella putrefaciens and Helicobacter pylori, flagellar genes are regulated hierarchically, leading to a single flagellar filament composed of two different flagellins, one located near the hook and the other further away. This hierarchical regulation of appendages can be seen in the archaeal domain as well, with Saccharolobus islandicus REY15A utilizing the same pilin protein to generate two different type IV pili structures by modifying its Ig-like outer domain in response to varying environmental needs42. This underscores a broader principle of microbial adaptability and the strategic regulation of extracellular filament secretion. Despite these advances, our understanding of the regulatory mechanisms governing flagellin gene expression, extracellular filament secretion, and the specific roles of uncharacterized outer domains remains incomplete. Future research dedicated to unraveling these aspects promises to deepen our insights into microbial motility and interaction with their environments.

Methods

C. gilardii flagellar filament preparation

C. gilardii cells were grown aerobically at 37 °C for 48 h in chemically defined medium72 in a 5% CO2 incubator. For isolation of filaments, cells were spun down at 7000 rpm (9120 × g, Beckman JLA-12.500 rotor). The pellet was resuspended in 4 mL of phosphate-buffered saline (PBS) buffer, pH 7.2, and filaments were sheared off from the cells using a homogenizer at 10,000 rpm for 10 min. Cells and debris were removed by centrifugation at 10,000 × g for 10 min. After this, filaments were pelleted from the supernatant by ultracentrifugation (Beckman 50.3 Ti ultra-rotor, 35,000 rpm) at 4 °C for 1.5 h, and subsequently resuspended in 150 µL of PBS buffer. DNase I (NEB) was added to the sample to remove possible extracellular DNA fibers.

S. maltophilia flagellar filament preparation

S. maltophilia cells were grown aerobically at 37 °C for 48 h in Tryptic Soy Broth (TSB, BD™) medium in a 5% CO2 incubator. The resultant pellet was resuspended in 4 mL of PBS buffer, and the cell suspension was put under a homogenizer for 10 min to shear off the extracellular filaments as described above. The cells were then removed by centrifugation at 10,000 × g for 10 min. The supernatant was collected, and the filaments were pelleted by ultracentrifugation (Beckman 50.3 Ti ultra-rotor, 35,000 rpm, 1.5 h, 4 °C). After the run, the supernatant was removed, and the pellet was resuspended in 150 µL of PBS. DNase I (NEB) was added to the sample to remove possible extracellular DNA fibers.

G. thiophilus flagellar filament preparation

G. thiophilus cells were grown anaerobically at 30 °C in anaerobic freshwater medium 503 (DSMZ) in a volume of 10 mL. After this, cells were vortexed for 30 min to mechanically shear off the extracellular filaments, and cells were subsequently removed by centrifugation at 10,000 × g for 10 min. The supernatant was collected, and filaments were further enriched by an overnight 20% ammonium sulfate precipitation at 4 °C. The resultant flagellar filament pellet was collected, resuspended in 150 µL of 100 mM ethanolamine buffer (pH 10.5), and incubated with DNase I (NEB) prior to plunge freezing.

Cryo-EM conditions and image processing

The flagellar filament sample (4.5 μL) was applied to glow-discharged lacey carbon grids and then plunge-frozen using an EM GP2 Plunge Freezer (Leica). The cryo-EM micrographs were collected on a 300 keV Titan Krios with a K3 camera at 1.11 Å per pixel and a total dose of 50 e⁻/Å². The cryo-EM workflow was initiated with patch motion correction and CTF estimation in cryoSPARC48,73,74. Following this, automated picking of particle segments was conducted using the ‘Filament Tracer’ function with a shift of 10 pixels between adjacent boxes. All auto-picked particles were subjected to multiple rounds of 2D classification, and all particles in poor 2D class averages were removed. Next, the possible helical symmetries were calculated from averaged power spectra generated from the raw particles (640-pixel box)75,76. The identified parameters consistent with canonical flagellar filaments were then applied in 3D helical refinement to generate a high-resolution map of the D0/D1 domains. Subsequently, the meridional region of the power spectra was carefully examined to detect layer-lines that might indicate interfaces of additional outer domains. Upon the identification of such layer-lines, particles were re-extracted with a shift slightly greater than the periodicity observed in the power spectra to avoid particle duplication. After that, 3D reconstruction was performed using “Helical Refinement” first, “Homogeneous Refinement” next, then “Local Refine”, Local CTF refinement, and another round of “Local Refine” using CTF-refined particles (Supplementary Fig. 1). The resolution of each reconstruction was estimated by Map:Map FSC, Model:Map FSC, and d9977. Maps used in the resolution estimation were sharpened using Local Filter available in cryoSPARC. The statistics are listed in Table 1.

Table 1 Cryo-EM and refinement statistics of flagellar filaments

Model building of flagellar filaments

The first step in model building is to identify the correct flagellin protein from the experimental cryo-EM map, especially when there are multiple candidates. This was the case with the G. thiophilus flagellar sample, which had three similar flagellin sequences in its genome: A0A3R5UZF4, A0A3R5UW90, and A0A3R5V2W1. To address this, full-length modeling was carried out for each sequence until a region in the map was encountered that the sequences could not account for (see Supplementary Fig. 6). A0A3R5UZF4 was identified as the matching sequence through meticulous examination, fully agreeing with the cryo-EM density.

Regarding the modeling, the AlphaFold predicted structure of a single flagellin subunit was initially docked into the cryo-EM map. Domains (for instance, D0–D5 in G. thiophilus flagella) were docked individually since AlphaFold predictions for domain-domain orientation are frequently imprecise. The subunit was then manually adjusted and refined in Coot78. Given that the outer domain forms a dimer in C. gilardii and G. thiophilus flagella, this procedure was repeated for the other flagellin within the dimer. Following this, the refined single flagellin or flagellin dimers were docked into the other ten protofilaments within the supercoiled flagellar filament map and subjected to real-space refinement via PHENIX79. The filament model’s quality was assessed using MolProbity80, and the refinement statistics for all three flagellar filaments are detailed in Table 1. Cryo-EM map and model visualization were primarily done in ChimeraX81.

Bioinformatic clustering of flagellin outer domains

First, AlphaFold28 predictions of all bacterial proteins annotated in UniProt as “flagellin” were downloaded from the AlphaFold Protein Structure Database29. To conduct the analysis in the absence of D0/D1 domains, protein regions corresponding to D0/D1 domains were trimmed from each prediction by excising the first 163 residues and the last 100 residues. This cutoff was selected based on existing experimental flagellin structures. Furthermore, flagellins shorter than 350 amino acids (outer region shorter than 87 residues) were excluded. This is because the average size of a protein domain is approximately 100 residues, and the analysis is based on AlphaFold predictions, which are not very meaningful for such a short region. Next, DSSP82 was used to estimate the secondary structure content of the remaining models, and models with <35% total secondary structure were removed, leaving 16,948 flagellin predictions for clustering analysis.
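The trimming and filtering rules above can be summarized in a few lines. This is a minimal sketch, not the pipeline used in the study: the function names are hypothetical, the input is assumed to be a one-letter sequence plus a per-residue DSSP string, and the choice of which DSSP codes count as “secondary structure” (H, G, I, E, B here) is our assumption.

```python
N_TRIM = 163            # residues removed from the N-terminus (D0/D1)
C_TRIM = 100            # residues removed from the C-terminus (D0/D1)
MIN_FULL_LENGTH = 350   # shorter flagellins are excluded
MIN_SS_FRACTION = 0.35  # minimum fraction of secondary structure

# Assumption: helices (H, G, I) and strands/bridges (E, B) count as
# secondary structure; turns, bends, and coil do not.
STRUCTURED_CODES = set("HGIEB")


def outer_region(sequence):
    """Return the region left after trimming D0/D1, or None if too short."""
    if len(sequence) < MIN_FULL_LENGTH:
        return None
    return sequence[N_TRIM:len(sequence) - C_TRIM]


def secondary_structure_fraction(dssp_codes):
    """Fraction of residues whose DSSP code is in STRUCTURED_CODES."""
    if not dssp_codes:
        return 0.0
    structured = sum(1 for code in dssp_codes if code in STRUCTURED_CODES)
    return structured / len(dssp_codes)


def keep_for_clustering(sequence, outer_dssp_codes):
    """Apply both filters: full length >= 350 and >= 35% secondary structure."""
    if outer_region(sequence) is None:
        return False
    return secondary_structure_fraction(outer_dssp_codes) >= MIN_SS_FRACTION
```

A 350-residue flagellin leaves an 87-residue outer region after trimming, which matches the lower bound quoted in the text.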

The initial clustering was performed at the amino acid sequence level using MMseqs252, with the conversion from PDB to FASTA format achieved via Pdb2Fasta. The MMseqs2 easy-cluster workflow was used with a minimum of 30% sequence identity, requiring the shorter sequence to be at least 80% the length of the other sequence. This process reduced the 16,948 predictions to 1281 “sequence groups”, with representatives from each group chosen by MMseqs2. Rigid-body structural similarities among the 1281 groups were then investigated using USalign53 pairwise alignments with the semi-non-sequential alignment option (sNS or -mm 6), resulting in 819,480 alignments from the 1281 representative structures. An undirected graph was built using the pairwise TM scores, TM1 and TM2, as described in the USalign documentation, together with an alignment length score calculated by dividing the length of the alignment (Lali in USalign) by the total sequence length of the larger of the two outer domains. The cutoffs for edges to be included were TM1 and TM2 >0.5 and an alignment length score >0.8. This was done to include strong alignments yet eliminate edges that corresponded to high TM scores for single-domain structures aligning with a similar domain in a multi-domain structure. Using this graph, community detection-based clustering was performed using the walktrap algorithm with a step size of 6. From the 1281 sequence groups, 705 were included in the graph with 7115 edges and clustered into 107 communities; the remaining 575 had no significant connections. Merging clustering results from both MMseqs2 and USalign left us with 682 total clusters encompassing all 16,948 input structures. Clusters were sorted and numbered by number of members, largest to smallest. 142 clusters had 10 or more members, 251 clusters had 2–9 members, and 274 clusters had only one member, showing no significant structural or sequence similarity to other groups by either MMseqs2 or USalign.
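The edge-inclusion rule described above can be illustrated with a short sketch. It is not the code used in the study: the function names and the toy alignment records are hypothetical, and, to keep the example dependency-free, groups are formed with plain connected components as a stand-in for the walktrap community detection that was actually used.

```python
from collections import defaultdict

TM_CUTOFF = 0.5   # both TM1 and TM2 must exceed this value
ALI_CUTOFF = 0.8  # alignment length score must exceed this value


def keep_edge(tm1, tm2, l_ali, len_a, len_b):
    """Edge rule: TM1 and TM2 > 0.5 and Lali / larger length > 0.8."""
    ali_score = l_ali / max(len_a, len_b)
    return tm1 > TM_CUTOFF and tm2 > TM_CUTOFF and ali_score > ALI_CUTOFF


def build_graph(alignments):
    """Build an undirected adjacency map from pairwise alignment records.

    Each record is (id_a, id_b, tm1, tm2, l_ali, len_a, len_b).
    """
    graph = defaultdict(set)
    for id_a, id_b, tm1, tm2, l_ali, len_a, len_b in alignments:
        if keep_edge(tm1, tm2, l_ali, len_a, len_b):
            graph[id_a].add(id_b)
            graph[id_b].add(id_a)
    return graph


def connected_components(graph):
    """Group nodes by connectivity (simplified stand-in for walktrap)."""
    seen, components = set(), []
    for node in sorted(graph):
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            component.add(current)
            stack.extend(graph[current] - seen)
        components.append(component)
    return components


if __name__ == "__main__":
    toy = [  # hypothetical records
        ("A", "B", 0.70, 0.60, 180, 200, 210),  # kept
        ("B", "C", 0.80, 0.75, 90, 100, 300),   # dropped: one domain of many
        ("C", "D", 0.90, 0.55, 260, 300, 310),  # kept
        ("A", "D", 0.45, 0.90, 290, 300, 310),  # dropped: TM1 too low
    ]
    print(connected_components(build_graph(toy)))
```

The second toy record shows the case the alignment length score is meant to exclude: a single-domain structure that aligns well to one domain of a much larger multi-domain structure.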
Within each group, a representative model was chosen for downstream steps by sorting by the AlphaFold confidence score, pLDDT, from highest to lowest, and choosing the first structure within five residues of the average sequence length of the cluster. Finally, an all-to-all DALI41 alignment was performed among those 682 representatives to identify similar domains within multi-domain flagellar outer structures, as well as similar structures that rigid-body methods failed to align due to different predictions in domain arrangements. A similar strategy to the USalign clustering was used, with the cutoff for edge inclusion set at Z-scores above 10. We used a stricter cutoff of 10 here, rather than the standard 8, to prevent grouping larger domains that share only limited similarities. The walktrap algorithm with a step size of 6 was again used for community detection. The graph was drawn using the Fruchterman–Reingold algorithm83, a force-directed layout algorithm within the R/igraph package.
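The representative-selection rule stated at the start of this paragraph can be sketched as follows. The function name, the tuple layout, and the toy values are hypothetical, and returning None when no member falls within the length tolerance is our assumption, since no fallback is described.

```python
def choose_representative(members, tolerance=5):
    """Pick a cluster representative.

    members is a list of (model_id, mean_plddt, length) tuples. Members are
    sorted by pLDDT, highest first, and the first one whose length is within
    `tolerance` residues of the cluster's average length is returned.
    """
    if not members:
        return None
    mean_length = sum(length for _, _, length in members) / len(members)
    ranked = sorted(members, key=lambda m: m[1], reverse=True)
    for model_id, _, length in ranked:
        if abs(length - mean_length) <= tolerance:
            return model_id
    return None  # assumption: no fallback rule is given in the text


if __name__ == "__main__":
    cluster = [  # hypothetical members
        ("m1", 95.0, 120),
        ("m2", 92.0, 101),
        ("m3", 88.0, 99),
        ("m4", 80.0, 100),
    ]
    print(choose_representative(cluster))
```

In the toy cluster the highest-scoring model is skipped because its length is far from the cluster average, so the next-best model within the tolerance is chosen.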

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.