Introduction

Carotenoids are membrane-bound specialized metabolites found in all photosynthesizing organisms, such as plants, algae and cyanobacteria. They play vital roles in protecting producer cells against excess light energy and scavenging reactive oxygen species1,2,3. However, these molecules also perform functions that are unrelated to photosynthesis, such as serving as precursors to phytohormones4,5 or attracting pollinators and promoting seed dispersal by accumulating in flowers and fruits6. Moreover, carotenoids are found in non-photosynthetic organisms, for example in the pathogenic bacterium Staphylococcus aureus, where the carotenoid staphyloxanthin increases virulence by protecting the bacteria against reactive oxygen species and inactivating host neutrophils7. In leaf-dwelling non-photosynthetic bacteria, such as Clavibacter, Pseudomonas, and Methylobacterium, carotenoids protect bacterial cells and, in some cases, the plant host from UV radiation8,9. Animals require carotenoids but are generally unable to synthesize them, and consequently depend on food intake for their supply. A notable exception is aphids, which have acquired the ability to produce carotenoids because fungal carotenoid biosynthetic genes have been taken up into their genome10. In humans, dietary intake of carotenoids is essential for vitamin A biosynthesis11 and is associated with a reduction in inflammatory markers12 and a reduced risk of prostate cancer and breast cancer13, although opposite associations have also been found, for example for lung cancer14. Because mostly positive health effects have been found, carotenoids are increasingly used as functional ingredients in food and feed, where they can also have additional technological benefits, for instance as natural pigments and antioxidants1.

The best-studied carotenoids are those found in photosynthetic organisms. These specialized metabolites are mostly C40-carotenoids, which have 40 carbon atoms within their backbone15. More recently, the full scope of carotenoid diversity and evolution has gained increasing attention15,16. Phylogenetic analyses revealed that carotenoid biosynthesis is an ancient process that evolved prior to photosynthesis, likely under increased UV radiation conditions16,17,18. According to Santana-Molina et al.18, the earliest evolved group of carotenoids might have been composed of 30 carbon atoms, i.e., triterpenoids. These specialized compounds are found in plants in various forms, such as triterpenoid saponins19 or oleanolic acids20, and they are often involved in the plant’s defense system. However, C30 carotenoid biosynthesis originated in prokaryotes16,18 and has been detected in the photosynthesizing bacteria Heliobacter21 and in several non-photosynthetic bacteria such as Lactiplantibacillus plantarum22, Bacillus subtilis23, Enterococcus faecium (formerly Streptococcus faecium)24, Staphylococcus aureus25, and Methylobacterium rhodinum (formerly Pseudomonas rhodos)26. In addition to these phenotypic observations, Santana-Molina and colleagues18 showed that both 4,4′-diapophytoene (synthesis encoded by the crtM gene) and squalene (encoded by Sqs or the HpnCDE cluster) can be precursors to C30 carotenoids, and that these pathways are now scattered across prokaryotes, ranging from Firmicutes to Planctomycetes to Archaea, and are mobile via horizontal gene transfer.

While carotenoid biosynthesis originated in, and is now scattered across, nonphotosynthesizing prokaryotes, the function and ecological role of these specialized metabolites in these organisms have not been systematically studied. Within the Firmicutes, Lactobacillaceae are an interesting family of nonphotosynthesizing bacteria, with great potential for use in food, feed, and pharma27,28. Two phenotypic screenings of Lactobacillaceae, focusing on isolates from fermented foods, identified Lp. plantarum as a C30 carotenoid producer via the 4,4′-diapophytoene pathway22,29. The antioxidant capacity of this carotenoid can mitigate oxidative stress in Lp. plantarum30. However, the genomic and ecological diversity of the biosynthetic and phenotypic potential of this family has not yet been investigated.

In this study, an in-depth pangenome and evolutionary analysis of the presence of terpenoid biosynthetic gene clusters was performed on a large dataset of 4203 unique genomes of the Lactobacillaceae family. These genomic analyses were complemented with phenotypic confirmation using a biobank of Lactobacillaceae from various habitats by high-throughput screening based on absorbance maxima and high-performance liquid chromatography (HPLC) and high-resolution quadrupole time-of-flight mass spectrometry (Q-TOF MS). In addition, the role of C30 carotenoids in UV stress resilience was assessed for selected strains. Finally, associations between the presence of carotenoid biosynthesis genes in a species and its lifestyle were established, using assigned lifestyles from literature31,32, and new isolation and metabolic analyses from our in-house library of strains.

Results

C30 carotenoid biosynthesis genes are dispersed across the Lactobacillaceae family and show frequent horizontal transfer

To explore the carotenoid biosynthetic gene cluster potential within the Lactobacillaceae family, the presence of all known enzymes involved in carotenoid biosynthesis was evaluated. For this purpose, 6297 publicly available genomes were screened. Due to the high similarity among genomes in this dataset, genomes were dereplicated so that all retained genomes shared a pairwise ANI < 99.99%, resulting in 4179 unique genomes, from which a pangenome was constructed. Two enzymes involved in the 4,4′-diapophytoene pathway for the production of C30 carotenoids were present: 4,4′-diapophytoene synthase (encoded by the crtM gene) and 4,4′-diapophytoene desaturase (crtN) (Fig. 1). The orthogroups to which each gene belonged were the same in the experimentally confirmed carotenoid producers Lp. plantarum WCFS122 and Lt. fragifolii AMBP162T (phenotype first observed during the description of the species33 and confirmed in this paper), validating the use of these orthogroups to detect crtMN genes across the Lactobacillaceae pangenome. In addition, analysis with antiSMASH confirmed that no other known clusters for carotenoid biosynthesis, such as the squalene pathway, were present in the Lactobacillaceae. The crtMN C30 carotenoid biosynthesis gene families were scattered across 28 species from 14 genera out of a total of 361 species and 34 genera analyzed within the Lactobacillaceae family (i.e., 7.7% species and 41.2% genus prevalence) (Fig. 2). To complement the public dataset, we selected 37 in-house strains (Supplementary Table S1) for genome sequencing, based on the likelihood of carrying crtMN genes (29 strains) or on belonging to underrepresented species (8 strains). After genomic dereplication, 24 remained (Supplementary Table S1) and were included in the subsequent analysis (4203 unique genomes). The detected crtMN genes clustered into two clades (Fig. 3).
One clade included Lactiplantibacillus, Fructilactobacillus, Latilactobacillus and Companilactobacillus, while the other exhibited broader diversity, including genera such as Leuconostoc, Oenococcus, Holzapfelia and other taxa within Fructilactobacillus. The gene trees for crtM and crtN were highly similar, indicating that these genes were typically transferred or inherited together. The crtN tree was preferred for visualization because the greater length of its protein (499 amino acids) gave better resolution. Notably, the Fructilactobacillus genus was present in both clades, indicating two independent acquisition events of the crtMN genes and thus a convergent gain of this trait in this genus. The crtMN phylogenetic trees did not align with the species tree based on the core genome (Fig. 2), suggesting that these genes were not only inherited through vertical transmission but also horizontally transferred across species and even genera. Clear examples of recent horizontal gene transfer (HGT) events were observed within clusters of multiple species that contain nearly identical crtMN genes (up to 100% similarity at the protein level), for example amongst Lp. plantarum, Levilactobacillus buchneri, Levilactobacillus brevis, Lactiplantibacillus pentosus, and Pediococcus pentosaceus, as well as amongst several Leuconostoc species. Of note, one cluster present in Apilactobacillus ozensis and Apilactobacillus xinyiensis seemed to contain two genes that were both annotated as crtN.
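The prevalence percentages and the final dataset size reported in this section follow directly from the counts given in the text. The short check below is only a worked illustration of that arithmetic; the function name is ours and is not part of the original analysis.

```python
def prevalence(carriers: int, total: int) -> float:
    """Return the percentage of taxa in which crtMN genes were detected."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * carriers / total


# Counts taken from the text: 28 of 361 species and 14 of 34 genera carry crtMN.
species_pct = prevalence(28, 361)
genus_pct = prevalence(14, 34)
print(f"species prevalence: {species_pct:.2f}%")
print(f"genus prevalence:   {genus_pct:.2f}%")

# 4179 dereplicated public genomes plus 24 retained in-house genomes.
total_genomes = 4179 + 24
print(f"genomes analyzed:   {total_genomes}")
```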

Fig. 1: Carotenoid biosynthesis via the 4,4′-diapophytoene pathway, which includes 4,4′-diapophytoene synthase (encoded by the crtM gene) and 4,4′-diapophytoene desaturase (encoded by the crtN gene).

Theoretical mass of the final product 4,4′-diaponeurosporene is shown (made with ChemDraw and Inkscape based on KEGG).

Fig. 2: Family-wide occurrence of 4,4′-diapophytoene synthase (crtM gene) and 4,4′-diapophytoene desaturase (crtN) in Lactobacillaceae.

The tree was inferred with IQ-TREE using the LG + F + G4 model. An orange tip indicates that carotenoid biosynthesis genes were present in at least one genome in a species. The inner circle corresponds to the assigned lifestyle, as described in Zheng and Wittouck et al.33; the second circle corresponds to the percentage of genomes that contain crtMN genes compared to all tested genomes of that species (n is given at the branch tip). The outer circle corresponds to genome size. Species for which the phenotype was experimentally confirmed are indicated in bold. The genera were collapsed when no crtMN genes were detected.

Fig. 3: Phylogenetic tree of the 4,4′-diapophytoene desaturase (crtN) gene detected in Lactobacillaceae, which shows clustering into two clades.

The tree was inferred with IQ-TREE using the LG + F + G4 model. To reduce the number of branches, sequences were first clustered with CD-HIT with a 95% similarity threshold. When a cluster contained multiple species, the species were grouped into multi-species horizontal gene transfer (HGT) groups, as shown in a table for clarity. The numbers indicate the number of strains that collapsed for each cluster. For each tip, the biosynthetic cluster is also shown as predicted by BiG-SCAPE using one representative.

Identification of C30 carotenoid-producing species in Lactobacillaceae biobank

Having shown the taxonomic spread of C30 carotenoid biosynthesis genes within Lactobacillaceae, we subsequently aimed to phenotypically substantiate the carotenoid biosynthesis capacity in strains containing crtMN genes. First, C30 carotenoid biosynthesis was confirmed in Latilactobacillus fragifolii AMBP162T, originally isolated from the phyllosphere of a strawberry plant33 and compared to that of the known C30 carotenoid producer Lp. plantarum WCFS122. The pellets of both strains appeared yellow, providing the first indication of carotenoid biosynthesis (Fig. 4A). The extract was purified using HPLC coupled inline to a diode-array detector (DAD) and high-resolution quadrupole time-of-flight mass spectrometry (Q-TOF MS). The highest peak in the UV chromatogram of the HPLC eluent (Fig. 4B) had absorption maxima at 467, 438, and 414 nm (Fig. 4C), similar to the peaks prior to purification (465, 435, and 409 nm) and corresponding to the absorption maxima of 4,4′-diaponeurosporene22. The peak also had a mass-to-charge ratio (m/z) of 403.3338 (Fig. 4D), corresponding to the formula C30H42, which is identical to that of 4,4′-diaponeurosporene.
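The match between the measured m/z and the formula C30H42 can be checked with a few lines of arithmetic. The sketch below assumes that the detected ion is the singly charged protonated molecule [M+H]+, which the text does not state explicitly, and uses standard monoisotopic masses; the function names are illustrative.

```python
# Standard monoisotopic masses in unified atomic mass units.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503207}
PROTON_MASS = 1.00727646688  # hydrogen atom minus one electron


def neutral_mass(formula: dict) -> float:
    """Monoisotopic mass of a neutral molecule given element counts."""
    return sum(MONOISOTOPIC_MASS[element] * count for element, count in formula.items())


def protonated_mz(formula: dict) -> float:
    """m/z of the singly charged [M+H]+ ion (assumed ion type)."""
    return neutral_mass(formula) + PROTON_MASS


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


diaponeurosporene = {"C": 30, "H": 42}  # formula given in the text
theoretical = protonated_mz(diaponeurosporene)
observed = 403.3338                      # value reported for Fig. 4D

print(f"theoretical [M+H]+: {theoretical:.4f}")
print(f"mass error: {ppm_error(observed, theoretical):.1f} ppm")
```

Under this assumption the observed value lies within a few ppm of the theoretical one, which is the kind of agreement expected from a high-resolution Q-TOF measurement.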

Fig. 4: Pigment identification in Latilactobacillus fragifolii AMBP162 and Lactiplantibacillus plantarum WCFS1.

Lt. graminis DSM20719T was used as a nonpigmented control. A Washed cell pellets (left) and unpurified extracts (right) of 2-day cultures. B UV chromatogram of the HPLC eluent of the extracted pigments. C UV‒visible absorption spectrum of the highest peak in the HPLC chromatogram; absorption maxima (nm) are indicated above the peaks. D High-resolution quadrupole time-of-flight mass spectrometry image of the highest peak in the HPLC eluent. The mass-to-charge ratios are indicated above the peaks.

Subsequently, a high-throughput method was developed to test phenotypically for C30 carotenoid production, using less solvent and omitting unnecessary purification of the extract (see Methods). From 575 in-house isolates and five publicly sourced strains (Supplementary Table S1), 45 in-house and 5 public Lactobacillaceae strains were selected on the basis of taxonomy and isolation source and tested phenotypically. This selection included 17 related noncarriers to assess the absence of the phenotype. In total, the biosynthesis of 4,4′-diaponeurosporene-like carotenoids was substantiated in 27 strains belonging to the crtMN-harboring species Lt. fragifolii, Lp. plantarum, Leuconostoc citreum, Lc. pseudomesenteroides and Holzapfelia floricola, and was absent in all 17 tested noncarrier strains (Fig. 3, Supplementary Table S1). For only six of the isolates tested, the observed phenotype did not match the genotype. Specifically, the two tested strains of the species Lactiplantibacillus mudanjiangensis (AMBF-0197 and AMBF-0209) and one strain of Lc. mesenteroides (LMG 6893) did not appear to express the phenotype under the tested conditions (Fig. 1C), whereas two Lp. plantarum strains (AMBP-0214 and AMBP-0424) appeared to harbor an inactive 4,4′-diapophytoene desaturase caused by a 522 bp deletion in the crtN gene. These latter strains were considered natural carotenoid-nonproducing mutants of Lp. plantarum, a characteristic employed in subsequent physiological tests. Finally, the Apilactobacillus xinyiensis strain (AMBP-0461) containing the unusual cluster with the duplicated crtN did not appear to express the phenotype under the tested conditions.

C30 carotenoids are associated with UV stress resistance in Lp. plantarum strains

To assess whether carotenoid biosynthesis in Lactobacillaceae plays a role in resistance to UV stress, we compared seven carotenoid-producing Lp. plantarum strains to the natural nonproducing strains AMBP-0214 and AMBP-0424. A phylogenetic tree of the core genomes of all the strains tested confirmed that all the strains were closely related and that the non-producers were not an outgroup (Supplementary Fig. S1). A general linear model fit revealed that carotenoid-negative strains were significantly more susceptible (higher Δlog10) to UV-induced stress than carotenoid-positive strains for both 30 s (effect size = 0.5749, p = 0.003) and 40 s (effect size = 1.1474, p = 0.01) of UV exposure (Fig. 5A). A linear mixed-effects model (LMER) was then used to include strain-specific variation as a random effect. A likelihood-ratio test showed that including strain as a random effect improved the fit significantly for both 30 s (p = 0.0001) and 40 s (p = 0.002) of UV exposure. The interaction term between carotenoid production and experiment did not significantly improve the model and was therefore removed from the final model. Including carotenoid production significantly improved the fit for both timepoints of UV exposure (p = 0.04 for both). Lastly, including the experiment as a term led to a significantly better fit only for 40 s of UV exposure (p = 0.002). To summarize, for 30 s of UV exposure, Δlog10 was significantly influenced by both strain and carotenoid production. For 40 s of UV exposure, Δlog10 was significantly influenced by strain, carotenoid production, and experiment.
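The model comparisons described above rest on likelihood-ratio tests between nested models. As a minimal sketch of that calculation, assuming the two models differ by a single parameter and were both fitted by maximum likelihood, the test statistic is twice the difference in log-likelihood and is compared with a chi-square distribution with one degree of freedom. The log-likelihood values below are hypothetical and are not taken from the study.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """Survival function (1 - CDF) of the chi-square distribution with 1 df."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float):
    """Return (statistic, p-value) for two nested models differing by one parameter."""
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return statistic, chi2_sf_df1(statistic)


# Hypothetical log-likelihoods: model without vs. with a carotenoid-production term.
statistic, p_value = likelihood_ratio_test(loglik_reduced=-42.10, loglik_full=-39.95)
print(f"LRT statistic = {statistic:.2f}, p = {p_value:.4f}")
```

When the added parameter is a variance component, such as a random effect for strain, the null value lies on the boundary of the parameter space and this chi-square reference tends to be conservative.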

Fig. 5: Functional and ecological roles of C30 carotenoid biosynthesis.

A Survival rate of Lp. plantarum strains under ultraviolet (UV) stress. All the strains were tested in three separate trials. Seven producers and two natural knockout mutants that do not produce carotenoids were included (Supplementary Table S1). Mean and standard error of the mean are visualized per strain. Significance (p < 0.05) between conditions is shown with a “*”. B Percentage of species within a lifestyle (based on Zheng et al.33) harboring the crtMN gene as a core or accessory trait. The total number of species analyzed is given for each lifestyle. C The genome sizes of species in relation to their lifestyle and the presence of crtMN genes.

C30 carotenoid biosynthesis is associated with nomadic and insect-adapted lifestyles

We subsequently evaluated whether the C30 carotenoid biosynthesis genes were associated with a particular Lactobacillaceae lifestyle and habitat adaptation using the categories defined by Zheng et al. (free-living, vertebrate-adapted, invertebrate-adapted or nomadic)32. Such lifestyle assignments were available for 170 out of 361 species in our public genome datasets. The presence of C30 carotenoid biosynthesis genes (crtMN) was most common in nomadic and insect-adapted lactobacilli, where they appeared as core (e.g., in nomadic Lp. plantarum, insect-adapted Fructilactobacillus lindneri and uncategorized Lc. citreum) and accessory traits. In contrast, crtMN genes were completely absent in vertebrate-associated lactobacilli such as Lactobacillus crispatus and Limosilactobacillus reuteri. They were also found in a few genomes representing free-living species such as Lv. buchneri (Fig. 5B). Accordingly, a Kruskal-Wallis test showed a significant association between the proportion of genomes with C30 carotenoid biosynthesis and the lifestyle of a species (p < 0.001). Pairwise comparison with a Dunn test with Bonferroni correction for multiple testing (Supplementary Fig. S2A) revealed the strongest association with nomadic and insect-adapted lifestyles. When the phylogenetic background was taken into account with phylogenetic generalized least squares (PGLS), the same trends were found, but only the comparison of insect-adapted versus free-living was near significant (p = 0.059) (Supplementary Fig. S2B).
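To make the Kruskal-Wallis step concrete, the sketch below computes the H statistic with tie correction and its chi-square p-value using only the Python standard library. The per-species proportions are invented for illustration and are not the study's data, and the Dunn post hoc test and the PGLS analysis are not reproduced here.

```python
import math


def average_ranks(values):
    """Rank values from 1..N, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Chi-square survival function for integer df via the incomplete-gamma recurrence."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0 - 1e-9:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def kruskal_wallis(groups):
    """Return (H, p) for a list of groups of numeric observations."""
    pooled = [v for group in groups for v in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0.0:  # all observations identical
        return 0.0, 1.0
    h = max(h / correction, 0.0)
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical proportions of crtMN-positive genomes per species, by lifestyle.
toy = {
    "free-living": [0.0, 0.0, 0.1, 0.0],
    "vertebrate-adapted": [0.0, 0.0, 0.0, 0.0],
    "insect-adapted": [1.0, 0.0, 0.8, 1.0],
    "nomadic": [1.0, 0.9, 0.0, 1.0],
}
h_stat, p_val = kruskal_wallis(list(toy.values()))
print(f"H = {h_stat:.2f}, p = {p_val:.4f}")
```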

In addition, nomadic and insect-associated crtMN-harboring strains differed in genome size, with the nomadic strains having large genomes, similar to other nomadic Lactobacillaceae, and the insect-adapted strains having the smallest genomes within the family. These findings point to positive selection of these genes after horizontal gain in species with nomadic and insect-adapted lifestyles (Fig. 5C). To further investigate the lifestyle association, we also screened the isolation sources of 575 in-house isolates and three publicly sourced isolates and correlated these with the C30 carotenoid phenotype and genotype. This analysis showed that leaves and flowers carried a high number of carotenoid producers belonging to the species Lt. fragifolii (n = 13), Lc. pseudomesenteroides (n = 1), Lc. citreum (n = 3), and H. floricola (n = 1) (Supplementary Fig. S3). A second major habitat of carotenoid producers was plant-based fermentations, which harbored carotenoid-producing Lc. citreum (n = 2), Lp. plantarum (n = 6), and Lactiplantibacillus paraplantarum (n = 2). In contrast, in our laboratory, carotenoid-producing Lactobacillaceae strains were only sporadically isolated from vertebrate habitats (e.g., the human vagina34 or respiratory tract35) and were all nomadic Lp. plantarum strains.

Discussion

In this study, we investigated the biosynthesis of C30 carotenoids in the Lactobacillaceae family, the largest family of beneficial bacteria known to date, and linked the crtMN-positive genotype to their lifestyles. For this purpose, we used an integrated comparative genomic approach combined with a phenotypic screening of a diverse in-house Lactobacillaceae strain collection and an assessment of the functional role of carotenoids in UV resistance.

Pangenome analysis of Lactobacillaceae revealed that crtMN-mediated carotenoid biosynthesis was a rare and scattered trait within the family: it occurred in 41% of all the genera but only in a few species or strains within each genus. It was found to be a core property in some nomadic species, such as Lp. plantarum, and in some insect-adapted species, such as Fl. lindneri, and an accessory property in many others. Phylogenetic analyses indicated that the biosynthetic crtMN genes are frequently transferred horizontally across species and genera, for example, from Lp. plantarum to Lv. brevis and within the Leuconostoc genus. This finding is consistent with the high mobility of carotenoid pathways observed at higher taxonomic levels16,17,18. Moreover, the trait appeared to have been gained convergently in Fructilactobacillus, with this genus having acquired crtMN genes from distinct donors during distinct events. Using a high-throughput extraction and analysis method, the crtMN genotype was matched with the synthesis of 4,4′-diaponeurosporene in five species. Functionally, we showed that Lp. plantarum strains that do not produce carotenoids were less resistant to UV stress, in line with the general knowledge on carotenoids16. Previous research has also demonstrated that 4,4′-diaponeurosporene biosynthesis protects against oxidative stress in Lp. plantarum30. Finally, the scattered distribution and mobility of this trait across the family, coupled with its advantages in UV stress, prompted us to systematically investigate the link between carotenoid biosynthesis and the lifestyle and ecology of Lactobacillaceae.

As we have done here for carotenoids, phylogeny needs to be considered when testing for associations between two features associated with a set of species. This was done in our study by applying a phylogenetic generalized least squares (PGLS) approach36. Such an approach takes into account that closely related species will likely be similar in any two traits (such as crtMN prevalence and lifestyle studied here) because of “phylogenetic inertia”, not necessarily because the traits are correlated37. Since the lifestyles of Lactobacillaceae species are mostly conserved at genus level, the independent units of information are in general the genera rather than the species, resulting in a lower effective sample size and limited statistical power in our dataset. Another limitation of our work is that a lifestyle has not yet been attributed by Duar et al. and Zheng et al. to 191 of the 361 Lactobacillaceae species in our dataset31,32. This is due to a lack of sufficient data on the isolation sources, metabolic potential, and related properties for these species31,32. For many of these species, only a single strain has been isolated from a single source. Repeated isolation of species from the same environment, as well as substantiation with specific metabolic and experimental validation, is required to attribute lifestyles, as the environment of isolation does not necessarily reflect the environment of adaptation (niche) for various reasons, such as random dispersal events and increasing anthropogenic effects on the biosphere. Such detailed information will have to be collected for these 191 Lactobacillaceae species before a lifestyle can be attributed to them and our analyses can be further substantiated.

Despite these shortcomings in public data and taking into account the phylogeny, a near-significant association was found between carotenoid biosynthesis genes and an insect-adapted lifestyle (p = 0.056). However, excluding phylogenetic relatedness, we found that crtMN genes were mostly absent in free-living species and completely absent in vertebrate-associated species, such as L. crispatus, which is dominant in the human vagina34, and Lm. reuteri, which typically colonizes the vertebrate gut31. The complete absence of carotenoids in well-studied vertebrate-associated Lactobacillaceae suggests that carotenoid production is not selected for in mucosal and low-oxygen habitats, such as the gut and vagina. In contrast, carotenoid biosynthesis genes were strongly associated with nomadic and insect-adapted Lactobacillaceae, indicating that oxygen- and UV-rich environments encountered by nomadic and insect-adapted strains can select for this trait. The habitat-adaptation strategy of nomadic carotenoid producers appeared to differ from that of insect-adapted species. Nomadic Lactobacillaceae species typically have large genomes (between 2.4 and 3.6 Mbp), making them metabolically versatile and adaptable to various environments. In other studies, nomadic species have been found to typically occur in low numbers in oligotrophic environmental niches, such as plant surfaces28,38, and in high abundances once carbohydrates become more available, such as in vegetable fermentation products39. Such fermentations are characterized by intense microbial competition and high salt concentrations39, possibly leading to osmotic and oxidative stress. Based on the data obtained here, we speculate that in such fermentations and in outdoor oligotrophic environments, C30 carotenoids could provide a fitness advantage to nomadic lactobacilli by reducing susceptibility to oxidative stress.

Insect-adapted lactobacilli have smaller genomes (1.2–2.2 Mbp) with a concomitant decrease in carbohydrate metabolic capacity31,32. This difference was also observed in our present study among crtMN carriers: the insect-adapted crtMN carriers had remarkably small genomes (between 1.6 and 2.1 Mbp) but still contained crtMN genes, indicating an evolutionary advantage. Interestingly, the insect-adapted crtMN carriers were all part of one clade within the Fructilactobacillus genus, a genus known to be transferred between pollinators via the environment, with flowers serving as key hubs31,40,41. Our data presented here, together with previous knowledge on C30 carotenoids30, indicate that the biosynthesis of these terpenes, and their associated protection against UV radiation and oxidative stress, could be a significant advantage for these environmentally dispersed, insect-adapted Lactobacillaceae species. Our hypothesis is in line with the adaptation strategy previously described for leaf-dwelling bacteria, such as Clavibacter and Pseudomonas, which produce C40 carotenoids and other pigments to increase their survival in this UV-stressed environment9. Notably, carotenoid biosynthesis was absent in Lactobacillaceae associated with social pollinators, such as Lactobacillus apis and Bombilactobacillus. These bacteria are vertically passed down to offspring within the hive42, where they are protected from UV and oxidative stress. An environmental survival strategy does not seem to be required for these bacteria42. An exception to this is the Apilactobacillus genus, which is also known to be dispersed among solitary bees via flowers43. In our study, this genus was shown to have an unusual putative terpenoid cluster, characterized by a duplicated crtN and an absent crtM gene. However, it remains to be substantiated whether this duplication is associated with a particular phenotype or pigment.

Our hypothesis that Lactobacillaceae bacteria that are dispersed to plants via insects gain a competitive advantage from expressing C30 carotenoids was supported by our extensive culture approach and collection. A diverse array of carotenoid-producing Lactobacillaceae were isolated from flowers and leaves, with a relatively high prevalence found for Lc. citreum in flowers. To the best of our knowledge, this species has not yet been assigned to a lifestyle, but our data presented here add support to an insect- or flower-adapted lifestyle. In contrast, Lactobacillaceae isolates from vertebrate habitats studied here (mainly from the human vagina and respiratory tract) were predominantly non-producers. Among the producing strains isolated, the nomadic Lp. plantarum was the predominant species.

In addition to the ecological role, the presence of crtMN genes in Lactobacillaceae is of interest from an applied perspective, especially considering the beneficial properties and lack of virulence factors in this family of bacteria. For example, incorporating carotenoid-producing bacteria into food fermentations could add functional properties to these foods. In fact, 4,4′-diaponeurosporene is already present in many vegetable fermentations, as Lp. plantarum, a core producer, dominates the later stages of most typical vegetable fermentations, and Leuconostoc species generally dominate in the early stages39. Although the added benefits of 4,4′-diaponeurosporene in these fermented food ecosystems have not yet been studied, this metabolite has been connected to health-promoting effects via immune modulation. For instance, the introduction of crtMN genes originating from Staphylococcus aureus into Bacillus subtilis has been shown to reduce colitis in mice44 and increase resistance to Salmonella typhimurium infection45. Furthermore, in piglets, B. subtilis heterologously expressing crtMN has been shown to improve the mucosal immune system of the gut46 and the respiratory tract47. These studies were carried out with genetically modified bacteria and are thus unlikely to reach large market applications, especially in Europe. In contrast, our results presented here indicate that natural carotenoid-producing Lactobacillaceae constitute an interesting alternative. Moreover, microbial biosynthesis can offer advantages over traditional production methods, as it can be safer and less reliant on fossil resources than chemical synthesis and less influenced by seasonality or climate than plant-based biosynthesis48, and it has been applied to C30 carotenoids49.

In summary, this study on the ecology and evolution of carotenoid biosynthesis in the Lactobacillaceae family revealed a scattered distribution of crtMN-mediated C30 carotenoid 4,4′-diaponeurosporene biosynthesis across 28 species and 14 genera and highlighted the mobility of this trait. C30 carotenoid biosynthesis appears to have emerged as a core property in several species, notably Lp. plantarum, Lc. citreum, and Fl. lindneri. Furthermore, carotenoid biosynthesis was strongly associated with nomadic and insect-adapted lifestyles, where it offers an advantage via protection from UV stress.

Methods

Pangenome and phylogenetic analysis of the terpenoid biosynthetic gene clusters

All publicly available Lactobacillaceae genomes were downloaded from the Genome Taxonomy Database (GTDB, gtdb.ecogenomic.org, version r214). Incomplete (<90%) or contaminated (>5%) genomes were excluded with CheckM50. This resulted in a dataset of 6259 public genomes. To avoid pseudoreplication, the sample function of SCARAP (github.com/SWittouck/SCARAP) was used with an average nucleotide identity (ANI) cutoff of 99.99%, retaining only genomes with ANI values below this cutoff. Next, the pangenome of this family was inferred using the SCARAP tool51. The orthogroups of the target genes, i.e., 4,4′-diapophytoene desaturase (encoded by the crtN gene) and 4,4′-diapophytoene synthase (encoded by the crtM gene), were then determined (Table 1)32. For reference, the genes from two experimentally confirmed species, Lp. plantarum WCFS122 and Latilactobacillus fragifolii AMBP162 (experimentally confirmed in this study), were used as queries. Based on the results of this pangenome analysis, 37 in-house isolates were selected for whole genome sequencing and were included in the subsequent analysis. Finally, all genomes were checked for other terpene biosynthesis genes using antiSMASH 6.052, focusing on terpenes and terpene pathways.
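The quality filter and the dereplication step can be illustrated with a small sketch. The thresholds are those given above, but the greedy selection shown here is only an assumption made for illustration and is not necessarily how the SCARAP sample function works internally; the genome identifiers and ANI values are hypothetical.

```python
def passes_quality(completeness: float, contamination: float,
                   min_completeness: float = 90.0,
                   max_contamination: float = 5.0) -> bool:
    """Keep genomes that are at least 90% complete and at most 5% contaminated."""
    return completeness >= min_completeness and contamination <= max_contamination


def greedy_dereplicate(genomes, ani, cutoff: float = 99.99):
    """Keep a genome only if its ANI to every genome already kept is below the cutoff.

    `ani` maps frozenset({a, b}) to an ANI percentage; pairs that are missing
    are treated as dissimilar (illustrative simplification).
    """
    kept = []
    for genome in genomes:
        if all(ani.get(frozenset((genome, other)), 0.0) < cutoff for other in kept):
            kept.append(genome)
    return kept


# Hypothetical quality metrics: (completeness %, contamination %).
quality = {"g1": (99.1, 0.4), "g2": (98.7, 0.9), "g3": (97.5, 1.2), "g4": (85.0, 0.3)}
# Hypothetical pairwise ANI values.
ani = {
    frozenset(("g1", "g2")): 99.995,  # near-identical pair, one will be dropped
    frozenset(("g1", "g3")): 98.70,
    frozenset(("g2", "g3")): 98.60,
}

high_quality = [g for g, (comp, cont) in quality.items() if passes_quality(comp, cont)]
unique_genomes = greedy_dereplicate(high_quality, ani)
print(unique_genomes)
```

The result of a greedy pass depends on the order in which genomes are visited, so a real pipeline would normally fix that order, for example by genome quality.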

Table 1 References to experimentally confirmed crtMN genes used to identify similar sequences in the pangenome

The core tree of all Lactobacillaceae was constructed following the method described in Eilers et al.53. The core genome, consisting of 296 core genes, was inferred using SCARAP. Protein sequences were extracted, aligned with MAFFT and trimmed with trimAl with a gap threshold of 10%. A tree was subsequently inferred from the aligned and trimmed core proteins with IQ-TREE using the LG + F + G4 model.

First, the prevalence of crtMN genes within all species of the family Lactobacillaceae was calculated and integrated with lifestyle data from Zheng et al.32, and metadata from the GTDB using tidygenomes (github.com/SWittouck/tidygenomes) based on ggtree packages in R. Genera were collapsed when no species of this genus contained the crtMN genes.

Second, to examine the phylogeny of crtM and crtN within the Lactobacillaceae family, a gene tree was inferred at the amino acid level using a procedure similar to that used for the species tree. Since the crtM and crtN trees were highly similar, the crtN tree was used as the representative because of its larger size and therefore higher resolution. To reduce the number of branches, sequences were first clustered with CD-HIT54 at a 95% similarity threshold. Where sequences from different species fell into the same CD-HIT cluster, they are listed in the adjacent table. The biosynthetic gene clusters obtained from antiSMASH were visualized using BiG-SCAPE55 based on dereplicated genomes, and the clusters were mapped onto the crtN tree.

Strains used in this study

Phenotypic characterization started from the crtMN carriers within 575 in-house Lactobacillaceae isolates and three publicly available strains (Supplementary Table S1). These strains were previously isolated from vegetable fermentations54; the human vagina34; the human respiratory tract35; the phyllosphere33; the anthosphere56; and liquid compost fermentations57. Forty-eight isolates, taxonomically related to carotenoid producers based on the pangenome analysis, were phenotypically screened. To complement the publicly available data, the genomes of 25 crtMN-containing and 17 non-crtMN-containing in-house isolates were sequenced and included in the pangenome analysis (Supplementary Table S1, study number PRJEB57255). UV stress assays were performed using nine Lp. plantarum strains, as indicated in Supplementary Table S1.

C30 carotenoid extraction and identification

Lipophilic compounds were extracted, and their absorption spectra were measured, based on the methods of Garrido-Fernández and colleagues22. In brief, cells were harvested from a 50 ml overnight culture in Weissella Medium Broth (WMB) by centrifugation for 15 min at 2000 × g. The cells were subsequently washed with 50 ml of sterile distilled water. Afterward, 10 ml of N,N-dimethylformamide was added to the washed cells, which were incubated for 15 min at 65 °C. The cell debris was separated by centrifugation at 3000 × g for 10 min, after which the supernatant was transferred to a separatory funnel. The extraction of the remaining cell debris with N,N-dimethylformamide was repeated four times. All the extracts were pooled and mixed with 100 ml of diethyl ether, and 10% NaCl was added to aid in the separation of the liquid phases. The organic phase was dried with anhydrous Na2SO4, followed by solvent evaporation in a rotary evaporator. The resulting residue was dissolved in 2 ml of methanol/tert-butyl methyl ether (1:1 v:v) containing 1% BHT. The absorption spectrum was measured between 300 nm and 550 nm using a spectrophotometer to determine the characteristic absorption maxima. Carotenoids were further purified and identified using high-performance liquid chromatography (HPLC) coupled inline to a diode array detector (DAD) and high-resolution quadrupole time-of-flight mass spectrometry (Q-TOF MS), a procedure conducted by the RIC group.

For high-throughput identification, a similar procedure was used with different extraction solvents and without further purification. The washed cells were extracted with 2 ml of molecular biology grade ethanol, vortexed thoroughly and incubated for 20 min at 65 °C. The cell debris was separated by centrifugation at 8603 × g for 5 min, after which the supernatant was transferred to a 15 ml tube. The ethanol extraction step was repeated once on the remaining cell debris. Both extracts were pooled, and 2 ml of heptane was added and mixed vigorously for 1 min. Next, 2 ml of distilled water was added and mixed. The hydrophilic and lipophilic phases were then separated by centrifuging the tubes for 5 min at 4000 × g. The top layer (lipophilic phase) was carefully transferred to a clean 2 ml tube and dried with anhydrous Na2SO4. Finally, the absorption spectrum between 300 and 550 nm was measured using a spectrophotometer to detect characteristic absorption peaks.

UV stress resistance assays

Cells were harvested by centrifugation (10 min at 1500 × g) from a 10 ml two-day culture grown in MRS medium at 28 °C with shaking, then washed with sterile phosphate-buffered saline (PBS) and resuspended to an optical density (OD) of 0.16 at 600 nm. Each suspension (500 µl) was dispensed together with 8 ml of sterile PBS into small Petri plates and placed on an orbital shaker with a gentle swirling motion inside a laminar air flow cabinet, after which the lids were removed. The swirling Petri plates were exposed to UV treatment, and samples were collected at three timepoints: before UV exposure, after 30 s, and after 40 s of UV exposure. The number of colony-forming units (CFUs) was determined for all the samples through serial dilution and plating in triplicate on MRS agar. The entire assay was repeated three times.
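As an illustration of how triplicate plate counts are commonly converted into a CFU concentration, a minimal sketch is given here. The plated volume and the dilution at which colonies were counted are not stated in the text, so both are left as parameters, and the example values are hypothetical.

```python
from statistics import mean


def cfu_per_ml(colony_counts, dilution_factor, plated_volume_ml):
    """Estimate CFU per ml from replicate plate counts at a single dilution.

    colony_counts     -- colonies counted on each replicate plate
    dilution_factor   -- total dilution of the plated sample, e.g. 1e-4
    plated_volume_ml  -- volume spread on each plate (assumed parameter)
    """
    if dilution_factor <= 0 or plated_volume_ml <= 0:
        raise ValueError("dilution factor and plated volume must be positive")
    return mean(colony_counts) / (dilution_factor * plated_volume_ml)


# Hypothetical triplicate counts at a 10^-4 dilution with 0.1 ml plated.
estimate = cfu_per_ml([42, 38, 40], dilution_factor=1e-4, plated_volume_ml=0.1)
```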

The rate of decrease in CFU counts, indicating susceptibility to UV stress, was determined by transforming the CFU counts to a logarithmic scale and calculating the log10 reduction (Δlog10) at 30 and 40 s of UV irradiation compared to the initial CFU counts. A general linear model was used to evaluate the effect of carotenoid production and experimental variation on the Δlog10. To account for strain-specific variation, a linear mixed-effects model (LMER) was used, in which strain was included as a random intercept. The significance of carotenoid production, experiment, and strain effects was assessed using likelihood-ratio tests.
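The Δlog10 calculation can be written out as a short sketch. It assumes the sign convention in which a positive value means a reduction in viable counts, and it rejects zero counts because the text does not describe how counts below the detection limit were handled. The example values are hypothetical.

```python
import math


def delta_log10(cfu_initial: float, cfu_after: float) -> float:
    """log10 reduction relative to the initial count (positive = fewer survivors)."""
    if cfu_initial <= 0 or cfu_after <= 0:
        raise ValueError("CFU counts must be positive to take a logarithm")
    return math.log10(cfu_initial) - math.log10(cfu_after)


# Hypothetical counts before UV exposure and after 30 s and 40 s.
before, after_30s, after_40s = 1.0e8, 1.0e6, 1.0e5
reduction_30s = delta_log10(before, after_30s)   # two log units
reduction_40s = delta_log10(before, after_40s)   # three log units
```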

$$\begin{array}{l}\Delta \log_{10} \sim \text{carotenoid production} + \text{experiment} + \text{carotenoid production}:\text{experiment} \quad (\text{fixed effects including interaction term}) \\ \qquad\qquad +\,(1 \mid \text{strain}) \quad (\text{random effects})\end{array}$$

The likelihood-ratio test was used to compare the full model with simpler models from which individual terms had been removed. If the more complex model did not fit significantly better than the simpler one, the term was removed. If it did fit significantly better, the term was considered to have a significant effect on the model and was kept.
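The decision rule for keeping or removing a term can be sketched as below. The sketch assumes the usual asymptotic chi-square reference distribution for the likelihood-ratio statistic and a significance level of 0.05, neither of which is stated in the text. The log-likelihood values are hypothetical, and the chi-square tail probability is computed with the standard library only.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed forms for df = 1 and df = 2, i.e. Q(a, z) with a = df / 2.
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(loglik_full, loglik_reduced, df_diff, alpha=0.05):
    """Return (statistic, p-value, keep_term) for two nested models."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    p_value = chi2_sf(statistic, df_diff)
    return statistic, p_value, p_value < alpha


# Hypothetical log-likelihoods; the reduced model lacks one single-parameter term.
statistic, p_value, keep_term = likelihood_ratio_test(-100.0, -103.0, df_diff=1)
```

Here `df_diff` is the number of parameters dropped with the term, so a factor with three levels or its interaction would contribute more than one degree of freedom.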

Association of C30 carotenoid biosynthesis capacity and habitat

To further assess the association between C30 carotenoid biosynthesis capacity and habitat, multiple statistical analyses were performed on the dereplicated dataset in R with the packages dunn.test, caper, nlme, and geiger. These analyses tested the correlation between crtMN prevalence and both lifestyle and genome size, taking phylogenetic effects into account. A gene was considered a core trait of a species when more than 95% of the genomes within that species had this trait, and an accessory trait when the prevalence was between 5% and 95%.
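The core and accessory definitions translate into a simple threshold rule, sketched here. Two points are assumptions: both bounds of the 5% to 95% interval are treated as inclusive, and the label for prevalences below 5% is hypothetical because the text does not name that category.

```python
def classify_trait(prevalence: float) -> str:
    """Classify a gene within a species from the fraction of genomes carrying it."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be a fraction between 0 and 1")
    if prevalence > 0.95:
        return "core"
    if prevalence >= 0.05:
        return "accessory"
    return "rare_or_absent"  # hypothetical label, not defined in the text


# Illustrative prevalences only.
labels = [classify_trait(p) for p in (1.0, 0.95, 0.40, 0.05, 0.0)]
```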

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.