Abstract
Metatranscriptomics methods for the skin are hampered by low microbial biomass, contamination with host cells and low RNA stability. In this study, we developed a robust, clinically tractable skin metatranscriptomics workflow that provides high technical reproducibility of profiles, uniform coverage across gene bodies and strong enrichment of microbial mRNAs. Paired application of this protocol with metagenomics to five skin sites in a cohort of 27 healthy adults identifies a notable divergence between transcriptomic and genomic abundances. Specifically, Staphylococcus species and the fungi Malassezia had an outsized contribution to metatranscriptomes at most sites, despite their modest representation in metagenomes. Species-level analysis shows signatures of microbial adaptation to their niches. Gene-level analysis identifies diverse antimicrobial genes transcribed by skin commensals in situ, including several uncharacterized bacteriocins. Correlation of microbial gene expression with organismal abundances uncovers more than 20 genes that putatively mediate interactions between microbes. This work highlights how skin metatranscriptomics identifies active species and microbial functions in situ.
Main
The human skin is home to diverse interacting microbes (bacteria, fungi and viruses) that impact host microenvironments, immune homeostasis and health1. Understanding of the skin microbiome’s role in disease has greatly benefited from growing accessibility of whole-metagenome sequencing to identify key organisms and associated differences in genetic potential2,3,4. This is complemented by ex vivo5,6 and in vitro7,8 studies revealing mechanisms of host–microbiome crosstalk. However, understanding of in vivo gene/pathway utilization remains limited, as metagenomics only estimates functional potential9 (DNA content). Metagenomic signals are composed of living and dead cells, with genes being variably expressed or silent in living microbes responding to environmental cues10,11,12.
Metatranscriptomics, which assays mRNAs, has been used to study gut13 and ocean microbiomes14 but is not commonly applied to skin. Besides genome content, variations in gene expression and host–microbe interactions can explain why skin microbes such as Staphylococcus aureus and Cutibacterium acnes are harmless colonizers in some but exacerbate disease in others15,16. For example, skin itch can be triggered by an S. aureus V8 protease (sspA), which cleaves the PAR1 receptor on sensory neurons6. Increased sspA transcript levels were associated with itch and were significantly higher in atopic dermatitis lesions compared to healthy skin6, emphasizing its potential as a biomarker or therapeutic target for symptom relief. Similarly, transcriptional downregulation of the vitamin B12 biosynthetic pathway in C. acnes was associated with increased inflammation in acne17. Molecular mechanisms for host–microbe interactions on skin have primarily been studied through in vitro experiments, animal models or transcriptomics of individual species. The development of skin metatranscriptomics, enabling simultaneous profiling of gene expression from the microbiome and host, can accelerate hypothesis generation about the role of the microbiome in skin health, help discover microbial triggers expressed before and during skin diseases and leverage these insights for clinical management and product discovery.
Skin metatranscriptomics is hampered by the lack of a robust, non-invasive protocol that accommodates a range of sites with low microbial biomass but substantial host and environmental contamination. Human skin is relatively sparsely colonized by microbes, with an estimated average density of 10³–10⁴ prokaryotes per cm², several orders of magnitude lower than the gut18. To date, only one study has leveraged RNA sequencing (RNA-seq) to describe microbial contribution in vivo to acne, albeit restricted to a single species (C. acnes) in nose follicles using pore stripping17. Although biopsies can capture more microbial biomass19, their invasiveness makes them impractical for large-scale clinical studies. Consequently, data from transcriptionally active microbes and genes/pathways across sites are scarce, highlighting the need for a widely applicable skin metatranscriptomic workflow20.
To address these challenges, we developed a robust workflow for non-invasive sampling and skin metatranscriptomics across body sites with high reproducibility (Pearsonʼs r > 0.95), uniform coverage across bacterial and fungal genes and strong enrichment of microbial mRNAs (2.5–40×) relative to total RNA. We further developed a data analysis workflow with rigorous control of ‘kitome’ and taxonomic misclassification artifacts. The workflow is customized for skin metatranscriptomics using a skin microbial gene catalog based on metagenome-assembled genomes (MAGs) for diverse populations21. Leveraging this capability, we present a multisite metatranscriptomic survey of healthy human skin (n = 27 individuals) from physiologically diverse sites (scalp, cheek, volar forearm, antecubital fossae and toe web). Analysis of paired metagenomes and metatranscriptomes (n = 102 paired libraries passing quality control) revealed a marked divergence between transcriptomic and genomic abundances, with Staphylococcus and Malassezia species having outsized contribution to metatranscriptomes at most sites, despite their limited metagenomic representation. Species-level analysis showed site-specific enrichment of gene expression (for example, increased secreted fungal phospholipase C level on cheeks versus scalp) and revealed transcriptional activity in vivo of key pathways (for example, propionate and 4-aminobutyrate metabolism, potentially impacting skin barrier function). Skin commensals transcribed diverse antimicrobial genes in situ, including several uncharacterized bacteriocins expressed at levels similar to known antimicrobial genes. Correlating microbial gene expression with organismal abundances uncovered more than 20 genes that putatively mediate interactions between microbes (for example, a secreted Malassezia restricta protein with strongly negative in vivo association with C. acnes). 
Overall, this work highlights the importance of metatranscriptomics for a holistic view of active species, expressed microbial functions, key pathways and microbe–microbe interactions occurring in situ on skin.
Results
Development of a robust skin metatranscriptomics workflow
We optimized an experimental workflow that is robust for metatranscriptomics across different sites by systematically testing different sampling tools, lysis conditions and RNA purification techniques (Supplementary Data 1 and Methods). Our final optimized protocol used skin swabs, preservation in DNA/RNA Shield, bead beating, rRNA depletion using custom oligonucleotides and direct-to-column TRIzol purification (Methods). We then performed metatranscriptomics of defined mock communities in replicates, obtaining highly consistent species and gene profiles (median correlation coefficient > 0.98; Extended Data Fig. 1a). Protocol robustness was further assessed with a pilot cohort consisting of a cross-sectional group (n = 25 samples) and a longitudinal group sampled across three consecutive days (n = 45 samples), representing five distinct microenvironments: scalp (Sc), cheek (Ch), volar forearm (Vf), antecubital fossae (Ac) and toe web (Tw). A high proportion of libraries were successfully sequenced (63/70 = 90%), and the target of one million microbial reads per sample was reached in most samples (>84%, median = 2.2 million microbial reads, 0.66 Gbp; Supplementary Data 1 and Supplementary Table 1). Oligonucleotide-based rRNA depletion resulted in substantial enrichment (2.5–40×) of non-ribosomal RNA (non-rRNA) reads relative to undepleted controls (median > 79.5% of reads; Fig. 1a). After removing human and contaminant signals, metatranscriptomes were highly reproducible for all sites in microbial species (Sorensen similarity ≥ 0.98; Fig. 1b) and gene-level (Pearson’s r ≥ 0.99; Fig. 1c) profiles. Longitudinal analysis highlighted substantial gene-level temporal stability (median Pearson’s r ≥ 0.897 within individuals; Fig. 1c), with slightly higher species-level variability (median Sorensen similarity ≥ 0.768; Fig. 1b). Temporal gene/species-level variation was significantly lower than interindividual variation at the same site (Wilcoxon P < 10⁻¹⁰; Fig. 1b,c).
a, Box plot showing the fraction of non-rRNA reads with (n = 63) and without (n = 4) oligonucleotide-based microbial rRNA depletion during library preparation for the pilot cohort. A two-sided Wilcoxon rank-sum test was used. b, Box plot of Sorensen similarity indices (1 – Bray–Curtis dissimilarity) computed from species-level relative abundances for the pilot cohort (n = 42 longitudinal samples passing quality control and n = 3 distinct sets of technical triplicates), after removal of human and contaminant reads. Pairwise similarities were computed between samples in three different categories. ‘Technical replicates’ refers to different RNA-seq libraries prepared and sequenced from the same samples (n = 9 comparisons). ‘Within individuals’ refers to data from resampling the same individuals across three consecutive days (n = 40 comparisons), whereas ‘between individuals’ represents interindividual variability in our cohort (n = 117 comparisons). A two-sided Wilcoxon rank-sum test was used, and FDR-adjusted exact P values are displayed. c, Same as b but showing Pearsonʼs correlation of gene expression signatures in the pilot cohort (n = 42 longitudinal samples passing quality control and n = 3 distinct sets of technical triplicates). A two-sided Wilcoxon rank-sum test was used, and FDR-adjusted exact P values are displayed. d, Schematic of the full cohort comprising 27 healthy adult volunteers and five different skin sites and the data analysis workflow. Created with BioRender.com. Box plot showing the number of metatranscriptomic reads before computational removal of host and rRNA reads (red), the number of non-human, non-rRNA reads after computational filtering (green) and the number of microbial reads (comprising bacteria, viruses, fungi and archaea) after deduplication (blue). e, Box plot showing the fraction of non-rRNA reads with experimental rRNA depletion during library preparation for the full cohort (n = 102 metatranscriptomes). 
f, Coverage plots showing the distribution of reads over bacterial gene bodies in samples belonging to the full cohort. Reads were mapped to bacterial pangenomes, and samples with either >50,000 or ≤50,000 mapped bacterial reads are colored differently. g, Coverage plots showing the distribution of reads over fungal gene bodies for libraries (full cohort) with ≥500,000 fungal reads. h, Percentage of reads mapped to different regions (intergenic, intronic and exons) of the M. globosa genome. ‘This study’ refers to samples from the full cohort (n = 33 samples with ≥500,000 M. globosa reads). ‘Wu et al. 2015’ refers to an external RNA-seq dataset (n = 5 samples) from cultured M. globosa isolates29. For all box plots, the bolded center line denotes the median value, and the lower and upper bounds of the boxes denote the first and third quartiles, respectively. The upper and lower whiskers extend up to 1.5× the interquartile range in their corresponding directions. Data beyond the whiskers are outlier points that are individually plotted. CDS, coding sequence.
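The similarity index used for the species-level comparisons in Fig. 1b (Sorensen similarity, defined in the caption as 1 – Bray–Curtis dissimilarity) can be illustrated with a minimal Python sketch. The function names and the toy abundance vectors below are illustrative assumptions and are not taken from the study's analysis code.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("at least one vector must have non-zero abundance")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def sorensen_similarity(a, b):
    """Similarity as defined in the Fig. 1b caption: 1 - Bray-Curtis dissimilarity."""
    return 1.0 - bray_curtis(a, b)


# Made-up species-level relative abundances for two replicate libraries.
rep1 = [0.60, 0.30, 0.10]
rep2 = [0.50, 0.35, 0.15]
similarity = sorensen_similarity(rep1, rep2)  # approximately 0.9 for this toy pair
```

Identical profiles give a similarity of 1, and profiles with no shared species give 0, which is why values ≥ 0.98 between technical replicates indicate near-identical species composition.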
We next developed a computational workflow to annotate skin metatranscriptomic reads with high sensitivity while accounting for contaminants and off-target matches. Using a skin-specific microbial gene catalog21 (integrated Human Skin Microbial Gene Catalog (iHSMGC)) and a custom workflow resulted in a significantly higher median percentage of functionally annotated reads (81% versus 60% for general-purpose workflow HUMAnN3 (ref. 22), Wilcoxon P < 3.1 × 10⁻⁵; Extended Data Fig. 1b). Data from negative handling controls and prior reports23 were used to identify and filter potential contaminant taxa24 (Supplementary Fig. 1a and Methods). Handling controls reflect contaminant signals from swabs, extraction kits and sample processing. Skin microbes and potential kitome taxa (for example, Achromobacter, Bradyrhizobium, Mycolibacterium, Mycobacterium and Brevundimonas species) formed distinct clusters of co-occurring taxa (Supplementary Fig. 1b). To account for taxonomic classification errors, particularly in low-complexity or misassembled regions of microbial genomes25, a measure of unique genome matches relative to total read count25,26 was used to identify false positives (Methods). Based on spike-ins, an empirically determined threshold of unique minimizers per million microbial reads discriminated false-positive from true-positive taxa at relative abundances as low as 0.1% over a range of read counts (10⁴–10⁶ reads; Supplementary Fig. 1c and Methods) and was applied as a filter.
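The false-positive filter described above can be sketched as a simple normalization followed by a cutoff. This is a minimal illustration only: the threshold value, the function names and the toy counts are hypothetical, and the assumption that taxa at or above the threshold are retained is ours; the empirically determined threshold is described in Methods.

```python
# Hypothetical cutoff for illustration only; the study's threshold was
# determined empirically from spike-ins and is not restated here.
HYPOTHETICAL_THRESHOLD = 1000.0


def unique_minimizers_per_million(unique_minimizers, total_microbial_reads):
    """Normalize a taxon's unique minimizer count by sequencing depth."""
    if total_microbial_reads <= 0:
        raise ValueError("total_microbial_reads must be positive")
    return unique_minimizers * 1_000_000 / total_microbial_reads


def filter_taxa(taxa, total_microbial_reads, threshold):
    """Keep taxa whose normalized unique minimizer count reaches the threshold.

    `taxa` maps taxon name -> number of unique minimizers observed.
    Assumption: taxa below the threshold are treated as likely false positives.
    """
    kept = {}
    for name, n_minimizers in taxa.items():
        score = unique_minimizers_per_million(n_minimizers, total_microbial_reads)
        if score >= threshold:
            kept[name] = n_minimizers
    return kept


# Toy example with made-up counts for a library of 500,000 microbial reads.
toy_taxa = {"Cutibacterium acnes": 40_000, "Spurious taxon": 100}
kept_taxa = filter_taxa(toy_taxa, 500_000, HYPOTHETICAL_THRESHOLD)
```

Normalizing by microbial read count is what allows a single cutoff to be applied across libraries of different depths.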
The combined skin metatranscriptomics workflow was applied to the full cohort (n = 27 healthy individuals, five sites) to provide comprehensive in vivo characterization of skin microbial gene expression (n = 135 samples for metagenomics and metatranscriptomics; Fig. 1d, Supplementary Data 1 and Supplementary Table 1). A median of 2.08% of reads aligned to human transcriptomes and were removed, leaving approximately 98% of metatranscriptomic reads as non-human (Fig. 1d). In the full cohort, the success rate for metatranscriptomic (102/135 = 75%) and metagenomic (130/135 = 96%) libraries was moderate to high, emphasizing protocol robustness across individuals. After laboratory and bioinformatics quality control, 102 paired metagenomic and metatranscriptomic libraries were analyzed as the full cohort (Supplementary Table 1). Typically, more than 1 Gbp of deduplicated non-rRNA sequencing data (median = 3.7 × 10⁶ read pairs) was generated per library, similar to other metatranscriptomic studies13,27, with a relatively high median RNA quality across sites (DV200 ≥ 76; Fig. 1d and Extended Data Fig. 1c). Paired metagenomes were sequenced to sufficient depth, with a median of 7 million microbial read pairs after filtering for human reads and before removal of environmental contaminants (Extended Data Fig. 1d). Rarefaction analysis confirmed that libraries were typically sequenced at sufficient depths for representing active microbial functions/orthologous groups (>1 million read pairs; Supplementary Fig. 2). The proportion of non-human reads was significantly higher in metatranscriptomes versus metagenomes (98% versus 10%, Wilcoxon signed-rank P < 0.05; Extended Data Fig. 1e), underscoring the feasibility of skin metatranscriptomic sequencing. Microbial rRNAs were effectively depleted (2–25% of remaining RNA compared to 80–90% in a typical cell28) during library preparation (Fig. 1e). Metatranscriptomic reads exhibited even coverage across bacterial (Fig. 1f) and fungal (Fig. 1g) genes for different sites and read depths. Read annotation rates were again moderate to high (median = 69–80%; Extended Data Fig. 1f). Most (>80%) reads mapping to the genome of the common skin fungus Malassezia globosa were in mRNA coding regions, with a minority mapping to intergenic regions and introns, indicating that our metatranscriptome libraries achieved DNA depletion levels similar to other high-quality M. globosa RNA-seq datasets derived from in vitro cultures29 (Fig. 1h). Overall, these results emphasize that the proposed skin metatranscriptomics protocol enables robust, reproducible microbial mRNA profiling across diverse body sites and individuals.
Skin metatranscriptomics identifies niche-specific active species and functions distinct from metagenomes
Different skin microenvironments greatly influence microbial composition4, but much remains unknown about species or gene activities in vivo. Although some studies have reported discordance between RNA and DNA abundances for species and gene families in gut and environmental microbiomes14,27, the scale and extent to which gene abundances and transcript levels correlate in the skin microbiome is unknown. We observed a marked disparity between the most active species in skin metatranscriptomes versus the most highly abundant species in skin metagenomes (Fig. 2a). For example, C. acnes dominates the metagenomes at most sites (46–90% median relative abundance; except toe webs) but has relatively modest contribution to metatranscriptomes (2–31%). By contrast, although the skin fungi M. restricta and M. globosa were present at low metagenomic abundances relative to bacteria (3–8% and 0.1–12%), they contribute substantially to metatranscriptomes in a niche-dependent manner. M. restricta RNAs are heavily represented in cheek and scalp sites (23–30%), whereas M. globosa RNAs predominate on scalp, antecubital fossae and volar forearms (21–81%). Toe webs were the exception, with distinct microbial compositions dominated by Staphylococcus hominis and Staphylococcus epidermidis in metagenomes and metatranscriptomes (Fig. 2a). Although normalized RNA counts of many skin microbes varied positively with genomic DNA counts, several species had large differences (≥4-fold) in RNA and DNA abundances, with disproportionately higher contributions to transcriptional activity (Extended Data Fig. 2). The outsized contribution of M. restricta and M. globosa to the active biomass is likely driven by larger cell volume of eukaryotes versus prokaryotes30 and less so by genome size, where, despite having larger genomes (>2–3× than skin bacteria), their contribution to the metagenome is 10-fold lower than bacteria. 
Staphylococcus and Corynebacterium species also contributed proportionally more RNAs at the scalp, cheek, volar forearm and antecubital fossa after accounting for metagenomic abundances (Extended Data Fig. 2). The opposite was true for C. acnes and Micrococcus luteus, likely reflecting a lower proportion of transcriptionally active cells. A small percentage (median 0.01–0.07%) of metagenomic and metatranscriptomic microbial reads were classified as viral. The most prevalent and abundant belong to the Pahexavirus genus, which includes diverse species of Propionibacterium phages (Supplementary Note 1). Overall, the stark differences between DNA and RNA abundances for skin microbes underline the need for skin metatranscriptomic measurements to characterize in vivo activity.
a, Bar plots showing median relative abundances (total sum scaled) of reads assigned to various skin commensals across different skin sites presented in log2 scale. Abundances for metagenomes and metatranscriptomes are shown. b, Bubble plot showing core, common and variable components of skin metagenomes (DNA) and metatranscriptomes (RNA) across different body sites. A species was called present in a metagenome if it had ≥0.1% relative abundance. A species was called present in a metatranscriptome if it had ≥0.1% relative abundance and was also detected in the metagenome. Core species in metagenomes and metatranscriptomes were defined as those present across more than 75% of samples at a given skin site. Common species for a skin site were defined as those present in between 50% and 75% of samples. Variable species for a given skin site refer to other species that do not fall into the previous two categories but that are present in three or more individuals. For RNA, bubbles are also shaded according to median transcriptional activity, defined as the ratio of normalized RNA counts of a species to that of their DNA. For visual clarity, only species that were core in at least one skin site are represented here. c, Scatter plot of mean beta diversity (Bray–Curtis dissimilarity) against mean alpha diversity (Simpson index) of core microbial pathways. Core microbial pathway expression was computed using HUMAnN3. Core microbial pathways were defined as those that were present (non-zero expression) at a skin site in more than 75% of individuals and with less than 25% unclassified reads at species level. d, Stacked bar plots for species-level pathway contributions at the RNA level, estimated with HUMAnN3 for staphyloferrin A biosynthesis and with Kraken 2 for community-level relative abundances at the RNA level. For all subfigures, Sc, Ch, Ac, Vf and Tw indicate the skin sites scalp, cheek, antecubital fossa, volar forearm and toe web, respectively. 
Alpha or beta diversity scores >0.5 and ≤0.5 were considered ‘high’ and ‘low’ diversity, respectively.
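The core, common and variable categories defined in the caption of Fig. 2b can be expressed as a short classification rule. The sketch below is a minimal illustration with assumed names; it treats each sample at a site as coming from a distinct individual, assigns the boundary value of exactly 75% to 'common', and does not model the additional requirement that a species counted in a metatranscriptome is also detected in the paired metagenome.

```python
def classify_species(rel_abundances, presence_cutoff=0.001, min_individuals=3):
    """Classify one species at one skin site from its per-sample relative abundances.

    presence_cutoff=0.001 corresponds to the >=0.1% relative abundance rule.
    Assumption: one sample per individual, so sample counts equal individual counts.
    """
    if not rel_abundances:
        raise ValueError("at least one sample is required")
    n_present = sum(1 for x in rel_abundances if x >= presence_cutoff)
    prevalence = n_present / len(rel_abundances)
    if prevalence > 0.75:
        return "core"
    if prevalence >= 0.50:
        return "common"
    if n_present >= min_individuals:
        return "variable"
    return "unclassified"


# Toy example: a species detected in 8 of 10 samples at a site (made-up values).
toy_site = [0.2] * 8 + [0.0] * 2
toy_category = classify_species(toy_site)
```

Species present in fewer than three individuals fall outside all three categories and are labeled 'unclassified' here, a label introduced only for this sketch.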
To profile activity across different niches, we categorized skin microbial species based on presence in samples as ‘core’ (>75% of samples), ‘common’ (50–75%) and ‘variable’ (<50%) components of metagenomes and metatranscriptomes at each site. The median ratio of RNA/DNA levels per species was computed by summing species-level transcripts per million (TPM) values for RNA and dividing them by species-level counts per million (CPM) values for DNA (Methods). This quantifies transcriptional activity while accounting for variations in genomic abundances and sequencing depth (Fig. 2b). Corynebacterium kefirresidentii, C. acnes, M. globosa, M. restricta, Staphylococcus capitis and S. epidermidis were core members of metagenomes across all or most surveyed sites (Fig. 2b). Fewer species were core in metatranscriptomes across sites (for example, C. acnes, M. globosa and M. restricta), with Staphylococcus species exhibiting site-specific activity (for example, S. epidermidis on cheek, S. hominis on volar forearm and S. capitis on scalp with median activity >17, >9.5 and >3.5, respectively) and variable prevalence across sites (for example, in antecubital fossae) (Fig. 2b).
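The transcriptional activity measure described above (summed species-level TPM divided by species-level CPM, summarized as a median across samples) can be sketched as follows. The function names and toy values are illustrative assumptions, and the handling of samples with zero DNA signal (rejected here) is our choice rather than a detail stated in the text.

```python
from statistics import median


def transcriptional_activity(rna_tpm_by_gene, dna_cpm):
    """RNA/DNA ratio for one species in one sample.

    rna_tpm_by_gene: TPM values of the species' genes in the metatranscriptome.
    dna_cpm: the species' CPM value in the paired metagenome.
    """
    if dna_cpm <= 0:
        raise ValueError("dna_cpm must be positive to form a ratio")
    return sum(rna_tpm_by_gene) / dna_cpm


def median_activity(samples):
    """Median RNA/DNA ratio across samples given (gene TPM list, DNA CPM) pairs."""
    return median(transcriptional_activity(tpm, cpm) for tpm, cpm in samples)


# Toy example with made-up values for one species at one site.
toy_samples = [
    ([100.0, 300.0], 100.0),  # ratio 4.0
    ([50.0, 50.0], 50.0),     # ratio 2.0
    ([600.0], 100.0),         # ratio 6.0
]
toy_median_activity = median_activity(toy_samples)
```

A ratio above 1 indicates that a species contributes proportionally more to the RNA pool than to the DNA pool in the same sample.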
The same species also exhibited differences in overall activity associated with different skin microenvironments. Microbes prevalent across multiple sites, such as C. acnes, S. capitis and S. epidermidis, showed higher transcriptional activity in sebaceous (scalp and cheek) versus non-sebaceous (antecubital fossae, volar forearm and toe web) sites, suggesting a general preference for environments rich in lipids31,32 (Extended Data Fig. 3). Although M. globosa and M. restricta are closely related lipophilic fungi, the latter showed larger variation in transcriptional activity between sebaceous and non-sebaceous sites, indicating that M. restricta has greater sensitivity to lipid availability (Fig. 2b and Extended Data Fig. 4). This is consistent with reports demonstrating that M. globosa is isolated from facial skin, the upper trunk and arms, whereas M. restricta recovery is mostly restricted to scalp and face33. Within sites, individual heterogeneity is also evident in species that dominate in metatranscriptomes. For instance, although Malassezia spp. transcripts dominate metatranscriptomes in antecubital fossae (median relative abundance 98%), two individuals exhibited metatranscriptomes dominated by S. hominis and C. acnes, respectively (SMT003_Ac and SMT011_Ac; Supplementary Fig. 3). Hence, the distribution of active microbes is shaped by both site-specific factors and individual-specific variations in skin microenvironment.
To determine important functions for microbiomes in distinct niches, differential expression analysis was performed for orthologous groups of genes, accounting for DNA abundance variations (Methods). Compared to sebum-rich cheek, bacteria colonizing drier volar forearms upregulated orthologous groups important for glucose catabolism, energy generation and protein synthesis, reflecting differences in resource availability (Extended Data Fig. 5a and Supplementary Fig. 4a). Malassezia fungi colonizing cheeks upregulated orthologous groups involved in mitotic growth, aromatic compound biosynthesis and protein modification relative to volar forearms, consistent with increased lipid availability on the cheek for fungal growth and metabolism30,34 (Extended Data Fig. 5a and Supplementary Fig. 4a). Orthologous groups involved in citric acid cycle, amino acid metabolism, heme biosynthesis, RNA modification and kinase activities were upregulated in toe webs versus volar forearms, indicative of adaptation to amino-acid-rich environments (Extended Data Fig. 5b). Amino acid metabolic pathways are noteworthy because sweat is a rich source of free amino acids35, and toe webs are high in sweat content relative to volar forearms. Heme biosynthesis, important for Staphylococcal colonization, was associated with increased amino acid availability36,37. Consequently, most enzymes in the heme biosynthetic pathway from glutamate were upregulated ≥2-fold in toe webs versus volar forearms (Supplementary Fig. 4b,c). Consistent with the exposed nature of volar forearms and lower concentrations of protective secretions compared to cheeks38, microbes on volar forearms upregulated genes important for antioxidant protection relative to cheeks (Extended Data Fig. 5a) and toe webs (Extended Data Fig. 5b). These examples highlight the utility of metatranscriptomics for identifying key niche-specific functions in the skin microbiome.
We evaluated the contribution of different species to core (present in >75% of individuals) biochemical pathways across sites as well as within-sample (alpha) and between-sample (beta) diversities per pathway9 (Methods). At all sites except toe webs, Malassezia were predominant (>50% contribution) effectors of a substantial fraction (median 56%) of core pathways compared to bacteria (Wilcoxon P < 0.05; Extended Data Fig. 6a). Staphylococcus was the major contributor to more core pathways than Cutibacterium at moist environments, such as antecubital fossae and toe webs (Wilcoxon P < 0.05; Extended Data Fig. 6a). At any site, some pathways were expressed by multiple microbes, whereas others were driven by one or a few species. Most core pathways were expressed by a few species within an individual but by diverse species across individuals, indicating high functional plasticity in skin microbiomes (Fig. 2c). For example, staphyloferrin A biosynthesis from L-ornithine is usually expressed by 1–2 Staphylococcus species in individual toe web communities, but multiple Staphylococcus species can drive expression across individuals (Fig. 2d). Some functions were solely represented by fungi, such as pathways for beta-oxidation of very long chain fatty acids being expressed by Malassezia on both sebaceous (scalp) and non-sebaceous (antecubital fossae) sites (Extended Data Fig. 6b). Other functions, such as galactose degradation or arginine biosynthesis, were shared between bacteria and fungi, especially in nutrient-rich niches such as the scalp or cheeks (Extended Data Fig. 6c,d). Although bacterial arginine is a natural moisturizing factor for skin health39, our dataset indicates that fungi may be substantial contributors to the arginine pool. Overall, these results showcase how skin metatranscriptomics is complementary to metagenomics and demonstrates how different genera/species act as major effectors of niche-specific metabolic functions.
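The within-sample (alpha) diversity of species contributions to a pathway can be illustrated with a Simpson index. This is a minimal sketch under stated assumptions: it uses the Gini–Simpson form (1 minus the sum of squared proportions), which may differ from the exact formulation in Methods, and the function names and toy contribution values are hypothetical. The 0.5 cutoff for 'high' versus 'low' diversity follows the Fig. 2 caption.

```python
def simpson_diversity(contributions):
    """Gini-Simpson index of species contributions to one pathway in one sample.

    Assumption: diversity = 1 - sum(p_i^2), where p_i is each species' share
    of the pathway's expression.
    """
    total = sum(contributions)
    if total <= 0:
        raise ValueError("total pathway expression must be positive")
    return 1.0 - sum((c / total) ** 2 for c in contributions)


def diversity_label(score):
    """Label a diversity score using the >0.5 / <=0.5 cutoff from the Fig. 2 caption."""
    return "high" if score > 0.5 else "low"


# Toy example: a pathway expressed by a single species versus four species equally.
single_species = simpson_diversity([10.0, 0.0, 0.0, 0.0])
four_species = simpson_diversity([2.5, 2.5, 2.5, 2.5])
```

Under this formulation, a pathway driven by one species scores 0, so a pattern of low alpha diversity within individuals combined with high beta diversity between individuals matches the functional plasticity described above.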
Species-level analysis identifies signatures of adaptation to in vivo nutrient availability
Given niche-specific metatranscriptomic signatures, we conducted species-level differential gene expression analyses to uncover pathways essential for supporting in vivo colonization. We focused on M. restricta, M. globosa, C. acnes and S. epidermidis because of their abundant transcript levels and importance for skin health30,40. Malassezia are highly represented in metatranscriptomes of sebaceous sites (scalp and cheeks), whereas S. epidermidis colonizes sebaceous and non-sebaceous sites (especially toe webs) and is readily cultured. In M. restricta, genes and pathways associated with fructose and mannose metabolism, metabolism in diverse environments (for example, acu-7_1 and acuE of the glyoxylate cycle; FBA1 and KGD1 of the citric acid cycle; TLK1 of the pentose phosphate pathway (PPP)) and free fatty acid breakdown in peroxisomes (for example, POX2, POT1, PEX11B and ACOT8) were upregulated on scalp versus cheeks (Fig. 3a,b and Supplementary Data 2). Conversely, genes and pathways associated with breakdown of glycerides, such as ether lipid and glycerophospholipid metabolism, and various secreted phospholipase C enzymes were upregulated on cheeks (PLC_1–7; Fig. 3a,b and Supplementary Data 2). This is consistent with distinct scalp and cheek lipid profiles, with the former being richer in nutritionally complex apocrine secretions and free fatty acids and the latter being richer in glycerides41,42. Although M. globosa had similar metatranscriptomic abundances to M. restricta on sebaceous sites (Fig. 2a), it had fewer differences in gene expression between scalp and cheek sites (<30; Fig. 3c). C. acnes had a similar number of differentially expressed features between the scalp and cheeks as M. restricta but showed fewer significantly enriched pathways, with only those related to ribosome biogenesis, translation and ATP generation being upregulated on scalp (Fig. 3c and Supplementary Data 2). 
These observations highlight how skin commensals employ different gene expression strategies to thrive across sites, including using specialized nutrient sources (M. restricta) and responding to energetic considerations (C. acnes).
a, Volcano plot of differentially expressed genes for M. restricta colonizing scalp versus cheek samples (≥200,000 M. restricta reads per library). Genes involved in peroxisomal activity or ether lipid/glycerophospholipid/sphingolipid metabolism are colored blue or yellow, respectively. The genes PLCN_1–7 encode secreted phospholipase C enzymes. The genes POX2, POT1, PEX11B and ACOT8 encode proteins for peroxisomal function. Plotted P values are from the two-sided Wald test from DESeq2 and are FDR adjusted. b, Bar plots of normalized enrichment scores (NESs) from GSEA for M. restricta genes differentially expressed in scalp versus cheek samples. c, Bar plots showing the number of differentially expressed genes between scalp and cheek sites for different species. d, Schematic showing experimental setup for in vitro cultures of S. epidermidis under different growth conditions and stress exposures. Created with BioRender.com. e, Left, PCA plot for S. epidermidis gene expression profiles from various in vitro and in vivo samples after batch correction using limma; right, bar plots showing the number of differentially expressed genes for S. epidermidis between in vivo transcriptomes and transcriptomes for different in vitro conditions. ‘Sebaceous sites’ refers to both scalp and cheek samples with ≥200,000 S. epidermidis reads per library (n = 6). ‘Toe webs’ refers to toe web samples with ≥200,000 S. epidermidis reads per library (n = 12). PC, principal component.
The transcriptomes of skin microbes such as S. epidermidis are well studied in vitro43,44 but not in vivo, raising questions about how closely culture models reflect behavior on human skin. S. epidermidis gene expression in vivo was compared to three in vitro conditions: log phase, stationary phase and osmotic stress (Fig. 3d). Gene expression profiles between in vivo and in vitro conditions were clearly separated within the same experimental batch or across studies (generated in-house, Avican et al.44 and Wang et al.43) after batch correction (Fig. 3e and Supplementary Fig. 5). There were many more differentially expressed genes between skin versus laboratory conditions (n = 1,108–1,664; Fig. 3e and Supplementary Data 3 and 6), relative to modest changes between moist toe webs and sebaceous cheek/scalp sites in vivo (n = 64; Supplementary Data 4). More than 300 gene sets were significantly enriched in vivo or in vitro, with most differences observed between sebaceous sites and in vitro conditions (Supplementary Fig. 6 and Supplementary Data 5). Although S. epidermidis cultures upregulated genes for carbohydrate metabolism to utilize sugars in rich media, sebaceous sites skewed toward sulphur metabolism, peptide and vitamin biosynthesis, reflecting the more complex nutritional landscape in vivo (Supplementary Fig. 6). In contrast, there were far fewer significantly enriched gene sets/pathways in toe webs versus in vitro conditions, despite similar numbers of differentially expressed genes as sebaceous versus in vitro comparisons (Fig. 3e and Supplementary Data 3 and 6). Nonetheless, sets of genes involved in metal ion homeostasis (GO:0030001) and copper ion binding (GO:0005507) were consistently upregulated in S. epidermidis colonizing toe webs versus cultures at log phase or under osmotic stress (Supplementary Fig. 7a and Supplementary Data 7). 
Upregulated genes at toe webs included pumps (for example, P-type ATPases45) controlling influx and efflux of different substrates and metal ions such as copper and zinc (Supplementary Fig. 7b). These pumps could be crucial for survival in toe webs, a ‘closed’, sweat-rich environment containing host-excreted trace metals46,47, where maintaining balance of intracellular metal levels is essential48.
To contextualize overall differences in metabolism between in vivo and in vitro conditions, flux balance analysis (FBA) was conducted using genome-scale metabolic models constrained with metatranscriptomic data49. S. epidermidis showed distinct metabolic fluxes under all tested conditions (83 metabolic reactions differing across conditions, PERMANOVA Adonis R2 = 0.54, P < 0.001), with clear distinction observed for in vitro conditions and even between in vivo conditions (sebaceous and toe web; Extended Data Fig. 7a and Supplementary Data 8). Fluxes associated with production and export of propionate, a short chain fatty acid impacting skin barrier function and immunity50, were higher in vivo for S. epidermidis versus in vitro (Fig. 4a). FBA of glycolysis and pyruvate metabolism revealed that, unlike in vitro, S. epidermidis predominantly generates glyceraldehyde-3-phosphate (G3P) in vivo via the PPP (TALA, PFK_3) and L-lactate uptake (L_LACD2) (Fig. 4b). Pyruvate production in vivo is further supported by an active cataplerosis reaction (PPCKr), which generates intermediates for its synthesis (Fig. 4b). The NADH5 reaction, which regenerates NAD, showed no flux in vivo (Extended Data Fig. 7b). Instead, NAD regeneration might be driven by other compensatory cyclic reactions, such as those involving 4-aminobutyrate consumption (ABUTR) and production (ABUTD) (Extended Data Fig. 7c). Although significant differences were identified in 58 reactions between in vivo conditions, C. acnes exhibited lesser variation in metabolic flux between scalp and cheek (PERMANOVA Adonis R2 = 0.053, P < 0.001; Supplementary Fig. 8 and Supplementary Data 9). Notably, C. acnes displayed higher propionic acid production and export on cheek versus scalp via an active Wood–Werkman cycle51 (generalized linear model (GLM), adjusted P < 0.001; Fig. 4c). In vivo site-specific adaptations of C. acnes were also observed in metabolism of specific amino acids. 
Glutamate, integral to linking key carbon and nitrogen pathways52, was primarily produced via distinct metabolic routes—that is, histidine metabolism on scalp versus proline metabolism on cheek (Fig. 4d)—underscoring niche-specific adaptation to available resources. Thus, species-specific metatranscriptomic analysis can be invaluable for identifying metabolic requirements for survival on skin that may not be reflected in in vitro cultures.
a, Bar plots showing mean flux values of various reactions associated with propionate production during cysteine and methionine metabolism in different in vivo and in vitro growth conditions for S. epidermidis. Flux values were estimated using transcriptome-aware genome-scale metabolic models for S. epidermidis. EX_ppae(e) represents the reaction exporting propionate out of bacterial cells. All reaction fluxes are significantly higher for in vivo relative to in vitro conditions (GLM, family = ‘Gaussian’, two-sided t-test; FDR-adjusted P < 0.001). b, Same as a, except showing various reactions involved in pyruvate generation during glycolysis. All reaction fluxes are significantly different between in vivo and in vitro conditions. Reaction names highlighted in red are alternative means of generating pyruvate that have higher mean fluxes for in vivo versus in vitro conditions, consistent with in vivo metabolic dependency. c, Bar plots showing mean flux values of various reactions associated with propionate production during cysteine and methionine metabolism in different in vivo growth conditions (scalp or cheek) for C. acnes. Flux values were estimated using transcriptome-aware genome-scale metabolic models for C. acnes. d, Same as c, except showing various reactions involved in histidine and proline/glutamate metabolism. EX_his_L(e) and EX_pro_L(e) refer to transport reactions exporting histidine and proline, respectively. Cys, cysteine; met, methionine; gDCW hr, grams of dry cell weight multiplied by hour.
Gene-level analysis identifies key antimicrobial functions and interactions in vivo
As antimicrobial peptides (AMPs) and proteins can further shape niche adaptation and interspecies interactions, we searched for expressed genes involved in microbial competition in vivo. Candidate hits were detected using profile hidden Markov models (HMMs) (Methods). Skin microbes were found to express a diversity of bacteriocins, phenol-soluble modulins, enzymes that generate free radicals and auto-inducing peptides (AIPs) in vivo, representing many in vitro and ex vivo validated classes of antimicrobial products7,8 (Fig. 5a). Motivated by the importance of shortlisting microbes/bioactives for experimental follow-up, the species expressing different classes of antimicrobials were determined (Fig. 5b). AIPs were expressed by different Staphylococcus species, consistent with their roles in quorum sensing and inter-staphylococcal competition8. Of particular interest are bacteriocins, peptides secreted by bacteria and archaea to inhibit the growth or activities of other microbes in the community53. Some individuals harbored S. hominis and S. epidermidis strains expressing peptides of the lacticin 481 family reported to have broad activity against Gram positives54 (Fig. 5b). By contrast, relatively few individuals harbored Staphylococcal species with measurable expression of the well-characterized gallidermin/nisin family of bacteriocins, which could be due to strain differences or its documented autotoxicity55.
a, Heatmap of various classes of antimicrobial genes with detected expression (≥5 reads) in individual metatranscriptomes grouped by skin site. Blue tiles denote detected expression in each metatranscriptome. Red labels denote genes with site-specific enrichments based on two-sided Fisher’s exact tests, with FDR-adjusted P values. Bacterial genes were grouped into categories of antimicrobial features based on information from profile HMMs downloaded from the NCBI. b, Heatmap of various classes of antimicrobial genes with expression associated with species-level classified metatranscriptomic reads. The number of metatranscriptomes with detected expression (≥5 reads) is shown in the figure. c, Overlaid protein structures of the M. restricta protein encoded by DNF11_2196 and a Streptomyces papain inhibitor. Structures were obtained from the publicly available AlphaFold database. Structural similarity scores from Foldseek (TM-score and RMSD) and DALI (Z-score) are shown. Primary amino acid sequence identity between the two proteins is also given. d, Left, box plots of pro-IL-1B levels from human keratinocytes as measured by HiBiT and averaged across multiple strains of S. capitis (n = 4), S. epidermidis (n = 4), S. hominis (n = 8) and Staphylococcus warneri (n = 4). Data are from two biological repeats, each comprising three technical replicates. A Kruskal–Wallis test was conducted to assess whether levels differed significantly between groups (Kruskal–Wallis chi-square = 19.854, df = 5, P < 0.0013), and the adjusted P values from post hoc Dunn’s tests are shown. Right, box plots of cleaved IL-1B levels from human keratinocytes as measured by ELISA and averaged across multiple strains of S. capitis, S. epidermidis, S. hominis and S. warneri. Data are from two biological repeats, each comprising three technical replicates. 
A Kruskal–Wallis test was conducted to assess whether levels differed significantly between groups (Kruskal–Wallis chi-square = 33.864, df = 5, P < 2.6 × 10⁻⁶), and the adjusted P values from post hoc Dunn’s tests are shown. For all box plots, the bolded center line denotes the median value, and the lower and upper bounds of the boxes denote the first and third quartiles, respectively. The upper and lower whiskers extend up to 1.5× the interquartile range in their corresponding directions. Data beyond the whiskers are outlier points, which are individually plotted. df, degrees of freedom; NS, not significant.
There were notable examples of site-specific distribution of antimicrobial products. Transcripts belonging to cyclic lactone AIPs, thiazolylpeptides and their precursors were frequently detected on sebaceous sites (scalp and cheeks) compared to other sites (Fisher’s exact test, adjusted P < 0.001; Fig. 5a). In some cases, site-specific expression was detected despite similar abundances of host genomes. For example, expression of lacticin 481 family of lantibiotics by S. hominis and S. epidermidis was not evenly distributed across sites, with higher frequencies at volar forearm and toe web (Fisher’s exact test, adjusted P < 0.05; Fig. 5a). However, there were no significant differences in metagenomic and metatranscriptomic abundances of these species between lacticin 481 expressors versus non-expressors (Wilcoxon rank-sum test, P > 0.05; Extended Data Fig. 8a,b). Such expression variability could be due to strain or environmentally driven differences. In contrast, thiopeptide expression on skin was associated with increased metagenomic and metatranscriptomic abundances of C. acnes, implicating them as an important source of these AMPs (Wilcoxon rank-sum test, P < 0.05; Extended Data Fig. 8c,d).
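The site-enrichment comparison described above can be illustrated with a minimal sketch: a two-sided Fisher's exact test on a 2 × 2 table of detected versus undetected expression, followed by Benjamini–Hochberg adjustment across gene classes. The function names and the counts in the usage lines are hypothetical and are not taken from the study; the authors' exact implementation is not specified in this section.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: site group (e.g. sebaceous vs other sites).
    Columns: expression detected vs not detected.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x 'detected' samples in row 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: detected in 20 of 54 sebaceous samples
# and in 5 of 81 samples from the other sites.
p_raw = fisher_exact_two_sided(20, 34, 5, 76)
# Hypothetical raw P values for three further gene classes.
p_adj = benjamini_hochberg([p_raw, 0.2, 0.03, 0.6])
```

Testing presence or absence rather than read counts suits genes that are detected in only a subset of samples, which is the situation described for these antimicrobial classes.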
Several bacteriocins detected in this study remain uncharacterized, representing an untapped source for development of new antimicrobials. Thiopeptides are important bioactives that can shape skin microbiomes—for example, cutimycin secreted by C. acnes can inhibit the growth of Staphylococcus species2. Two different thiopeptides (MET_03151623 and MET_02967399) were expressed by C. acnes in vivo (Supplementary Fig. 9). Although MET_03151623 was identical to cutimycin, MET_02967399 represents a putative bacteriocin with uncharacterized properties, sharing less than 50% amino acid identity with known members of the thiopeptide family (Supplementary Fig. 9). Some Cutibacterium and Corynebacterium strains also expressed peptides of the lactococcin 972 family, but most of their primary sequences differed from those characterized in bacilli such as Lactococcus (<40% identity; Fig. 5b and Supplementary Fig. 10). Unexpectedly, several individuals had strains of S. epidermidis and Staphylococcus pettenkoferi colonizing the toe webs, expressing putative bacteriocins homologous to the halocin family (Fig. 5b and Supplementary Fig. 11). Previously, family members were described only in halophilic archaea and bacteria56,57. These examples highlight the unexplored landscape of antimicrobial mechanisms deployed by skin microbial species and strains to thrive in vivo.
Finally, we leveraged paired metatranscriptomics and metagenomics to identify putative host–microbe and microbe–microbe interactions. Despite the relatively low fraction of human reads (median <3%), a median of 0.8 × 10⁶ human reads per library was obtained, similar to the sequencing depth seen in single-cell transcriptomics studies58,59 (Supplementary Fig. 12a). Comparing our swab-based data to biopsy-based studies60,61 highlighted that a substantial proportion of human reads were exonic (median 31% versus less than 15% in biopsy-based study61) and the proportion of intergenic reads was low (median 18%) relative to transcriptomic data generated in-house from skin biopsies (median 32%; Supplementary Fig. 12b). Gene set variation analysis (GSVA)62 was used to estimate sample-specific immune pathway activities to find associations between host markers and metagenomic abundances of microbes. This analysis revealed three significant relationships, all involving S. capitis (false discovery rate (FDR) < 0.25; Supplementary Data 10). The IL-6/JAK/STAT3 and Toll-like receptor signaling pathways involved in Th17 function63 and microbial detection by the immune system, respectively, were positively correlated with S. capitis abundances on cheeks (Supplementary Fig. 13). This is consistent with observations associating high S. capitis abundances with IL-17-dominated immune profiles in skin disease64. Exposure to supernatants of different S. capitis strains led to consistently higher levels of pro-IL-1B and cleaved IL-1B in human keratinocytes versus other commensal Staphylococcal species, in line with our findings associating S. capitis with specific immune responses on skin (post hoc Dunn’s test, adjusted P < 0.05; Fig. 5d).
We further explored how our dataset could be mined to identify putative microbe–microbe interactions. AIPs and bacteriocins were typically expressed at low levels (median read count 42 and 11 for thiopeptides and AIPs, respectively), and, hence, samples were categorized as either expressors or non-expressors for these genes. We recapitulated several known associations, such as thiopeptides suppressing S. epidermidis relative to C. acnes2 and Staphylococcal AIPs having the opposite effect65,66 (Wilcoxon rank-sum test, P < 0.05; Extended Data Fig. 9). For other genes with larger dynamic range of expression, we correlated transcript abundances of secreted proteins in one species with DNA abundances of other species within the same site (Methods). Thirteen combinations of common skin commensals were tested transcriptome wide, and more than 30 significant associations were identified (FDR < 0.1; Supplementary Data 10). Notably, transcript abundances of an M. restricta protein (DNF11_2196) were strongly negatively correlated with C. acnes abundances on scalp (Spearman’s ρ < −0.7, adjusted P < 0.05; Extended Data Fig. 10a). DNF11_2196 is a poorly characterized gene (https://alphafold.ebi.ac.uk/entry/A0A3G2S5R5) with low identity to structure database hits (28.9% identity, 81.2% query sequence coverage; Foldseek web server, 3Di/AA mode). This makes it challenging to infer function with sequence-based homology alone. However, the availability of accurate protein structure folding and searching algorithms enables functional inference based on similarities in three-dimensional structure67,68. There was greater similarity at the protein structural level indicated by both Foldseek (TM-score = 0.77894, root mean square deviation (RMSD) = 2.36) and DALI (Z-score = 14.5) between the protein encoded by DNF11_2196 and a Streptomyces papain inhibitor (5ntb-B), which has antimicrobial properties attributed to inhibition of bacterial cysteine proteases69 (Fig. 5c). 
The negative correlation between DNF11_2196 and C. acnes abundances could not be explained by an inverse relationship between organismal abundances of M. restricta and C. acnes (Extended Data Fig. 10b–d). Similar analysis in cheeks showed that Cutibacterium granulosum abundances were positively correlated with expression of a C. acnes triacylglycerol lipase, independent of relationships between organismal abundances, indicating a potential symbiotic interaction (Spearman’s ρ > 0.7, adjusted P < 0.05; Extended Data Fig. 10e–g). Overall, these results demonstrate that integrating skin metatranscriptomic and metagenomic data can identify genes mediating host–microbe and microbe–microbe interactions in vivo, guiding prioritization of bioactive molecules or pathways for experimental and clinical validation.
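As an illustration of the correlation step described above, the following minimal sketch computes Spearman's ρ between the transcript abundance of a gene in one species and the DNA abundance of another species across samples from one site. The function names and input values are hypothetical; normalization and multiple-testing correction follow the Methods and are not reproduced here.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-sample values for one skin site:
# transcript abundance of a secreted-protein gene in species A,
# and metagenomic (DNA) relative abundance of species B.
transcript_abundance = [120.0, 85.0, 40.0, 300.0, 15.0, 60.0]
dna_abundance = [0.10, 0.22, 0.35, 0.05, 0.48, 0.30]
rho = spearman_rho(transcript_abundance, dna_abundance)
```

A rank-based statistic is a natural choice here because transcript and DNA abundances are on different scales and are rarely normally distributed.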
Discussion
We developed a robust workflow for skin metatranscriptomics that addresses low biomass challenges by optimizing sampling, extraction, rRNA depletion, contamination removal and functional classification (Fig. 1a–c and Supplementary Fig. 1). Its joint application with shotgun metagenomics enables normalization of transcriptional activities by gene abundances and capturing strain-level differences. By contrast, 16S and ITS profiling, particularly with short-read sequencing, has limited taxonomic resolution and may not distinguish between strains with differing functional potential70. Applying skin metatranscriptomics across multiple sites in healthy individuals revealed active species, niche-specific adaptations and several key microbe–microbe and host–microbe interactions in vivo. Unlike human stool metatranscriptomes13 and skin metagenomes, stratum corneum metatranscriptomes are dominated by microbes, allowing detection of community-level functions at modest sequencing depths (5–10 million paired-end reads; Supplementary Fig. 2). The abundance of microbial reads means that skin metatranscriptomics can be feasibly deployed for population-scale or longitudinal studies, as sequencing costs continue to decrease. Our protocol is compatible with diverse sites and uses non-invasive sampling with commercially available swabs, making it clinically practical. The moderate-to-high success rate of library construction from limited input, strong technical reproducibility and even read coverage across genes enable robust in vivo expression profiling across heterogeneous cohorts and timepoints. Given the very low biomass at antecubital fossae and volar forearms relative to other sites, further improvements could be explored to decrease nucleic acid loss, such as using depletable carrier RNAs and single-cell RNA-seq kits that tolerate lower inputs. 
Altogether, our metatranscriptomics workflow is a widely applicable and clinically tractable approach to profile skin microbial activities in vivo.
Our computational workflow yielded important functional insights into the skin microbiome but requires further benchmarking against tools such as HuMAnN3 (ref. 22), MetaPro71 and SAMSA2 (ref. 72) using defined mock communities and metatranscriptomic datasets. Crucial metrics include specificity and sensitivity of read annotation, runtime and workflow customizability. Several areas also remain for further improvement. First, although most metatranscriptomic pipelines remove host reads by mapping to a single reference (for example, hg38), this can still lead to misclassification of residual host reads to microbes73. We found that a two-stage filtering approach using a Kraken 2 database containing the human genome mitigated this problem (Supplementary Note 2). Nonetheless, future workflows would benefit from more comprehensive host read removal by mapping to pangenome graphs74,75 and through splice-aware mapping of RNA-seq reads to pan-transcriptomes76. Second, our current workflow and that of others71 perform read deduplication conservatively by collapsing read pairs with identical 5′ and 3′ ends, thus removing a substantial proportion of reads. More precise deduplication based on unique molecular identifiers (UMIs) can better distinguish polymerase chain reaction (PCR) and biological duplicates77. Third, although we optimized workflows for swab-derived stratum corneum samples, they need further testing on deeper skin layers (for example, from biopsies, pore strips17 or follicular extracts78). Such studies are necessary to enrich for immune cell signatures that reside beneath the stratum corneum79 or for studying Cutibacterium strains colonizing individual pores80. Finally, although using a skin-specific microbial catalog increased annotation rates, it also limits detection of microbial functions unrepresented on skin. As is, our computational pipeline is a useful reference point for further development and benchmarking.
A key observation here is that skin metatranscriptomes provide distinct insights about microbial activities compared to matched metagenomes. There have been similar findings for human stool13 and ocean water14. Core microbial pathways on skin were expressed by relatively few species (low metatranscriptomic alpha diversity; Fig. 2c), unlike stool metatranscriptomes, which harbor numerous ‘housekeeping’ pathways expressed by most species13. This implies that a smaller fraction of skin microbes are active, with the remainder being quiescent or dead. Low metatranscriptomic alpha diversity of microbial pathways was previously described in nasal cavities and vaginal surfaces9, possibly reflecting nutrient scarcity and robust growth of a limited number of species. Malassezia and Staphylococcus species had outsized contributions to skin metatranscriptomes, with disproportionately higher RNA versus DNA abundances. This may reflect a combination of factors, such as cell size and bioactivity. As eukaryotic fungi, Malassezia cells are at least two orders of magnitude greater in volume and biomass than the average bacterial cell30. Hence, metagenomes that measure genome copy numbers likely underestimate the contribution of Malassezia to the functional potential of the skin microbiome. Malassezia species are important sources of secreted lipases, proteases and metabolites that can shape host and microbial activities on skin7,34. For example, Malassezia phospholipases can metabolize host lipids to generate polyunsaturated fatty acids such as arachidonic acid, which are potent inflammatory mediators81. The relatively high activity of various Staphylococcus species was unexpected, as they usually make up a small proportion of bacterial communities on skin, except in sites such as toe webs. This reflects the metabolic versatility of Staphylococcus spp., which utilize diverse carbon sources, amino acids and lipids to survive across different microenvironments82,83,84. 
Certain staphylococci such as S. epidermidis have also evolved a range of strategies for immune evasion, such as intracellular localization85, or by expressing antigens that lead to commensal-specific T cell responses86. Our workflow will, thus, be useful for studying how staphylococci contribute to skin health, especially given their reported associations with skin phenotypes such as malodor87, itch6 and eczema10.
To our knowledge, our work represents the first attempt to use metatranscriptomics to characterize metabolic pathways that enable skin microbes to thrive in their in vivo niches. Beyond identifying differentially expressed pathways, we integrated transcriptomic data with FBA49. FBA models organismal activity as a system of all predicted metabolic reactions instead of pathways in isolation. This identifies metabolic requirements or dependencies for maximizing biomass in vivo, which cannot be determined from in vitro models88. Although M. restricta was present on facial skin (cheeks) and scalp, metatranscriptomics revealed differential enrichment of pathways for metabolizing distinct classes of host lipids between the sites, underscoring the importance of expression-based functional profiling even when no significant differences in organismal abundances exist. When applied to microbes growing in vivo versus in vitro, transcriptome-aware FBA revealed that S. epidermidis relies more on the PPP and lactate metabolism to generate pyruvate in vivo. This is consistent with previous reports showing that PPP is crucial for energy production, biofilm formation and virulence in staphylococci89. Our results suggest that, besides C. acnes, commensal S. epidermidis may be another in vivo exporter of the short-chain fatty acid (SCFA) propionate, which has been linked to immunomodulation in keratinocyte and sebocyte cells90. Further studies are needed to determine if propionate mediates crosstalk between skin commensals and host immune cells in vivo and whether enhanced propionate export by S. epidermidis arises from glucose limitation and/or cysteine/methionine bioavailability (Fig. 4a). Differing fluxes through pathways for energy generation and metabolite export under in vivo versus in vitro conditions can help design strategies to coax microbes to export desirable metabolites, such as SCFAs. 
This also highlights the need for model systems that better recapitulate the cutaneous microenvironment32 when investigating how commensals contribute to skin health and disease.
Our workflows91 generate in vivo gene expression data92 across microbes, individuals and skin sites, identifying key antimicrobial functions and interactions. Most previous reports of skin commensal-derived peptides or enzymes against pathogens such as S. aureus were based on in vitro or ex vivo models7,8. We confirmed that many of these products, such as the thiopeptide cutimycin2 and peptides of the lacticin 481 family93, were expressed in vivo on some individuals by C. acnes and commensal staphylococci, respectively. Notably, our dataset shows that the skin microbiome is a rich source of antimicrobial proteins/peptides whose activities and specificities remain uncharacterized. One intriguing example involves proteins harboring a C-terminal cysteine-rich region homologous to halocins. Halocins have largely been characterized in archaea, but homologous, hypothetical proteins are encoded in genomes of many Staphylococcus strains (https://www.ncbi.nlm.nih.gov/Structure/cdd/TIGR04449). Our data indicate that genes encoding these proteins with halocin C8-like domains are expressed by staphylococci in vivo (toe webs), supporting further investigation into their antimicrobial properties. Because our workflow captures host and microbial gene expression, correlating with DNA abundances and integrating with other functional annotations is a promising approach to identifying novel candidates mediating microbe–microbe and host–microbe interactions in vivo. The M. restricta gene DNF11_2196 is one such candidate owing to the strong negative correlation between transcript levels and C. acnes abundance, its predicted entry into the secretory pathway and its structural similarity to known protease inhibitors (Fig. 5). Applying this approach to population-scale cohorts could facilitate systematic discovery of physiologically relevant interactions between skin microbes and host cells.
In conclusion, we developed a systematic workflow to sample and analyze skin metatranscriptomes and have shown its utility for capturing in vivo microbial activities distinct from DNA abundances and for identifying interactions that can shape microbial communities and host responses. Our data serve as a baseline for healthy individuals in future comparisons with disease datasets to gain deeper insights into microbial and host pathways that could be leveraged for diagnostic and therapeutic applications.
Methods
Participant recruitment
This study was conducted at the Genome Institute of Singapore. All associated protocols for this study were approved by the Agency for Science, Technology and Research Institutional Review Board (A*STAR IRB reference no. 2021-094) on 8 September 2021 and renewed until 7 September 2025. All participants recruited in this study were of Singaporean nationality or permanent residents, aged 21–65 years, gave informed consent and reported no skin disease at the time of sampling. For the pilot cohort, five participants were recruited for cross-sectional analysis and three for longitudinal analysis. For longitudinal analysis, participants were sampled at three visits over three consecutive days with a minimum interval of 24 hours. For cross-sectional analysis, 27 participants were recruited for the full cohort. All participants were required to abstain from showering for at least 12 hours before sampling. Human skin biopsies from A*STAR Skin Research Laboratories (A*SRL) of 6 mm and 8 mm were obtained from healthy skin donors, with approval from the National Healthcare Group (NHG) domain-specific review board (NHG DSRB 2017/00224 and NHG DSRB 2018/00945) at the National University Health System (NUHS) and the National Skin Centre (NSC) domain-specific review board (NSC DSRB 2019/00806) at the NSC.
Optimization of RNA extraction with pilot cohort
Different sample collection methods, bead tubes, purification methods and various combinations of each were tested and are detailed in Supplementary Data 1 and the Supplementary Methods.
Sample collection for full cohort
Skin samples were collected using FLOQSwabs (Copan Diagnostics, cat. no. 502CS01) from five different skin sites (scalp, cheek, antecubital fossa, volar forearm and toe web) from each participant (n = 135 metagenomes and n = 135 metatranscriptomes). For each skin site, three swabs were collected from the left and right side of the body and combined in a tube, except for the scalp, where only three swabs were used in total. Each swab was submerged in 1× PBS, and the excess solution was removed by pressing the swab against the tube wall. For each skin site, the moistened swab was rotated and rubbed, with constant pressure applied, in a zig-zag pattern and was repeated at an angle of 90° to the first rub, for a total of 1 minute. The contents of the swab were dislodged by stirring vigorously in either 800 µl or 1,300 µl of DNA/RNA Shield (Zymo Research, cat. no. ZYR.R1100) for scalp and other skin sites, respectively. Swabs were submerged in DNA/RNA Shield for 5 minutes at room temperature and stirred vigorously again. Excess solution on each swab was collected by pressing the swab against the wall of the tube. This was repeated for the remaining swabs for each site. Negative controls for each batch were collected by dislodging three swabs in DNA/RNA Shield (Zymo Research, cat. no. ZYR.R1100) without sampling the skin. Each sample or control was split into two portions for RNA (approximately 600 µl) and DNA (approximately 200 µl) extraction. All samples were stored at −80 °C prior to nucleic acid extraction.
RNA extraction for metatranscriptomics (Direct-zol method)
RNA was extracted using the Direct-zol RNA MicroPrep kit (Zymo Research, cat. no. ZYR.R2063). Compared to other extraction methods and kits, this approach was found to be the best performing in terms of RNA yield and RNA integrity (Supplementary Data 1). Bead beating of samples in TRI Reagent (Zymo Research, cat. no. R2050-1-200) was done using ZR BashingBead Lysis Tubes (0.5 mm and 0.1 mm; Zymo Research, cat. no. S6012) and a FastPrep-24 Instrument (MP Biomedicals) at 6.0 m s−1 for a total of 3 minutes, in 1-minute intervals, with 5-minute incubation on ice between each interval. Samples were DNase treated and purified according to kit instructions. An additional DNase treatment was carried out by adding 2.5 µl of DNase I and 10 µl of RDD buffer from RNase-Free DNase Set (Qiagen, cat. no. 79254) and 1 µl of Recombinant RNasin RNase Inhibitor (10,000 U; Promega, cat. no. N2515) in a total volume of 100 µl. The mixture was incubated at 37 °C for 30 minutes and purified with RNeasy MinElute Cleanup Kit (Qiagen, cat. no. 74204) in an elution volume of 14 µl of RNase-free water. High Sensitivity RNA ScreenTape analysis (Agilent Technologies, cat. nos. 5067-5579 and 5067-5580) was used to assess the quality of RNA, and extracted RNAs were stored at −80 °C.
DNA extraction for metagenomics (EZ1 method)
This approach (EZ1 method) was used for DNA extraction due to relatively poor DNA yields from the Direct-zol method, which is optimized for RNA extraction. Spike-ins of the metagenomic mock community (1.5 × 10⁴ colony-forming units) were introduced to each sample prior to DNA extraction. Lysis of samples was carried out by adding 500 µl of ATL Buffer (Qiagen, cat. no. 19076) to the sample and homogenization in Lysing Matrix E tubes (MP Biomedicals, cat. no. 116914500) with a FastPrep-24 Instrument at a speed of 6.0 m s−1 for 40 seconds, done twice in total. Cell debris was pelleted at 16,000g for 5 minutes, and the supernatant was treated with 12 µl of Proteinase K at 56 °C for 15 minutes prior to purification with EZ1 DNA Tissue Kit (Qiagen, cat. no. 953034) using an EZ1 Advanced XL machine (Qiagen). A Qubit fluorimeter was used to quantify the amount of DNA.
Preparation of rRNA depletion mix
Oligo probe pools (desalted) were ordered from Integrated DNA Technologies targeting either Malassezia rRNA or other fungal rRNAs (Supplementary Table 3). Oligo pools were resuspended to a concentration of 2 µM per probe and then mixed together (that is, Malassezia:other fungi, 4:1) to form a pan-fungal rRNA depletion probe pool. The final rRNA depletion probe pool was made by combining the NEBNext rRNA Depletion Solution from NEBNext rRNA Depletion Kit V2 (Human/Mouse/Rat; New England Biolabs, cat. no. E7405) and NEBNext rRNA Depletion Kit V2 (Bacteria; New England Biolabs, cat. no. E7850) with the custom pan-fungal rRNA depletion probe pool at a volume ratio of 40:9:1, with 2 µl of this final probe mix used per sample for rRNA depletion.
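For batch preparation, the component volumes of the final probe mix can be worked out with a short calculation. The sketch below assumes the 40:9:1 ratio follows the order in which the components are listed (Human/Mouse/Rat solution, Bacteria solution, pan-fungal pool). The function name and the pipetting overage parameter are hypothetical additions for illustration and are not part of the protocol.

```python
def probe_mix_volumes(n_samples, per_sample_ul=2.0, ratio=(40, 9, 1), overage=0.0):
    """Return volumes (µl) of each probe component for n_samples.

    ratio is assumed to be (Human/Mouse/Rat, Bacteria, pan-fungal).
    overage is a hypothetical fractional excess to cover pipetting loss.
    """
    total_ul = n_samples * per_sample_ul * (1 + overage)
    parts = sum(ratio)
    return tuple(total_ul * r / parts for r in ratio)


# 25 samples at 2 µl each, with no overage, need 50 µl of final mix.
hmr_ul, bacteria_ul, fungal_ul = probe_mix_volumes(25)
```

Under these assumptions, each 2 µl aliquot contains only 0.04 µl of the pan-fungal pool, so the mix is best prepared in bulk rather than per sample.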
RNA library preparation
Human and microbial rRNAs were depleted from 5–10 ng of total RNA or the entire volume of eluted total RNA using custom oligonucleotides (Supplementary Table 3). Libraries were prepared according to the manufacturer’s instructions using NEBNext Ultra II Directional RNA Library Prep Kit for Illumina (New England Biolabs, cat. no. E7760), with random primers used for reverse transcription. Depending on the RNA integrity number value of RNA, either section 2 or section 3 of the protocol was used. Library enrichment was carried out using NEBNext Multiplex Oligos for Illumina (96 Unique Dual Index Primer Pairs; New England Biolabs, cat. no. E6440) or NEBNext Multiplex Oligos for Illumina (Unique Dual Index UMI Adaptors RNA Set 1; New England Biolabs, cat. no. E7416) with 14 cycles of enrichment PCR. The quality of libraries was assessed by High Sensitivity D1000 ScreenTape Assay (Agilent Technologies, cat. no. 5067-558). Libraries were pooled in equimolar proportions and sequenced on an Illumina HiSeq X Ten system (approximately 35 million 2 × 150-bp read pairs per library).
DNA library preparation
NEBNext Ultra II FS DNA Library Prep Kit for Illumina (New England Biolabs, cat. no. E7805) was used according to the manufacturer’s instructions with some modifications. A volume of 26 µl of DNA was used as input and subjected to 10 minutes of fragmentation at 37 °C. Fragmented DNA was used for adaptor ligation and was cleaned up using 0.6× volume of AMPure XP Reagent (Beckman Coulter, cat. no. A63882). Adaptor-ligated DNA was amplified for 12 cycles using NEBNext Multiplex Oligos for Illumina (96 Unique Dual Index Primer Pairs; New England Biolabs, cat. no. E6440) and cleaned up with 0.7× volume of AMPure XP Reagent (Beckman Coulter, cat. no. A63882). The final library was eluted in 20 µl of EB Buffer (Qiagen, cat. no. 19086). Quality of libraries was assessed by the High Sensitivity D1000 ScreenTape Assay (Agilent Technologies, cat. no. 5067-558). Libraries were pooled in equimolar proportions and sequenced on an Illumina HiSeq X Ten system (approximately 25 million 2 × 150-bp read pairs per library). Sequence data from DNA extracted using the EZ1 method were compared with data obtained from DNA recovered via the Direct-zol method to demonstrate consistency between protocols (Supplementary Methods and Supplementary Fig. 14).
Data preprocessing and quality control
Short reads from metagenomic and metatranscriptomic libraries were processed using the Nextflow94 pipeline (https://github.com/Chiamh/meta-omics-nf, version 1.0.0, 642435a). Quality control and adapter trimming were done using fastp95 (version 0.22.0) with default settings. Metagenomes were further preprocessed by mapping to the hg38 human reference genome using BWA-MEM96 (version 0.7.10-r789), and reads that failed to map to hg38 were extracted using SAMtools97 (version 1.13) with parameters -f 12 -F 256. Human RNA was removed from metatranscriptomes by mapping to hg38 using STAR98 (version 2.7.9a). Reads originating from microbial rRNAs were computationally removed from metatranscriptomes using bbduk.sh (BBMap version 38.93) and a k-mer database for rRNAs. Subsequently, microbial RNA reads were deduplicated using clumpify.sh (BBMap version 38.93) with parameters dedupe=t and optical=f. We further verified that differences in taxonomic composition between metatranscriptomes and metagenomes were not due to these workflow differences (Supplementary Note 3). These steps were run as part of the Nextflow pipeline.
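To make the SAMtools flag filter concrete, the sketch below reproduces its bitwise logic in Python, assuming the parameters correspond to the standard `samtools view` options `-f 12` (require both 'read unmapped' and 'mate unmapped') and `-F 256` (exclude non-primary alignments). The function name `keeps_read` is hypothetical, and the sketch only illustrates the flag arithmetic, not the pipeline's implementation.

```python
# SAM flag bits as defined in the SAM format specification.
READ_UNMAPPED = 0x4   # 4
MATE_UNMAPPED = 0x8   # 8
NOT_PRIMARY = 0x100   # 256


def keeps_read(flag, require=READ_UNMAPPED | MATE_UNMAPPED, exclude=NOT_PRIMARY):
    """Mimic `samtools view -f <require> -F <exclude>` for one flag value."""
    has_all_required = (flag & require) == require
    has_no_excluded = (flag & exclude) == 0
    return has_all_required and has_no_excluded


# Flag 77 = paired (1) + read unmapped (4) + mate unmapped (8) + first in pair (64).
both_unmapped = keeps_read(77)
# Flag 73 = paired (1) + mate unmapped (8) + first in pair (64): the read itself mapped.
only_mate_unmapped = keeps_read(73)
```

Under this filter, a read pair is retained as non-human only when neither mate aligned to hg38.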
Taxonomic classification
Metagenomic reads were classified using the Nextflow pipeline (https://github.com/Chiamh/meta-omics-nf, version 1.0.0, 642435a) with Kraken 2 (version 2.1.2) and Bracken99 (version 2.6.1) with parameters --use-names, --paired and --report-minimizer-data. Metatranscriptomic reads were classified with Kraken 2 using the same parameters in the pipeline. A 50-Gbp Kraken 2 database built from RefSeq bacterial, archaeal, viral, fungal and human (hg38) genomes, as well as plasmid sequences, was used. This database also contains additional Malassezia assemblies downloaded from the National Center for Biotechnology Information (NCBI) (Supplementary Data 11). Samples with at least 10,000 paired reads were retained, and reads that were still taxonomically assigned to Homo sapiens were removed. Microbial reads were defined as the sum of reads classified as bacteria (taxid 2), archaea (taxid 2157), virus (taxid 10239) and fungi (taxid 4751). The two-stage filtering approach, combining read mapping and Kraken 2 classification, effectively excluded most human signal, with less than 1% of microbial reads per species being misclassified from human reads (Supplementary Note 2).
False-positive species assignments for metagenomic reads were identified and removed as follows. A species assignment was considered a true positive if the species had 2,000 or more unique Kraken 2 minimizers per 1 million microbial reads, or if it had 10 or more read pairs together with ≥10× more unique minimizers than read pairs. False-positive species assignments for metatranscriptomic reads were identified and removed using a similar approach, with empirically determined minimizer thresholds (Supplementary Methods).
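The microbial read definition and the metagenomic true-positive criteria can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the two criteria are read as alternatives joined by 'or', the input is a mapping from taxid to clade-level read-pair counts, and all function and variable names are hypothetical rather than taken from the pipeline.

```python
# Top-level taxids named in the text.
MICROBIAL_TAXIDS = {"bacteria": 2, "archaea": 2157, "virus": 10239, "fungi": 4751}


def microbial_read_total(clade_reads):
    """Sum clade-level read pairs over bacteria, archaea, viruses and fungi."""
    return sum(clade_reads.get(taxid, 0) for taxid in MICROBIAL_TAXIDS.values())


def is_true_positive(unique_minimizers, read_pairs, microbial_reads,
                     min_per_million=2000, min_read_pairs=10, min_ratio=10):
    """Apply the two metagenomic criteria; passing either one is sufficient."""
    if microbial_reads <= 0:
        return False
    per_million = unique_minimizers * 1_000_000 / microbial_reads
    passes_density = per_million >= min_per_million
    passes_ratio = (read_pairs >= min_read_pairs
                    and unique_minimizers >= min_ratio * read_pairs)
    return passes_density or passes_ratio


# Hypothetical sample: human reads (taxid 9606) do not count as microbial.
total = microbial_read_total({2: 900, 4751: 100, 9606: 5000})
```

The ratio criterion rescues low-abundance species whose reads are spread over many distinct minimizers, which is the pattern expected from a genuine genome rather than from reads piling up on a single shared region.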
Kitome removal
Potential reagent and laboratory contamination-associated species (the ‘kitome’) were identified and removed via a multistep process (Supplementary Fig. 1a and Supplementary Methods). Swab extraction controls (n = 7 negative handling controls) were sequenced from fresh swabs unexposed to human skin. Species-level taxonomic abundances of sequenced negative controls and the list of contaminant genera and species identified in this paper can be found in Supplementary Data 1. Note that the signals from these handling controls would vary across laboratories and are not exhaustive.
For metagenomes and metatranscriptomes of the full cohort, a median of 8.29% and 0.6% of microbial reads were identified as contaminants, respectively. A median of 5.5 million and 3.7 million read pairs from non-contaminant microbes remained in metagenomes and metatranscriptomes, respectively, after filtering for environmental contaminants.
Functional classification
Metagenomic and metatranscriptomic reads were functionally classified based on a similar strategy to that of HUMAnN3, using a custom Nextflow pipeline (https://github.com/Chiamh/meta-omics-nf, version 1.0.0, 642435a). Reads were first aligned in single-end mode using Bowtie 2 (ref. 100) (version 2.4.4) in --very-sensitive mode to the iHSMGC (https://ftp.cngb.org/pub/SciRAID/Microbiome/humanSkin_10.9M/GeneCatalog/IHSMGC.fa.gz) comprising approximately 10.9 million non-redundant genes. A coverage filter of 50% across the length of any given hit pangene sequence was used. Reads that failed to align to the iHSMGC were mapped against the UniRef90 (ref. 101) database (downloaded 9 June 2021) using DIAMOND102 (version 2.0.12) with parameters blastx, --id 80, --query-cover 90 and --max-target-seqs 1. For searches against the UniRef90 database, only alignments with ≥80% sequence identity, ≥90% query (read) coverage and ≥50% subject (UniRef90 representative sequence) coverage were considered as hits. These thresholds were the same as the defaults used by HUMAnN3. The annotation rates for metatranscriptomes (median 66–80%) and metagenomes (median 64–87%) remained moderate to high across various body sites even after accounting for microbial contaminants.
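The coverage and identity thresholds above can be illustrated with a short Python sketch. It assumes that the 50% pangene coverage filter refers to breadth of coverage (the fraction of gene positions covered by at least one aligned read) and that alignments are given as 0-based, half-open intervals; both are assumptions of this sketch, and the function names are hypothetical.

```python
def covered_fraction(intervals, gene_length):
    """Fraction of positions in [0, gene_length) covered by >= 1 interval."""
    covered = 0
    current_end = 0
    for start, end in sorted(intervals):
        # Clip to the gene and skip the part already counted.
        start = max(start, current_end, 0)
        end = min(end, gene_length)
        if end > start:
            covered += end - start
            current_end = end
    return covered / gene_length


def passes_pangene_filter(intervals, gene_length, min_fraction=0.5):
    """Keep a pangene hit only if at least half of its length is covered."""
    return covered_fraction(intervals, gene_length) >= min_fraction


def is_uniref_hit(pct_identity, query_cov_pct, subject_cov_pct):
    """Thresholds stated in the text for UniRef90 alignments."""
    return pct_identity >= 80 and query_cov_pct >= 90 and subject_cov_pct >= 50


# Three reads on a 1,000-bp gene; the first two overlap by 50 bp.
fraction = covered_fraction([(0, 100), (50, 200), (400, 450)], 1000)
```

In this example only 250 of 1,000 positions are covered, so the gene would not pass a 50% breadth filter even though it has three aligned reads.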
Pangenes and UniRef90 clusters with valid hits after mapping were annotated and grouped into orthologous groups using eggNOG mapper103 (version 2.1.6) and the eggNOG 5.0 database104 with parameters -m diamond and --go_evidence all. Gene-level analysis was done for orthologous groups by summarizing the read counts at the level of bacteria (taxid 2) or fungi (taxid 4751). Rarefaction analysis for bacterial and fungal orthologous groups was conducted in R using the ‘rarecurve’ function from the ‘vegan’ package (version 2.6-6.1).
Pathway abundance and contributional diversity analysis
Pathway abundances were computed using HUMAnN3 (version 3.8), using a custom structured Kyoto Encyclopedia of Genes and Genomes (KEGG) module definition file (https://github.com/CSB5/skin_metatranscriptome/blob/main/metadata/KEGG_module_to_pathway_definition.tsv), with each definition retrieved using the KEGG REST API (for example, https://rest.kegg.jp/get/M00357). Microbial alpha (Simpson) and beta (Bray–Curtis dissimilarity) diversity for each KEGG module was calculated. Only modules that were core to a skin site (non-zero counts in more than 75% of individuals) with largely known microbial provenance (less than 25% unclassified species-level reads per module) were used for this analysis. Alpha or beta diversity scores greater than 0.5 were considered ‘high’ diversity.
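The diversity measures and the 'core module' criterion described above can be sketched in Python as follows. The sketch assumes the Simpson index is reported in its Gini–Simpson form (1 − Σp²), so that larger values indicate higher diversity, which is consistent with the 0.5 threshold but is not stated explicitly in the text. Function names are hypothetical.

```python
def simpson_diversity(counts):
    """Gini-Simpson index, 1 - sum(p_i^2), over per-species contributions."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0


def is_core_module(counts_per_individual, min_prevalence=0.75):
    """Core to a site: non-zero counts in more than 75% of individuals."""
    nonzero = sum(1 for c in counts_per_individual if c > 0)
    return nonzero / len(counts_per_individual) > min_prevalence


# A module contributed equally by two species has alpha diversity 0.5.
alpha = simpson_diversity([50, 50])
# Two individuals with entirely different contributors have beta diversity 1.
beta = bray_curtis([10, 0], [0, 10])
```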
S. epidermidis culture experiments
Overnight cultures of S. epidermidis (American Type Culture Collection (ATCC), 12228) were diluted into three volumetric flasks of culture at optical density at 600 nm (OD600) 0.01 and incubated at 37 °C for 6 hours to log phase (OD600 0.4–0.5). Each flask was split into three tubes (biological triplicates) and subjected to different stress exposures (Supplementary Table 2). Cell pellets were stored at −80 °C prior to RNA extraction.
Alignment to species-specific pangenomes
Bacterial species-specific analyses were done by mapping metagenomic and metatranscriptomic reads to curated pangenomes105 (https://ngdc.cncb.ac.cn/propan/) of eight commonly found skin microbes (S. aureus, S. epidermidis, S. hominis, S. capitis, C. acnes, Cutibacterium modestum, Corynebacterium tuberculostearicum and Corynebacterium ureicelerivorans), together with decoy genomes of the non-skin microbes Achromobacter xylosoxidans, Plesiomonas shigelloides, Vibrio vulnificus and Listeria monocytogenes. All genes in the eight skin species pangenomes were further clustered using CD-HIT (version 4.8.1) at ≥95% protein sequence identity and mutual alignment coverage ≥90%. Reads were aligned to the dereplicated eight skin species pangenomes in single-end mode using Bowtie 2 (version 2.4.4) in --very-sensitive mode (similar to HUMAnN3). A coverage filter of 50% across the length of any given hit pangene sequence was used. Fungal species-specific analyses were done by pseudo-alignment of metatranscriptomic reads to the reference transcriptomes of multiple Malassezia species, with their reference genomes as decoys to minimize occurrences of non-transcriptomic reads being erroneously counted due to similarities to the annotated transcriptome, following recommendations for Salmon106 (version 1.10.1). Unlike fungal reads, bacterial reads were mapped in single-end mode to account for the organization of open reading frames in polycistronic mRNAs, similar to the approach adopted by HUMAnN3. Single-ended read coverage over bacterial coding sequences was computed using Picard (version 3.1.1) CollectRnaSeqMetrics with arguments -STRAND FIRST_READ_TRANSCRIPTION_STRAND. Paired-end read coverage over fungal transcripts was computed using Picard CollectRnaSeqMetrics with arguments -STRAND SECOND_READ_TRANSCRIPTION_STRAND.
Transcriptional activity analysis
Metagenomic or metatranscriptomic reads were mapped to the iHSMGC pangene catalog and the UniRef90 database. Read counts for any given species (including all subspecies) were then divided by the length of each feature in kilobases to obtain the number of reads per kilobase (RPK). Reads belonging to a species that could not be mapped to pangene or UniRef90 features were assumed to belong to an unknown gene of length 1 kilobase for computing RPK. RPKs were summed and divided by 10⁶ to obtain a per-sample scaling factor. Metatranscriptomic TPM or metagenomic CPM values were computed by dividing a feature’s RPK by the per-sample scaling factor, which normalizes for sequencing depth. The transcriptional activity of a species was estimated by summing species-level TPM values (RNA) and dividing them by species-level CPM (DNA) values.
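The normalization described above can be written out as a short Python sketch. Each feature is represented as a hypothetical (species, read count, length in bp) tuple, with unmapped reads of a species entered as a feature of length 1,000 bp, following the text. The function names and data layout are assumptions for illustration.

```python
from collections import defaultdict


def species_per_million(features):
    """Return species -> summed TPM (RNA) or CPM (DNA).

    features: list of (species, read_count, length_bp) tuples.
    """
    # Reads per kilobase (RPK) for every feature.
    rpk = [(species, n / (length_bp / 1000.0)) for species, n, length_bp in features]
    # Per-sample scaling factor: summed RPK divided by 10^6.
    scaling = sum(value for _, value in rpk) / 1e6
    totals = defaultdict(float)
    for species, value in rpk:
        totals[species] += value / scaling
    return dict(totals)


def transcriptional_activity(rna_features, dna_features):
    """Species-level TPM divided by species-level CPM."""
    tpm = species_per_million(rna_features)
    cpm = species_per_million(dna_features)
    return {sp: tpm[sp] / cpm[sp] for sp in tpm if cpm.get(sp, 0) > 0}


# Toy sample: species A dominates the RNA, species B dominates the DNA.
rna = [("A", 300, 1000), ("B", 100, 1000)]
dna = [("A", 100, 1000), ("B", 300, 1000)]
activity = transcriptional_activity(rna, dna)
```

In this toy sample species A accounts for 75% of the transcript signal but only 25% of the DNA signal, giving an activity ratio of about 3, whereas species B has a ratio of about one-third.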
Differential expression analysis
Raw counts were summarized at the level of bacterial (taxid 2) and fungal (taxid 4751) orthologous groups for all gene-level differential expression analysis. For differential expression analysis using MaAsLin2 (ref. 107) (version 1.18.0), inputs to the generalized linear mixed model were the centered log ratio108 transformed counts (averaged across 1,000 Monte Carlo instances) of microbial gene abundances, computed from aldex2 (version 1.28.1). The fixed effects were skin site (Sc, Ch, Ac, Vf or Tw), assay (RNA or DNA) and skin site:assay, and the random effect was participant ID. The coefficient of the interaction term represents the log2 fold change in gene expression between skin sites while accounting for within-individual dependencies and variation in gene copy numbers in the metagenomes. For differential expression analysis using DESeq2 (ref. 109) (version 1.36.0), the inputs were raw counts, and the design formula was ‘~ subject ID + skin site (Sc, Ch, Ac, Vf or Tw) + assay (RNA or DNA) + skin site:assay’. This design formula accounts for within-individual dependencies and variations in gene copy numbers in metagenomes while testing for differences in microbial gene expression between skin sites. Only features (rows) with median read count ≥10 for both DNA and RNA were kept. Size factors were estimated separately for the metagenomic and metatranscriptomic count matrices using the ‘poscounts’ function to account for data sparsity. Differential expression analysis at individual species level between two in vivo conditions was similarly done, except that only metatranscriptomic read counts were used from libraries with 200,000 or more species-specific reads, and the design formula was ‘~ subject + skin site’. Differential expression analysis for S. epidermidis comparing in vivo versus in vitro growth conditions was similarly done, except that the design formula was ‘~ batch + growth condition’ to account for experimental batch effects when using in vitro RNA-seq data from different sources. Batch-corrected principal component analysis (PCA) plots were derived from inputs processed with the removeBatchEffect function from limma110 (version 3.60.4).
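The feature pre-filter applied before the DESeq2 analysis (median read count ≥10 for both DNA and RNA) can be sketched in Python as follows. The data layout (a dictionary from feature name to per-sample counts) and the function name are assumptions for illustration; the published analysis was carried out in R.

```python
from statistics import median


def filter_features(dna_counts, rna_counts, min_median=10):
    """Keep features whose median count is >= min_median in both assays.

    dna_counts, rna_counts: dict mapping feature -> list of per-sample counts.
    Returns the retained feature names in sorted order.
    """
    return sorted(
        feature
        for feature in dna_counts
        if feature in rna_counts
        and median(dna_counts[feature]) >= min_median
        and median(rna_counts[feature]) >= min_median
    )


# Hypothetical orthologous groups across three samples per assay.
dna = {"og1": [12, 15, 9], "og2": [100, 200, 300], "og3": [0, 0, 50]}
rna = {"og1": [30, 40, 50], "og2": [1, 2, 3], "og3": [10, 20, 30]}
kept = filter_features(dna, rna)
```

Here only og1 is retained: og2 is poorly transcribed (RNA median 2) and og3 is detected in the DNA of a single sample (DNA median 0).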
Gene set enrichment analysis
Gene set enrichment analysis (GSEA) for differentially expressed microbial features was done using clusterProfiler111 (version 4.4.4) with the following arguments: eps = 0, nPermSimple = 10000 and seed = TRUE.
Integration of metatranscriptomics data with metabolic models
Genome-scale metabolic models (GSMMs) for S. epidermidis (ATCC, 12228) and Propionibacterium acnes (KPA171202) were obtained from the AGORA database112. These models were constrained using COBRApy113 based on specific exchange flux values corresponding to the conditions under which the simulations were performed. Genes in the metabolic models were mapped based on transcript levels, calculated as the geometric mean of transcript abundance (TPMs) across replicates or samples under the same growth condition. The integration of these TPM values into genome-scale models and the subsequent flux balance analysis (FBA) was carried out in Python (version 3.12) using RIPTiDe49 (version 3.4.81). In brief, RIPTiDe incorporates gene expression data using reaction parsimony, generating context-specific GSMMs. The context-specific models were simulated using FBA to identify flux distributions within the organism under each condition separately. Non-metric multidimensional scaling (NMDS) of the Bray–Curtis distances between flux distributions was performed using the ‘vegan’ package (version 2.6-4) in R (version 4.3.0) to compare the flux profiles across conditions. Differentially abundant reactions were identified using a generalized linear model (GLM; reactions ~ group), with reactions showing adjusted P values below 0.001 and absolute estimate values greater than 2 being considered significantly different between conditions. These reactions were plotted using ggplot2 (version 3.5.1) in R.
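Two of the numerical steps above, the geometric mean of TPMs per gene and the significance thresholds for reactions, can be sketched in Python as follows. How zero TPM values were handled is not stated in the text; returning 0 when any replicate is zero is an assumption of this sketch, as are the function names. The sketch does not reproduce RIPTiDe or the FBA itself.

```python
import math


def geometric_mean_tpm(values):
    """Geometric mean of TPMs across replicates for one gene.

    Assumption: a zero in any replicate gives a geometric mean of 0.
    """
    if not values:
        raise ValueError("at least one TPM value is required")
    if any(v <= 0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))


def is_significant_reaction(adjusted_p, estimate, max_p=0.001, min_abs_estimate=2.0):
    """Thresholds stated in the text: adjusted P < 0.001 and |estimate| > 2."""
    return adjusted_p < max_p and abs(estimate) > min_abs_estimate


# A gene measured at 1 and 100 TPM in two replicates has a geometric mean near 10.
gene_level = geometric_mean_tpm([1.0, 100.0])
```

Compared with an arithmetic mean (50.5 in this example), the geometric mean is less influenced by a single highly expressing replicate.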
Identification and analysis of antimicrobial genes
HMMs of various classes of antimicrobial proteins were downloaded from the NCBI (https://www.ncbi.nlm.nih.gov/protfam; Supplementary Data 13). Microbial pangenes and representative sequences from UniRef90 gene clusters were searched against these HMMs using hmmscan from hmmer114 (version 3.3.2). Hits to an HMM were kept only if both the ‘seq’ and ‘best one domain’ scores were greater than or equal to the sequence and domain cutoffs given by the NCBI. Antimicrobial genes were considered ‘present’ in metatranscriptomes if their read counts were ≥5, with ≥50% coverage over the gene body. Multiple sequence alignment (MSA) of microbial sequences and representative sequences from HMMs was done using Muscle115 (version 5.1). Matrices of pairwise percentage identities were computed from the MSAs, and coverage statistics were computed using a custom Python script (https://github.com/CSB5/skin_metatranscriptome/blob/main/scripts/MSA_pairwise_coverage.ipynb).
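The presence criterion and the pairwise identity calculation can be illustrated with the Python sketch below. It is not the custom script referenced above: the identity definition used here (identical residues divided by alignment columns in which neither sequence has a gap) is an assumption, and the function names are hypothetical.

```python
def is_present(read_count, gene_body_coverage, min_reads=5, min_coverage=0.5):
    """Thresholds stated in the text: >= 5 reads and >= 50% gene-body coverage."""
    return read_count >= min_reads and gene_body_coverage >= min_coverage


def pairwise_identity(aligned_a, aligned_b, gap="-"):
    """Percentage identity between two rows of a multiple sequence alignment.

    Columns in which either sequence has a gap are ignored (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Four comparable columns, three of them identical.
identity = pairwise_identity("ACD-F", "ACE-F")
```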
Identification of microbe–gene associations
For association analyses involving genes with relatively low expression, samples were divided into positive expressors (five or more reads) and negative expressors (fewer than five reads), followed by Wilcoxon rank-sum tests to compare normalized species abundances between the two groups. For correlation analysis, SignalP116 (version 6.0) was used to classify microbial proteins (features) that can enter the secretory pathway. Fungal and bacterial proteins were analyzed in ‘fast’ mode with the options --organism ‘eukarya’ or ‘other’, respectively. Microbial features predicted to enter the secretory pathway were shortlisted for correlation analysis. For a given pair of microbes at a skin site, pairwise Spearman correlations were computed between the variance stabilized counts (RNA) of microbial features computed from DESeq2 and the centered log ratio108 transformed counts (averaged across 1,000 Monte Carlo instances) of microbial abundances (DNA) computed from aldex2 (version 1.28.1) and the aldex2propr function from propr (version 2.1.2). Features were considered significantly correlated only with Spearman’s ρ ≥ 0.7 and FDR-adjusted P ≤ 0.05.
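The correlation step can be sketched in plain Python as follows. The sketch computes Spearman's ρ as the Pearson correlation of average ranks and adjusts supplied P values with the Benjamini–Hochberg procedure; the text specifies only 'FDR-adjusted', so the choice of Benjamini–Hochberg is an assumption, as are the function names. Computing the P values themselves is outside the scope of this sketch.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvalues)
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for offset, i in enumerate(sorted(range(m), key=lambda i: pvalues[i], reverse=True)):
        rank = m - offset
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_significant(rho, adjusted_p):
    """Thresholds stated in the text: rho >= 0.7 and FDR-adjusted P <= 0.05."""
    return rho >= 0.7 and adjusted_p <= 0.05


rho = spearman([1.2, 2.5, 3.1, 4.8], [10, 20, 30, 40])
```

Because the threshold is ρ ≥ 0.7 rather than |ρ| ≥ 0.7, only positive associations between a feature's expression and a microbe's abundance pass this filter.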
Structural similarity searches
Protein Data Bank (PDB) files of selected microbial proteins were downloaded from the AlphaFold Protein Structure Database (https://alphafold.ebi.ac.uk/). Structural similarity searches were done using the Foldseek Search server (https://search.foldseek.com/search) in 3Di/AA mode and the DALI server (http://ekhidna2.biocenter.helsinki.fi/dali/) in PDB search mode.
Microbial strain isolation
Staphylococcal strains were inoculated into 4 ml of BHI broth and grown overnight for 16 hours at 230 r.p.m. and 37 °C. OD was measured with a SpectraMax M5 Microplate Reader (Molecular Devices). Overnight cultures were then grown from OD 0.1 in fresh BHI for 26 hours at 230 r.p.m. and 37 °C. Final OD measurements for all strains after 26 hours were normalized to OD 2.75. Bacterial cultures were spun down at 5,000 r.p.m. for 5 minutes, pellets were discarded and supernatants were stored at 4 °C.
Human keratinocyte cell culture
An N/TERT keratinocyte cell line was used for ELISA experiments. A genetically modified N/TERT keratinocyte cell line, which contained a 33-nucleotide HiBiT tag directly adjacent to the IL1B start codon, was used for Nano-Glo HiBiT experiments. Keratinocytes were cultured in keratinocyte serum-free medium (KSFM; Gibco) supplemented with bovine pituitary extract (20 μg ml−1), epidermal growth factor (0.2 ng ml−1), calcium chloride (300 μM) and 1:1,000 penicillin–streptomycin (Gibco). Keratinocytes were seeded onto a 96-well cell culture plate (Greiner Bio-One) at a density of 20,000 cells per well and grown for 24 hours in 100 μl of KSFM. After 24 hours, KSFM was removed, and the keratinocytes were cultured in 100 μl of KSFM with 1 μM anisomycin (Merck), 0.2% Triton X (Merck), 5% BHI or 5% staphylococcal supernatants.
Nano-Glo HiBiT for pro-IL-1B measurements
After 16 hours of treatment of the HiBiT N/TERT keratinocytes, 50 μl of the KSFM was transferred to a white-bottom 96-well plate (Sigma-Aldrich) and mixed with 50 μl of Nano-Glo HiBiT extracellular reagent. Plates were mixed on an orbital shaker for 1 minute and incubated for 10 minutes at room temperature before luminescence was measured with a SpectraMax M5 Microplate Reader.
ELISA for cleaved IL-1B
After 16 hours of treatment of the N/TERT keratinocytes, 100 μl of the KSFM was aliquoted and stored at −80 °C until required. Human IL-1B ELISA was performed in accordance with the manufacturer’s instructions (FineTest). OD measurements were performed with a Spark Multimode Microplate Reader (Tecan) at 450 nm and corrected against 570 nm.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Shotgun metagenomic and metatranscriptomic sequencing data after human read removal are available from the European Nucleotide Archive (https://www.ebi.ac.uk/ena/browser/home) under project accession number PRJEB89273 (ref. 117). All large datasets, Foldseek webserver outputs, DALI webserver outputs, the Kraken 2 database used for taxonomic classification and other gene annotation databases are available on figshare at https://doi.org/10.6084/m9.figshare.c.7902962.v1 (ref. 91). Source data are provided with this paper.
Code availability
Source code for scripts used to analyze the data is available at https://github.com/CSB5/skin_metatranscriptome (ref. 92). Code for the Nextflow pipeline is available at https://github.com/Chiamh/meta-omics-nf.
References
Byrd, A. L., Belkaid, Y. & Segre, J. A. The human skin microbiome. Nat. Rev. Microbiol. 16, 143–155 (2018).
Claesen, J. et al. A Cutibacterium acnes antibiotic modulates human skin microbiota composition in hair follicles. Sci. Transl. Med. 12, eaay5445 (2020).
Tay, A. S. L. et al. Atopic dermatitis microbiomes stratify into ecologic dermotypes enabling microbial virulence and disease severity. J. Allergy Clin. Immunol. 147, 1329–1340 (2020).
Oh, J. et al. Biogeography and individuality shape function in the human skin metagenome. Nature 514, 59–64 (2014).
Uberoi, A. et al. Commensal microbiota regulates skin barrier function and repair via signaling through the aryl hydrocarbon receptor. Cell Host Microbe 29, 1235–1248 (2021).
Deng, L. et al. S. aureus drives itch and scratch-induced skin damage through a V8 protease-PAR1 axis. Cell 186, 5375–5393 (2023).
Li, H. et al. Skin commensal Malassezia globosa secreted protease attenuates Staphylococcus aureus biofilm formation. J. Invest. Dermatol. 138, 1137–1145 (2018).
Williams, M. R. et al. Quorum sensing between bacterial species on the skin protects against epidermal injury in atopic dermatitis. Sci. Transl. Med. 11, eaat8329 (2019).
Franzosa, E. A. et al. Species-level functional profiling of metagenomes and metatranscriptomes. Nat. Methods 15, 962–968 (2018).
Poh, S. E. et al. Expression of Staphylococcus aureus virulence factors in atopic dermatitis. JID Innov. 2, 100130 (2022).
Chua, W. et al. A Malassezia pseudoprotease dominates the secreted hydrolase landscape and is a potential allergen on skin. Biochimie 216, 181–193 (2024).
Chaves-Moreno, D. et al. Exploring the transcriptome of Staphylococcus aureus in its natural niche. Sci. Rep. 6, 33174 (2016).
Abu-Ali, G. S. et al. Metatranscriptome of human faecal microbial communities in a cohort of adult men. Nat. Microbiol. 3, 356–366 (2018).
Salazar, G. et al. Gene expression changes and community turnover differentially shape the global ocean metatranscriptome. Cell 179, 1068–1083 (2019).
Ziebandt, A. K. et al. Proteomics uncovers extreme heterogeneity in the Staphylococcus aureus exoproteome due to genomic plasticity and variant gene regulation. Proteomics 10, 1634–1644 (2010).
Both, A. et al. Genomics of invasive Cutibacterium acnes isolates from deep-seated infections. Microbiol. Spectr. 11, e0474022 (2023).
Kang, D., Shi, B., Erfe, M. C., Craft, N. & Li, H. Vitamin B12 modulates the transcriptome of the skin microbiota in acne pathogenesis. Sci. Transl. Med. 7, 293ra103 (2015).
Whitman, W. B., Coleman, D. C. & Wiebe, W. J. Prokaryotes: the unseen majority. Proc. Natl Acad. Sci. USA 95, 6578–6583 (1998).
Kong, H. H. et al. Performing skin microbiome research: a method to the madness. J. Invest. Dermatol. 137, 561–568 (2017).
Santiago-Rodriguez, T. M., Le François, B., Macklaim, J. M., Doukhanine, E. & Hollister, E. B. The skin microbiome: current techniques, challenges, and future directions. Microorganisms 11, 1222 (2023).
Li, Z. et al. Characterization of the human skin resistome and identification of two microbiota cutotypes. Microbiome 9, 47 (2021).
Beghini, F. et al. Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. eLife 10, e65088 (2021).
Tan, C. C. S. et al. No evidence for a common blood microbiome based on a population study of 9,770 healthy humans. Nat. Microbiol. 8, 973–985 (2023).
de Goffau, M. C. et al. Human placenta has no microbiome but can contain potential pathogens. Nature 572, 329–334 (2019).
Breitwieser, F. P., Baker, D. N. & Salzberg, S. L. KrakenUniq: confident and fast metagenomics classification using unique k-mer counts. Genome Biol. 19, 198 (2018).
Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019).
Franzosa, E. A. et al. Relating the metatranscriptome and metagenome of the human gut. Proc. Natl Acad. Sci. USA 111, E2329–E2338 (2014).
Deng, Z. L., Münch, P. C., Mreches, R. & McHardy, A. C. Rapid and accurate identification of ribosomal RNA sequences via deep learning. Nucleic Acids Res. 50, e60 (2022).
Wu, G. et al. Genus-wide comparative genomics of Malassezia delineates its phylogeny, physiology, and niche adaptation on human skin. PLoS Genet. 11, e1005614 (2015).
Vijaya Chandra, S. H., Srinivas, R., Dawson, T. L. & Common, J. E. Cutaneous Malassezia: commensal, pathogen, or protector? Front. Cell. Infect. Microbiol. 10, 614446 (2021).
Chong, C. E., Bengtsson, R. J. & Horsburgh, M. J. Comparative genomics of Staphylococcus capitis reveals species determinants. Front. Microbiol. 13, 1005949 (2022).
Swaney, M. H., Nelsen, A., Sandstrom, S. & Kalan, L. R. Sweat and sebum preferences of the human skin microbiota. Microbiol. Spectr. 11, e0418022 (2023).
Prohic, A., Jovovic Sadikovic, T., Krupalija-Fazlic, M. & Kuskunovic-Vlahovljak, S. Malassezia species in healthy skin and in dermatological conditions. Int. J. Dermatol. 55, 494–504 (2016).
Magiatis, P. et al. Malassezia yeasts produce a collection of exceptionally potent activators of the Ah (dioxin) receptor detected in diseased human skin. J. Invest. Dermatol. 133, 2023–2030 (2013).
Murphy, G. R. et al. Relationships between electrolyte and amino acid compositions in sweat during exercise suggest a role for amino acids and K+ in reabsorption of Na+ and Cl− from sweat. PLoS ONE 14, e0223381 (2019).
Imi, Y. & Shibata, K. Nutritional factors that affect the formation of 5-aminolevulinic acid, a key intermediate of heme biosynthesis. J. Nutr. Sci. Vitaminol. (Tokyo) 67, 339–350 (2021).
Leasure, C. S., Grunenwald, C. M., Choby, J. E., Sauer, J. D. & Skaar, E. P. Maintenance of heme homeostasis in Staphylococcus aureus through post-translational regulation of glutamyl-tRNA reductase. J. Bacteriol. 205, e0017123 (2023).
Thiele, J. J., Weber, S. U. & Packer, L. Sebaceous gland secretion is a major physiologic route of vitamin E delivery to skin. J. Invest. Dermatol. 113, 1006–1010 (1999).
Leung, M. H. Y. et al. Skin microbiome differentiates into distinct cutotypes with unique metabolic functions upon exposure to polycyclic aromatic hydrocarbons. Microbiome 11, 124 (2023).
Zheng, Y. et al. Commensal Staphylococcus epidermidis contributes to skin barrier homeostasis by generating protective ceramides. Cell Host Microbe 30, 301–313 (2022).
Jourdain, R. et al. Exploration of scalp surface lipids reveals squalene peroxide as a potential actor in dandruff condition. Arch. Dermatol. Res. 308, 153–163 (2016).
Baker, L. B. Physiology of sweat gland function: the roles of sweating and sweat composition in human health. Temperature (Austin) 6, 211–259 (2019).
Wang, X. et al. PhoU2 but not PhoU1 as an important regulator of biofilm formation and tolerance to multiple stresses by participating in various fundamental metabolic processes in Staphylococcus epidermidis. J. Bacteriol. 199, e00219-17 (2017).
Avican, K. et al. RNA atlas of human bacterial pathogens uncovers stress dynamics linked to infection. Nat. Commun. 12, 3282 (2021).
Palmgren, M. P-type ATPases: many more enigmas left to solve. J. Biol. Chem. 299, 105352 (2023).
Cohn, J. R. & Emmett, E. A. The excretion of trace metals in human sweat. Ann. Clin. Lab. Sci. 8, 270–275 (1978).
Kuan, W. H., Chen, Y. L. & Liu, C. L. Excretion of Ni, Pb, Cu, As, and Hg in sweat under two sweating conditions. Int. J. Environ. Res. Public Health 19, 4323 (2022).
Chandrangsu, P., Rensing, C. & Helmann, J. D. Metal homeostasis and resistance in bacteria. Nat. Rev. Microbiol. 15, 338–350 (2017).
Jenior, M. L., Moutinho, T. J., Dougherty, B. V. & Papin, J. A. Transcriptome-guided parsimonious flux analysis improves predictions with metabolic networks in complex environments. PLoS Comput. Biol. 16, e1007099 (2020).
Xiao, X. et al. The role of short-chain fatty acids in inflammatory skin diseases. Front. Microbiol. 13, 1083432 (2023).
Gonzalez-Garcia, R. A., McCubbin, T., Turner, M. S., Nielsen, L. K. & Marcellin, E. Engineering Escherichia coli for propionic acid production through the Wood–Werkman cycle. Biotechnol. Bioeng. 117, 167–183 (2020).
Walker, M. C. & van der Donk, W. A. The many roles of glutamate in metabolism. J. Ind. Microbiol. Biotechnol. 43, 419–430 (2016).
Heilbronner, S., Krismer, B., Brötz-Oesterhelt, H. & Peschel, A. The microbiome-shaping roles of bacteriocins. Nat. Rev. Microbiol. 19, 726–739 (2021).
Nakazono, K. et al. Complete sequences of epidermin and nukacin encoding plasmids from oral-derived Staphylococcus epidermidis and their antibacterial activity. PLoS ONE 17, e0258283 (2022).
Ebner, P. et al. Lantibiotic production is a burden for the producing staphylococci. Sci. Rep. 8, 7471 (2018).
Atanasova, N. S., Pietilä, M. K. & Oksanen, H. M. Diverse antimicrobial interactions of halophilic archaea and bacteria extend over geographical distances and cross the domain barrier. Microbiologyopen 2, 811–825 (2013).
Makarova, K. S. et al. Antimicrobial peptides, polymorphic toxins, and self-nonself recognition systems in archaea: an untapped armory for intermicrobial conflicts. mBio 10, e00715-19 (2019).
Ziegenhain, C. et al. Comparative analysis of single-cell RNA sequencing methods. Mol. Cell 65, 631–643 (2017).
Wu, A. R. et al. Quantitative assessment of single-cell RNA-sequencing methods. Nat. Methods 11, 41–46 (2013).
Sølberg, J. et al. The stratum corneum transcriptome in atopic dermatitis can be assessed by tape stripping. J. Dermatol. Sci. 101, 14–21 (2021).
Amorim, C. F. et al. Variable gene expression and parasite load predict treatment outcome in cutaneous leishmaniasis. Sci. Transl. Med. 11, eaax4204 (2019).
Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 14, 7 (2013).
O’Shea, J. J. et al. Signal transduction and Th17 cell differentiation. Microbes Infect. 11, 599–611 (2009).
Tham, K. C. et al. Distinct skin microbiome community structures in congenital ichthyosis. Br. J. Dermatol. 187, 557–570 (2022).
O’Neill, A. M. et al. Identification of a human skin commensal bacterium that selectively kills Cutibacterium acnes. J. Invest. Dermatol. 140, 1619–1628 (2020).
Peschel, A. & Otto, M. Phenol-soluble modulins and staphylococcal infection. Nat. Rev. Microbiol. 11, 667–673 (2013).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
van Kempen, M. et al. Fast and accurate protein structure search with Foldseek. Nat. Biotechnol. 42, 243–246 (2023).
Zindel, S. et al. The papain inhibitor (SPI) of Streptomyces mobaraensis inhibits bacterial cysteine proteases and is an antagonist of bacterial growth. Antimicrob. Agents Chemother. 57, 3388–3391 (2013).
Janda, J. M. & Abbott, S. L. 16S rRNA gene sequencing for bacterial identification in the diagnostic laboratory: pluses, perils, and pitfalls. J. Clin. Microbiol. 45, 2761–2764 (2007).
Taj, B. et al. MetaPro: a scalable and reproducible data processing and analysis pipeline for metatranscriptomic investigation of microbial communities. Microbiome 11, 143 (2023).
Westreich, S. T., Treiber, M. L., Mills, D. A., Korf, I. & Lemay, D. G. SAMSA2: a standalone metatranscriptome analysis pipeline. BMC Bioinformatics 19, 175 (2018).
Gihawi, A. et al. Major data analysis errors invalidate cancer microbiome findings. mBio 14, e0160723 (2023).
Liao, W. W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).
Sirén, J. et al. Pangenomics enables genotyping of known structural variants in 5202 diverse genomes. Science 374, abg8871 (2021).
Sibbesen, J. A. et al. Haplotype-aware pantranscriptome analyses using spliced pangenome graphs. Nat. Methods 20, 239–247 (2023).
Smith, T., Heger, A. & Sudbery, I. UMI-tools: modeling sequencing errors in unique molecular identifiers to improve quantification accuracy. Genome Res. 27, 491–499 (2017).
Hall, J. B. et al. Isolation and identification of the follicular microbiome: implications for acne research. J. Invest. Dermatol. 138, 2033–2040 (2018).
Nestle, F. O., Di Meglio, P., Qin, J. Z. & Nickoloff, B. J. Skin immune sentinels in health and disease. Nat. Rev. Immunol. 9, 679–691 (2009).
Conwill, A. et al. Anatomy promotes neutral coexistence of strains in the human skin microbiome. Cell Host Microbe 30, 171–182 (2022).
Plotkin, L. I., Mathov, I., Squiquera, L. & Leoni, J. Arachidonic acid released from epithelial cells by Malassezia furfur phospholipase A2: a potential pathophysiologic mechanism. Mycologia 90, 163–169 (1998).
Potter, A. D. et al. Host nutrient milieu drives an essential role for aspartate biosynthesis during invasive Staphylococcus aureus infection. Proc. Natl Acad. Sci. USA 117, 12394–12401 (2020).
DeMars, Z., Singh, V. K. & Bose, J. L. Exogenous fatty acids remodel Staphylococcus aureus lipid composition through fatty acid kinase. J. Bacteriol. 202, e00128-20 (2020).
Onyango, L. A. & Alreshidi, M. M. Adaptive metabolism in staphylococci: survival and persistence in environmental and clinical settings. J. Pathog. 2018, 1092632 (2018).
Perez, K. & Patel, R. Survival of Staphylococcus epidermidis in fibroblasts and osteoblasts. Infect. Immun. 86, e00237-18 (2018).
Naik, S. et al. Commensal–dendritic-cell interaction specifies a unique protective skin immune signature. Nature 520, 104–108 (2015).
Lam, T. H. et al. Understanding the microbial basis of body odor in pre-pubescent children and teenagers. Microbiome 6, 213 (2018).
Bordbar, A., Monk, J. M., King, Z. A. & Palsson, B. O. Constraint-based models predict metabolic and associated cellular functions. Nat. Rev. Genet. 15, 107–120 (2014).
Kim, J., Kim, G. L., Norambuena, J., Boyd, J. M. & Parker, D. Impact of the pentose phosphate pathway on metabolism and pathogenesis of Staphylococcus aureus. PLoS Pathog. 19, e1011531 (2023).
Sanford, J. A., O’Neill, A. M., Zouboulis, C. C. & Gallo, R. L. Short-chain fatty acids from Cutibacterium acnes activate both a canonical and epigenetic inflammatory response in human sebocytes. J. Immunol. 202, 1767–1776 (2019).
Chia, M. et al. Skin metatranscriptomics datasets and large files for Chia et al., 2025. figshare https://doi.org/10.6084/m9.figshare.c.7902962.v1 (2025).
Chia, M. Shotgun metagenomics & metatranscriptomics Nextflow pipeline. GitHub https://github.com/Chiamh/meta-omics-nf (2025).
Sashihara, T. et al. A novel lantibiotic, nukacin ISK-1, of Staphylococcus warneri ISK-1: cloning of the structural gene and identification of the structure. Biosci. Biotechnol. Biochem. 64, 2420–2428 (2000).
Di Tommaso, P. et al. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 35, 316–319 (2017).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Lu, J., Breitwieser, F. P., Thielen, P. & Salzberg, S. L. Bracken: estimating species abundance in metagenomics data. PeerJ Comput. Sci. 2017, e104 (2017).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Suzek, B. E., Wang, Y., Huang, H., McGarvey, P. B. & Wu, C. H. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31, 926–932 (2015).
Buchfink, B., Reuter, K. & Drost, H. G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat. Methods 18, 366–368 (2021).
Cantalapiedra, C. P., Hernández-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829 (2021).
Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, D309–D314 (2019).
Zhang, Y. et al. ProPan: a comprehensive database for profiling prokaryotic pan-genome dynamics. Nucleic Acids Res. 51, D767–D776 (2023).
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon: fast and bias-aware quantification of transcript expression using dual-phase inference. Nat. Methods 14, 417–419 (2017).
Mallick, H. et al. Multivariable association discovery in population-scale meta-omics studies. PLoS Comput. Biol. 17, e1009442 (2021).
Fernandes, A. D. et al. Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis. Microbiome 2, 15 (2014).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
Yu, G., Wang, L. G., Han, Y. & He, Q. Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284 (2012).
Magnúsdóttir, S. et al. Generation of genome-scale metabolic reconstructions for 773 members of the human gut microbiota. Nat. Biotechnol. 35, 81–89 (2016).
Ebrahim, A., Lerman, J. A., Palsson, B. O. & Hyduke, D. R. COBRApy: COnstraints-Based Reconstruction and Analysis for Python. BMC Syst. Biol. 7, 74 (2013).
Eddy, S. R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).
Edgar, R. C. Muscle5: high-accuracy alignment ensembles enable unbiased assessments of sequence homology and phylogeny. Nat. Commun. 13, 6968 (2022).
Teufel, F. et al. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat. Biotechnol. 40, 1023–1025 (2022).
Chia, M. et al. Skin metatranscriptomics reveals landscape of variation in microbial activity and gene expression across the human body. European Nucleotide Archive. https://www.ebi.ac.uk/ena/browser/view/PRJEB89273 (2025).
Acknowledgements
The authors would like to thank L. T. Chye and N. Teo from NUHS and S. Thng and Y. Y. Weng from NSC for collecting patient skin biopsies. This work is supported by the Asian Skin Microbiome Programme 2.0 (Industry Alignment Fund Pre-Positioning; H22J1a0040); the Agency for Science, Technology and Research (BMRC EDB IAF-PP grants – H17/01/a0/004 and BMRC Central Research Funds); and a National Medical Research Council Clinician Scientist-Individual Research Grant (CIRG23jul-0018) to M.C. and N.N. This work was also supported by the A*STAR Computational Resource Centre through the use of its high-performance computing facilities and a National Research Foundation Investigatorship grant (NRFI09-0015) to N.N. The N/TERT keratinocyte cell lines were provided by the Zhong Laboratory from the Lee Kong Chian School of Medicine to A*SRL.
Author information
Contributions
M.C. and N.N. conceived and designed the study. N.N. supervised the project. J.C. and N.N. gave conceptual advice and feedback. A.H.Q.N. and A.N.M.N. optimized skin metatranscriptomic experiments and organized participant recruitment. A.H.Q.N. conducted all experiments for the full cohort. M.C. wrote the analysis pipelines and interpreted all sequencing data. A.R. performed transcriptome-aware flux balance analysis. S.W. performed keratinocyte cultures and all IL-1B laboratory experiments. The manuscript was written by M.C. and N.N., with contributions from all other authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Biotechnology thanks Elizabeth Grice and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Sequencing statistics and quality control metrics.
(a) Boxplot showing Sørensen similarity indices (1 – Bray–Curtis dissimilarity) of species-level relative abundances and Pearson correlation of expressed pathways from metatranscriptomes of two different microbial mock communities (n = 3 repeats for each). (b) Boxplot of the percentage of reads functionally annotated by the custom workflow in this paper or by HUMAnN3 on default settings for a pilot cohort (n = 21 libraries). The p-value for the paired Wilcoxon signed rank test is shown. (c) Boxplot of RNA quality for the full cohort (n = 24, 23, 19, 18, 18 for scalp [Sc], cheek [Ch], antecubital fossa [Ac], volar forearm [Vf] and toe web [Tw] respectively). DV200 represents the percentage of RNA fragments > 200 nucleotides. (d) Boxplot of all reads and microbial read pairs for metagenomes for the full cohort. (e) Boxplot of the percentage of reads that do not map to the human genome (hg38) for skin metatranscriptomes (before de-duplication and microbial contaminant removal) and metagenomes for the full cohort. The p-value for the paired Wilcoxon signed rank test is shown. (f) Boxplot showing the proportion of microbial reads functionally annotated by our custom workflow for the full cohort.
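The similarity index in panel (a) is defined as one minus the Bray–Curtis dissimilarity. A minimal sketch of that calculation follows, assuming each profile is a mapping from species name to relative abundance. The function names and the toy profiles are hypothetical and are not taken from the authors' pipeline.

```python
def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity between two abundance profiles.

    Each profile is a dict mapping species name -> non-negative abundance.
    Species missing from one profile are treated as abundance 0.
    """
    species = set(profile_a) | set(profile_b)
    numerator = sum(abs(profile_a.get(s, 0.0) - profile_b.get(s, 0.0)) for s in species)
    denominator = sum(profile_a.get(s, 0.0) + profile_b.get(s, 0.0) for s in species)
    if denominator == 0:
        raise ValueError("both profiles are empty or all-zero")
    return numerator / denominator


def sorensen_similarity(profile_a, profile_b):
    """Similarity index as defined in the caption: 1 - Bray-Curtis dissimilarity."""
    return 1.0 - bray_curtis(profile_a, profile_b)


if __name__ == "__main__":
    # Toy replicate profiles (hypothetical values, for illustration only).
    rep1 = {"species_A": 0.5, "species_B": 0.5}
    rep2 = {"species_A": 0.25, "species_B": 0.75}
    print(sorensen_similarity(rep1, rep2))
```

Values close to 1 indicate that two replicate profiles are nearly identical, which is how the boxplot in panel (a) is read.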
Extended Data Fig. 2 Comparison of metatranscriptomic and metagenomic counts for various species across skin sites.
Scatter plots of species level metatranscriptomic (RNA) counts expressed as transcripts per million (TPM) against species level metagenomic (DNA) counts expressed as counts per million (CPM). Points on the solid line represent equal proportions of RNA to DNA for a species. The areas beyond the dotted lines represent ≥4-fold differences between RNA and DNA counts. Plots represent data from all 102 paired metagenomes and metatranscriptomes in this study. Note that Staphylococcus (dark blue) and Malassezia (lilac) species have an outsized contribution to metatranscriptomes at most sites despite their lower representation in metagenomes. Data is from the full cohort (n = 24, 23, 19, 18, 18 for scalp, cheek, antecubital fossa, volar forearm and toe web respectively).
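The regions of these scatter plots can be expressed as a simple rule on the RNA-to-DNA ratio. The sketch below assumes the inputs are already normalized (TPM for RNA, CPM for DNA). The function name, the labels and the handling of zero counts are hypothetical choices made for illustration.

```python
def classify_rna_dna(rna_tpm, dna_cpm, fold=4.0):
    """Place a species relative to the equal-proportion line of an RNA vs DNA plot.

    Returns "RNA-enriched" if RNA exceeds DNA by at least `fold`,
    "DNA-enriched" if DNA exceeds RNA by at least `fold`,
    "within" otherwise, and None when either value is not positive
    (an assumption: such points cannot be placed on a log-scale plot).
    """
    if rna_tpm <= 0 or dna_cpm <= 0:
        return None
    ratio = rna_tpm / dna_cpm
    if ratio >= fold:
        return "RNA-enriched"
    if ratio <= 1.0 / fold:
        return "DNA-enriched"
    return "within"


if __name__ == "__main__":
    # Hypothetical (TPM, CPM) pairs for three species.
    for name, rna, dna in [("sp1", 400.0, 100.0), ("sp2", 100.0, 400.0), ("sp3", 150.0, 100.0)]:
        print(name, classify_rna_dna(rna, dna))
```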
Extended Data Fig. 3 Transcriptional activity of selected prevalent bacteria across skin sites.
Boxplots of transcriptional activity (RNA/DNA) for each species, normalized by gene lengths and library sizes. Each site was represented by at least 5 libraries for which DNA reads for the given species were present at ≥ 0.1% relative abundance. Pairwise Wilcoxon rank sum tests were conducted and adjusted p-values are shown in the corresponding heatmaps. Data is from the full cohort (n = 24, 23, 19, 18, 18 for scalp [Sc], cheek [Ch], antecubital fossa [Ac], volar forearm [Vf] and toe web [Tw] respectively).
Extended Data Fig. 4 Transcriptional activity of selected prevalent fungi across skin sites.
Boxplots of transcriptional activity for each species (RNA/DNA), normalized by gene lengths and library sizes. Each site was represented by at least 5 libraries for which DNA reads for the given species were present at ≥0.1% relative abundance. Pairwise Wilcoxon rank sum tests were conducted and adjusted p-values are shown in the corresponding heatmaps. Data is from the full cohort (n = 24, 23, 19, 18, 18 for scalp [Sc], cheek [Ch], antecubital fossa [Ac], volar forearm [Vf] and toe web [Tw] respectively).
Extended Data Fig. 5 Functional enrichment of bacterial and fungal differentially expressed genes across skin sites.
(a) Barplots of normalized enrichment scores (NES) computed from gene set enrichment analyses of bacterial or fungal orthologous groups that are differentially enriched between cheek (n = 22) and volar forearm (n = 18) metatranscriptomes. (b) Same as (a) but computed between toe web (n = 18) and volar forearm (n = 18) metatranscriptomes. Pathways that are statistically enriched in both the DESeq2 and MaAsLin2 analyses are shown in bold with asterisks for clarity.
Extended Data Fig. 6 Pathway and community level contributions in skin metatranscriptomes.
(a) Boxplots of core microbial pathways dominated (> 50% contribution) by Malassezia, Staphylococcus or Cutibacterium. Adjusted p-values for pairwise Wilcoxon rank sum tests are shown. Species level pathway contributions at RNA level, shown as stacked bar plots, were estimated with HUMAnN3 for (b) beta oxidation of fatty acids, (c) galactose degradation by the Leloir pathway and (d) arginine biosynthesis. Community level relative abundances at RNA level were estimated with Kraken2. Alpha or beta diversity scores ≤ 0.5 were considered “low” diversity.
Extended Data Fig. 7 Comparison of metabolic fluxes for Staphylococcus epidermidis under in vivo and in vitro conditions.
(a) MDS plot showing variation in metabolic flux between three in vitro (n = 9, 6, 9 for log phase, osmotic stress & stationary phase respectively) and two in vivo (n = 6, 12 for sebaceous sites and toe webs respectively) conditions. (b) Barplots showing mean flux values for the NADH5 reaction in Staphylococcus epidermidis which regenerates NAD+ via NADH dehydrogenase. (c) Barplots showing mean flux values for 4-aminobutyrate consumption (ABUTR) and production (ABUTD), which can also regenerate NAD+. Libraries for in vivo conditions had ≥200,000 S. epidermidis reads each.
Extended Data Fig. 8 Variability of species and RNA abundances as a function of expression of selected antimicrobials.
(a) Boxplots of centered log ratio (CLR) DNA abundances of S. hominis & S. epidermidis from volar forearms or toe webs. Comparisons are between those with and without expression of lacticin 481 family peptides. (b) Same as (a), except showing the RNA counts for the species. (c) Boxplots of DNA abundances (CLR) of C. acnes from sebaceous or non-sebaceous sites, excluding toe webs. Comparisons are between those with and without expression of thiopeptides. (d) Same as (c), except showing the RNA counts for the species. Samples with no detected RNA counts for C. acnes were excluded. In all subfigures, the number of libraries in each category is given below its label (n).
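For readers unfamiliar with the centered log ratio (CLR) used in these panels, the sketch below shows the standard transform: the log of each component minus the mean log across components. The pseudocount used to avoid taking the log of zero is an assumption for this example, and the source does not state how zeros were handled.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a vector of non-negative counts.

    clr(x)_i = ln(x_i) - mean_j ln(x_j), computed after adding a pseudocount
    so that zero counts do not produce log(0). The pseudocount value is an
    illustrative assumption, not a setting taken from the paper.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


if __name__ == "__main__":
    # Hypothetical species counts from one library.
    print(clr([1, 10, 100]))
```

CLR values sum to zero within a sample, so a positive value means a species is more abundant than the geometric mean of that sample.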
Extended Data Fig. 9 Skin metatranscriptomics recapitulates known associations between anti-microbial genes and their targets.
(a) Boxplots of the log10 ratios of Cutibacterium acnes to Staphylococcus epidermidis or Staphylococcus capitis DNA abundances from sebaceous (cheek and scalp) or non-sebaceous (antecubital fossa, volar forearm and toe web) sites. Comparisons are between those with and without (< 5 RNA reads) expression of thiopeptides. (b) Same as (a), except that the comparisons are between those with and without expression of Staphylococcal auto-inducing peptides (AIPs). The number (n) of samples for each group is shown.
Extended Data Fig. 10 Correlations between microbial transcript and organism abundances of two distinct species on the same site.
(a) Scatterplot of adjusted p-values and Spearman’s rho coefficients derived from correlating Malassezia restricta transcript abundances with Cutibacterium acnes metagenomic abundances across individuals on scalp sites. (b) Scatterplot showing negative correlation between variance-stabilized (vst) levels of a transcript from a Barwin domain-containing protein (DNF11_2196) expressed in Malassezia restricta and centered log ratio (clr) abundances of Cutibacterium acnes. (c) Same as (b) but showing the relationship between the transcript and clr abundances of M. restricta. (d) Scatterplot of clr abundances of M. restricta against C. acnes. (e) Volcano plot showing correlation and adjusted p-values of transcripts of C. acnes proteins in the secretory pathway against clr abundances of Cutibacterium granulosum. Statistically significant correlations are highlighted in red. (f) Scatterplot showing negative correlation between levels of a transcript (AAT83849.1) expressed in C. acnes and clr abundances of C. granulosum. (g) Same as (f) but showing the relationship between the transcript and clr abundances of C. acnes.
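The two quantities plotted in panels (a) and (e), a rank correlation coefficient and an adjusted p-value, can be sketched as follows. Spearman's rho is computed as the Pearson correlation of average ranks. The Benjamini–Hochberg procedure is used here only as an assumed example of multiple-testing adjustment, since this section does not name the method used. All function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical transcript levels vs. species abundances across four samples.
    print(spearman_rho([1.2, 2.5, 3.1, 4.8], [9.0, 7.5, 6.1, 2.0]))
    print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

In this setting one rho and one p-value would be computed per transcript, and the adjustment would be applied across all transcripts tested against the same species abundance.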
Supplementary information
Supplementary Information
Supplementary Notes 1–3 (with references), Supplementary Figs. 1–14 and Supplementary Methods (with references).
Supplementary Table 1
Sample breakdowns for the pilot and full cohorts.
Supplementary Table 2
Culture conditions for S. epidermidis in vitro experiments and composition of mock communities used for benchmarking in this study.
Supplementary Table 3
rRNA depletion oligos for fungi and quantitative PCR primers for Staphylococcus species used in this study.
Supplementary Data
Statistical source data for supplementary figures.
Supplementary Data 1
Excel file with tabs detailing RNA extraction optimization, pilot cohort quality control, full cohort quality control, taxonomic profiles for negative controls and the full cohort, and the list of environmental contaminants identified in this study.
Supplementary Data 2
Species-specific differential expression results (DESeq2) and GSEA results.
Supplementary Data 3
Differential expression results (DESeq2) for S. epidermidis from sebaceous sites versus three different in vitro conditions (log phase, osmotic stress and stationary phase).
Supplementary Data 4
Differential expression results (DESeq2) for S. epidermidis from toe webs versus sebaceous sites.
Supplementary Data 5
GSEA results after DESeq2 for S. epidermidis from sebaceous sites versus three different in vitro conditions (log phase, osmotic stress and stationary phase).
Supplementary Data 6
Differential expression results (DESeq2) for S. epidermidis from toe webs versus three different in vitro conditions (log phase, osmotic stress and stationary phase).
Supplementary Data 7
GSEA results after DESeq2 for S. epidermidis from toe webs versus three different in vitro conditions (log phase, osmotic stress and stationary phase).
Supplementary Data 8
FBA results for S. epidermidis across various in vivo and in vitro conditions.
Supplementary Data 9
FBA results for C. acnes on scalp.
Supplementary Data 10
GSVA for expressed host pathways on scalp and cheeks, MaAsLin2 results for host–microbe associations and correlation analyses for microbe–microbe associations.
Supplementary Data 11
Malassezia assemblies used in the custom Kraken 2 database for this study.
Supplementary Data 12
Matrix of Pearson’s correlation coefficients of species abundances for contaminant identification.
Supplementary Data 13
Sequence and domain cutoffs for HMMs of various antimicrobial peptides/proteins.
Source data
Source Data Fig. 1
Statistical source data.
Source Data Fig. 2
Statistical source data.
Source Data Fig. 3
Statistical source data.
Source Data Fig. 4
Statistical source data.
Source Data Fig. 5
Statistical source data.
Source Data Extended Data Fig. 1
Statistical source data.
Source Data Extended Data Fig. 2
Statistical source data.
Source Data Extended Data Fig. 3
Statistical source data.
Source Data Extended Data Fig. 4
Statistical source data.
Source Data Extended Data Fig. 5
Statistical source data.
Source Data Extended Data Fig. 6
Statistical source data.
Source Data Extended Data Fig. 7
Statistical source data.
Source Data Extended Data Fig. 8
Statistical source data.
Source Data Extended Data Fig. 9
Statistical source data.
Source Data Extended Data Fig. 10
Statistical source data.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Chia, M., Ng, A.H.Q., Ravikrishnan, A. et al. Skin metatranscriptomics reveals a landscape of variation in microbial activity and gene expression across the human body. Nat Biotechnol (2025). https://doi.org/10.1038/s41587-025-02797-4
DOI: https://doi.org/10.1038/s41587-025-02797-4