Switch-like gene expression modulates disease risk

Aqil, Alber; Li, Yanyan; Wang, Zhiliang; Islam, Saiful; Russell, Madison; Kallak, Theodora Kunovac; Saitou, Marie; Gokcumen, Omer; Masuda, Naoki

doi:10.1038/s41467-025-60513-x

Download PDF

Article
Open access
Published: 18 June 2025

Switch-like gene expression modulates disease risk

Nature Communications volume 16, Article number: 5323 (2025) Cite this article

7819 Accesses
1 Citations
105 Altmetric
Metrics details

Subjects

Abstract

While switch-like gene expression (“on” in some individuals and “off” in others) has been linked to biological variation and disease susceptibility, a systematic analysis across tissues is lacking. Here, we analyze genomes, transcriptomes, and methylomes from 943 individuals across 27 tissues, identifying 473 switch-like genes. The identified genes are enriched for associations with cancers and immune, metabolic, and skin diseases. Only 40 (8.5%) switch-like genes show genetically controlled switch-like expression in all tissues, i.e., universally switch-like expression. The rest show switch-like expression in specific tissues. Methylation analysis suggests that genetically driven epigenetic silencing explains the universally switch-like pattern, whereas hormone-driven epigenetic modification likely underlies the tissue-specific pattern. Notably, tissue-specific switch-like genes tend to be switched on or off in unison within individuals, driven by tissue-specific master regulators. In the vagina, we identified seven concordantly switched-off genes linked to vaginal atrophy in females. Experimental analysis of vaginal tissues shows that low estrogen levels lead to decreased epithelial thickness and ALOX12 expression. We propose that switched-off driver genes in basal and parabasal epithelia suppress cell proliferation, leading to epithelial thinning and vaginal atrophy. Our findings underscore the implications of switch-like genes for diagnostic and personalized therapeutic applications.

Integrating genetic regulation and schizophrenia-specific splicing quantitative expression with GWAS prioritizes novel risk genes for schizophrenia

Article Open access 06 October 2025

Engineered poly(A)-surrogates for translational regulation and therapeutic biocomputation in mammalian cells

Article Open access 04 January 2024

Integrative epigenomics in Sjögren´s syndrome reveals novel pathways and a strong interaction between the HLA, autoantibodies and the interferon signature

Article Open access 02 December 2021

Introduction

The study of gene expression began in earnest with the characterization of lactose-metabolizing on-versus-off switch-like genes in E. coli¹. The presence of lactose triggered the production of enzymes needed to metabolize it, while the absence of lactose was accompanied by the absence of these enzymes. These genes acted like switches, toggling between “on” and “off” states based on the presence or absence of lactose, respectively. In subsequent decades, the discovery of enhancer elements^2,3,4, epigenetic modifications^5,6,7,8, and transcription factor dynamics⁹ revealed that gene expression in humans is more nuanced, resembling a dimmer more often than a simple on-and-off mechanism. Consequently, the study of switch-like genes in humans was largely relegated to the narrow realm of Mendelian diseases^10,11,12.

The recent availability of population-level RNA-sequencing data from humans has made it possible to systematically identify switch-like versus dimmer-like genes. For dimmer-like genes in a given tissue, we expect expression levels across individuals to be continuously distributed with a single mode, i.e., a unimodal distribution. In contrast, expression levels of switch-like genes in a given tissue are expected to exhibit a bimodal distribution, with one mode representing the “off” state and the other representing the “on” state.

As we will detail, bimodal expression across individuals is a characteristic of a gene in a specific tissue, referred to as a gene-tissue pair. We define a gene as switch-like if it exhibits bimodal expression in at least one tissue. Most of the recent studies on bimodal gene expression are related to cancer biology, associating on and off states to different disease phenotypes and their prognoses^13,14,15. These cancer studies have already produced promising results for personalized medicine¹⁶. However, to our knowledge, the only study focusing on switch-like genes in non-cancerous tissues across individuals restricted their analysis to muscle tissue¹⁷. As a result, the dynamics of switch-like expression across the multi-tissue landscape remain unknown. We hypothesize that switch-like expression is ubiquitous but often tissue-specific. We further hypothesize that these tissue-specific expression trends underlie common disease states. Therefore, the analysis of switch-like genes across tissues and individuals may provide a means for early diagnosis and prediction of human disease.

Here, we systematically identified switch-like genes across individuals in 27 tissues. We find that these switch-like genes are enriched for disease associations. Moreover, our results explain the genetic and epigenetic regulatory bases of switch-like expression in humans, highlighting genomic structural variation as a major source of switch-like expression of genes in multiple tissues. We also find that sex-bias is ubiquitous among breast-specific switch-like genes. Furthermore, we identified a group of switch-like genes in the vagina wherein the “off” state predisposes individuals to vaginal atrophy. Overall, these findings improve our understanding of the regulation of switch-like genes in humans. They also suggest promising future paths for preventative biomedical interventions.

Results

Phenotypic relevance of switch-like genes

The misregulation of highly expressed genes often has consequences for health and fitness. To systematically identify biomedically relevant switch-like genes in humans, we focused on 19,121 genes that are highly expressed (mean TPM > 10) in at least one of the 27 tissues represented in the GTEx database Fig. 1A, B; and Supplementary Table 1). For each of the 516,267 gene-tissue pairs (19,121 genes x 27 tissues), we applied the dip test of unimodality¹⁸ to the expression level distribution across individuals (Fig. 1C). Controlling for confounders, using $p$-value recalibration for the dip test, and correcting for multiple hypotheses, we identified 473 switch-like genes (Fig. 1C; and Supplementary Data 1 and 2; Methods; Supplementary Notes 1–3; Supplementary Figs. 1–4). The expression of these genes is bimodally distributed in at least one tissue, such that it is switched “off” for one subset of individuals and switched “on” for the rest of the individuals.

To gauge the phenotypic effects of the 473 switch-like genes, we performed two enrichment analyses: (1) gene ontology (GO) enrichment for biological processes using GREAT¹⁹ and (2) disease ontology enrichment²⁰. In the GO analysis, the switch-like genes were significantly enriched in biological processes related to skin, immunity, and metabolism, including keratinization (fold enrichment = 5.4; hypergeometric test $q$-value = 9.6 × 10⁻¹⁰), the B-cell receptor signaling pathway (fold enrichment = 4.6; hypergeometric test $q$-value = 1.2 × 10⁻²), and the D-ribose metabolic process (fold enrichment = 9.6; hypergeometric test $q$-value = 2.0 × 10⁻³) (Fig. 1D). Similarly, disease ontology analysis revealed enrichment in integumentary, immune-related, and metabolic diseases (Fig. 1E). Notably, switch-like genes were also enriched in diseases related to cell proliferation, including pre-malignant neoplasms (fold enrichment = 3.46; $q$-value < 10^-4), benign neoplasms (fold enrichment = 1.49; $q$-value = 4.4 × 10⁻³), and cancers (fold enrichment = 1.26; $q$-value = 1.8 × 10⁻²). These findings align with previous studies that have linked switch-like genes to cancer^13,14,15,16, further highlighting their potential role in oncogenesis.

Tissue-specificity of bimodal expression

Expression of different switch-like genes may be bimodally distributed in different numbers of tissues. We contend that genes that are bimodally expressed across all tissues are likely so due to a germline genetic polymorphism driving switch-like expression across tissues. If this is the case, the expression of these genes would be highly correlated in most pairs of tissues. Given this insight, discovering universally bimodal genes is more tractable using tissue-to-tissue co-expression of each gene. Therefore, for each gene, we calculated pairwise tissue-to-tissue correlation of expression levels for all tissue pairs (Methods; Supplementary Data 3; Supplementary Note 4; Supplementary Fig. 5). To visualize tissue-to-tissue co-expression patterns of genes, we performed principal component analysis (PCA) on the tissue-to-tissue gene co-expression data (Supplementary Data 4). We emphasize that we are referring to the co-expression of the same gene across pairs of tissues instead of the co-expression of pairs of genes in the same tissue. In the space spanned by the first two principal components (explaining 40.6% and 2.98% of the variance, respectively), switch-like genes form two major clusters (cluster 1 and cluster 2; Methods), dividing along PC1 (Fig. 2A). Applying PCA exclusively to switch-like genes reveals the further division of cluster 2 into two distinct subclusters – cluster 2 A and cluster 2B – in the space spanned by the first two principal components (explaining 74.5% and 5.31% of the variance, respectively) (Fig. 2B; and Supplementary Data 5).

**Fig. 2: Categorization of switch-like genes.**

Manual inspection reveals that cluster 1, which contains 432 genes, represents genes, such as KRT17, with bimodal expression in a small subset of tissues (Fig. 2C; Supplementary Note 5). Cluster 2 A consists of 32 genes, such as GPX1P1, with bimodal expression in all tissues (Fig. 2D; and Supplementary Fig. 6; Supplementary Note 6). Lastly, cluster 2B represents eight genes, such as EIF1AY, with bimodal expression in all non-sex-specific tissues but not in sex-specific tissues (Fig. 2E; and Supplementary Fig. 7). We will refer to genes in cluster 1 as “tissue-specific switch-like genes.” Although some of them are bimodally expressed in more than one tissue, these genes tend to exhibit high tissue specificity in their bimodal expression. Genes in cluster 2 will be referred to as “universally switch-like genes.”

Genetic variation underlies universally switch-like genes

We found that 8.5% of all switch-like genes (i.e., the proportion of switch-like genes that are in cluster 2) show clear bimodal expression, at least in all tissues common to both sexes. We contend that germline genetic variation across individuals likely drives this universally switch-like gene expression, specifically due to four major types of genetic variants. Firstly, we expect genes on the Y chromosome to show bimodal expression in all tissues common to both sexes since these genes are present in males and absent in females (Fig. 3A). Consistent with this reasoning, seven out of the eight genes in cluster 2B lie within the male-specific region of the Y-chromosome²¹; the remaining gene in cluster 2B is XIST, showing female-specific expression. Secondly, structural variation affecting a gene could lead to universally bimodal expression for that gene; for example, a homozygous gene deletion would result in the gene being switched off (Fig. 3B). We found eight such genes in cluster 2 A for which genomic structural variants likely underlie the observed universally switch-like expression; four genes are affected by gene deletions, two by polymorphic duplication of the gene, and two by polymorphic insertion into the gene. Thirdly, structural variation in a regulatory element can also switch off a gene in a part of the population (Fig. 3C). FAM118A is likely an instance of this scenario. Lastly, a loss-of-function single-nucleotide variant (SNV) or short indel, which disrupts gene function, can switch off the gene (Fig. 3D). We identified 13 genes in cluster 2 A where such SNVs cause universal bimodality.

**Fig. 3: Genetic bases of universally switch-like gene expression (cluster 2).**

Collectively, we could genetically explain the expression of 22 out of 32 (69%) cases in cluster 2 A despite the small number of genes fitting our conservative definition for universally switch-like genes. SNVs underlie 13 of these cases (Fig. 3B), while structural variants underlie the remaining nine cases (Fig. 3D). Thus, out of the 22 cases where we can explain the genetic underpinnings of switch-like expression, 41% involve genomic structural variation, highlighting the importance of this type of genetic variation (see Supplementary Note 7 and Supplementary Figs. 8–10 for a discussion on why an expression distribution driven by three genotypes at a polymorphic site might still appear bimodal). Although we could not identify the genetic variation underlying the bimodal expression of the remaining 10 genes in cluster 2 A, their consistent and highly correlated switch-like expression across all tissues strongly suggests a genetic basis. We anticipate that better resolution assemblies and detailed regulatory sequence annotations will help identify the genetic variants responsible for the remaining universally switch-like genes.

We highlight a clear example of a common structural variant leading to universally switch-like expression (Fig. 3B). USP32P2 and FAM106A – both universally switch-like genes – are bimodally expressed in all 27 tissues. Both genes show high levels of tissue-to-tissue co-expression. A common 46 kb deletion (esv3640153), with a global allele frequency of ~25%, completely deletes both genes (Fig. 4A, B). We propose that this deletion accounts for the universal switch-like expression of both USP32P2 and FAM106A in all tissues. For illustration, we show the expression level distributions of USP32P2 and FAM106A in the cerebellum (Fig. 4C, D). Indeed, the haplotype harboring this deletion is strongly associated with the downregulation of both genes in all 27 tissues ($p$ < 10^-5 for every single gene-tissue pair, Methods). We note that the under-expression of USP32P2 in sperm is associated with male infertility²², and plausibly, homozygous males for the deletion may be prone to infertility. Additionally, FAM106A interacts with SARS-CoV-2 and is downregulated after infection, at least in lung-epithelial cells^23,24,25. Individuals with FAM106A already switched off may develop more severe COVID-19 symptoms upon infection, though further investigation is needed. The case of FAM106A and USP32P2 exemplifies the link between disease and bimodal gene expression.

**Fig. 4: An example of a polymorphic gene deletion resulting in universally switch-like gene expression.**

Sex is a frequent modulator of tissue-specific switch-like gene expression

Tissue-specific expression patterns are crucial for tissue function. Thus, we now turn our attention to tissue-specific switch-like genes. We found that the breast, colon, intestine, kidney, and vagina show a higher number of tissue-specific switch-like genes than other tissues (Fig. 5A). Moreover, within these tissues, the expression of switch-like genes is not independent; instead, at least a subset of switch-like genes exhibit high pairwise gene-to-gene co-expression (Fig. 5B, C; and Supplementary Data 6). This pattern is most pronounced in the breast and colon, where tissue-specific switch-like genes tend to be either all switched off or switched on within an individual. A distinct co-expression pattern emerges in the vagina, where switch-like genes form two communities. Within each community, genes exhibit positively correlated expression levels, meaning they tend to be switched on or off in unison in a given individual. In contrast, genes from different communities show negatively correlated expression. The presence of these communities is consistent with synchronized hormonal regulation, likely driven by estrogen, as discussed later in the manuscript. Specifically, genes in one community are switched on when systemic hormone levels exceed a certain threshold, while genes in the other community are simultaneously switched off under the same conditions. Overall, these results suggest a shared regulatory mechanism for the expression of these genes in each tissue. Given that hormonal regulation plays a substantial role in shaping tissue-specific expression patterns^26,27, we hypothesize that hormones may regulate genes that are bimodally expressed in specific tissues (cluster 1; Fig. 2B).

**Fig. 5: Characterization of tissue-specific switch-like genes (cluster 1).**

Sexual differences in hormonal activity are well documented^28,29. To explore this further, we investigated whether hormone-mediated sex-biased expression underlies the co-expression of tissue-specific switch-like genes within tissues. Under this scenario, a gene would be largely switched on in one sex and off in the other in a given tissue. Among tissue-specific switch-like genes, we identified 163 gene-tissue pairs with sex-biased bimodal expression (Fig. 6A; and Supplementary Data 7). These instances are biologically relevant; for example, we found switch-like immunoglobulin genes with female-biased expression in the muscle (IGKJ5) and adrenal gland (IGHJ5), and male-biased expression in the minor salivary gland (IGHV5-10-1). This observation may relate to previous findings^30,31 of higher antibody responses to diverse antigens in females than in males. We note that the X-chromosome gene FAM9C shows female-biased, bimodal expression in the spleen. This female bias results from its escape from X-chromosome inactivation^32,33, leading to a higher dosage in females. Additionally, the rs3037872 (C/CTG) variant switches the gene off specifically in haploid C (males) or diploid C/C (females), but not in other genotypes, contributing to bimodality. Despite this double dosage being present across tissues, FAM9C clusters with tissue-specific switch-like genes (Cluster 1) rather than universally sex-biased switch-like genes (Cluster 2B). This is because it is switched off in all individuals across most tissues, precluding high tissue-to-tissue co-expression.

**Fig. 6: Sex- and age-biased expression of tissue-specific switch-like genes (cluster 1).**

More dramatically, we found that 157 out of 158 tissue-specific switch-like genes (cluster 1) in the breast tissue are female-biased (Fig. 6A). This bias may be a result of differential hormonal milieu in males versus females. However, the sex-based disparity in the on-versus-off states of these genes is not absolute, but rather a statistical tendency. In other words, the gene is not switched off in all males and switched on in all females. Instead, the proportion of individuals with the gene switched on significantly differs between sexes; almost all females have it switched on, while about half of the males have it switched off (Fig. 6B). Notably, multiple sex-biased switch-like genes—including SPINT1 and SPINT2³⁴, multiple keratin genes³⁵, and the oxytocin receptor gene^36,37 (OXTR; Fig. 6C)—in the breast tissue are differentially expressed in breast cancers relative to matched non-cancerous tissues. Future investigations could reveal whether the toggling of these genetic switches affects breast cancer risk in females. We caution that sex-biased switch-like expression in the breast may result from differences in cell-type abundance between females and males. Nevertheless, the differential expression of some genes between sexes might developmentally drive such differences in cell-type abundance. In summary, our results indicate that sex is a major contributor to bimodal gene expression, with breast tissue standing out as particularly sex-biased in this context.

In addition to sex, other factors may influence the coordinated expression of switch-like genes within tissues. To explore this, we tested for correlations between the expression levels of tissue-specific switch-like genes and two factors: body mass index (BMI) and age. Using an FDR threshold of 5% and |$r$| > 0.3, we assessed whether gene expression levels were associated with either variable. (Supplementary Data 7). We observed no significant correlation between body mass index (BMI) and the expression of these genes. However, the expression levels of two genes, CDYL2 and TP53INP2, both switch-like in the uterus, were significantly correlated with age (Fig. 6D). Notably, TP53INP2 is differentially expressed in endometrial glandular cells, residing in the inner lining of the uterus, in women with recurrent implantation failure³⁸. This differential expression suggests TP53INP2’s involvement in implantation. Since endometrial aging is linked to decreased fertility³⁹, these findings indicate that age-driven shifts in TP53INP2 expression may contribute to the uterine environment becoming less receptive to implantation with increasing age.

Regardless of whether it is associated with sex or age, the aforementioned intra-tissue co-expression of switch-like genes implies an upstream regulatory mechanism shared among switch-like genes in a given tissue. We hypothesize that these genes are regulated by a master regulator, likely a hormone or an external environmental factor. If regulated hormonally, multiple genes may be switched off simultaneously when systemic hormone levels drop below a critical threshold. Such master regulators could coordinate gene expression within a given tissue through two potential mechanisms: (1) by simultaneously inducing epigenetic modifications, such as DNA methylation, that synchronously affect gene activity, or (2) by modulating the expression of a transcription factor that subsequently regulates multiple genes. Next, we investigate whether upstream master regulators result in switch-like gene expression by inducing differential epigenetic modification across individuals.

Epigenetic regulation of bimodal gene expression by DNA methylation

DNA methylation at CpG sites, particularly in promoter regions, is a key epigenetic mechanism regulating gene expression. Hypermethylation in a promoter is typically associated with transcriptional repression, while hypomethylation is linked to gene activation^40,41,42 (Fig. 7A). To test whether differential methylation across individuals contributes to switch-like expression, we analyzed methylation at CpG sites in the vicinity of switch-like genes using GTEx data⁴³ (Methods). Our analysis focused on six tissues (prostate, lung, muscle, ovary, colon, and breast) with at least 30 individuals having both gene expression and methylation data. Among 247 tissue-specific switch-like genes (cluster 1) in this analysis, 73 (30%) exhibited a significant negative correlation between expression and methylation (FDR < 5%; |$r$| > 0.5) in the tissue where they were bimodally expressed (Supplementary Data 8). Two out of the 73 genes exhibited a significant correlation between expression and methylation in the prostate (e.g., TBX4; Fig. 7B). Strikingly, the remaining 71 genes were all bimodally expressed in the breast (e.g., KRT6B; Fig. 7C), suggesting that differential methylation across individuals plays a key role in regulating switch-like expression in this tissue.

**Fig. 7: Correlation between expression of switch-like genes and methylation at nearby CpG sites.**

We also tested whether the expression of universally switch-like genes (cluster 2 A) correlated with methylation at CpG sites in their vicinity using the same thresholds (FDR < 5%; |$r$| > 0.5; Supplementary Data 9). If methylation contributes to their bimodal expression across multiple tissues, we expect to see significant correlations between expression and methylation in multiple tissues. We observed this pattern for 9 out of 32 cluster-2A genes. Notably, five (GSTM1, USP32P1, USP32P2, FAM106A, KANSL1-AS1) of these nine genes are linked to copy number variation, likely driving both expression and methylation differences across individuals. This result is expected given how methylation levels (β-values) are measured⁴⁴. β is defined as the hybridization intensity of the probe for the unmethylated CpG site divided by the total intensity of both probes (methylated and unmethylated). In individuals homozygous for a deletion affecting both a gene and a nearby unmethylated CpG site, the hybridization intensities of both probes are near background levels. Consequently, the β-value is intermediate, and the expression is low. In individuals without the deletion, the CpG site is intact, and the hybridization signal for the unmethylated probe is much stronger than the background signal of the methylated probe, yielding a low β-value coupled with high expression. As a result, expression levels and methylation show a negative correlation (Fig. 7D). Indeed, the cluster-2A gene GSTM1 and the nearby CpG site cg16180556, both affected by a polymorphic deletion, show this pattern in all six tissues analyzed (Fig. 7E). Importantly, we observe these patterns despite that the methylation data had already been normalized using the normal-exponential-out-of-band (Noob) method to correct for background fluorescence. We emphasize that such cases are not instances of epigenetic silencing by methylation; instead, they are examples of copy number variation affecting both expression and methylation measurements.

In contrast, the remaining four genes unlinked to copy number variation represent cases of genuine epigenetic regulation of transcription, specifically through genetic variants that influence methylation^45,46. For example, we identify an intronic variant (rs393329) in NPIPB2 that functions as both an expression quantitative trait locus (eQTL) and a methylation quantitative trait locus (mQTL; Fig. 7F). Across all tissues studied, this variant consistently represses NPIPB2 expression while simultaneously increasing methylation at multiple CpG sites in the vicinity of NPIPB2. One of these CpG sites is cg16495772, located 593 bases upstream of the variant. The increased methylation at this site likely facilitates gene silencing by recruiting transcriptional repressors or other regulatory machinery. Such epigenetic modifications are genetically codified and thus transmitted faithfully from one generation to another. These findings illustrate how common genetic variation can drive universally bimodal gene expression by orchestrating methylation differences between individuals.

Concordantly switched-off genes in females result in vaginal atrophy

We found that among switch-like genes in the vagina, genes linked to vaginal atrophy in postmenopausal women are significantly overrepresented (86-fold enrichment; $p$ < 10⁻⁴; “see Methods”). Vaginal atrophy, affecting nearly half of postmenopausal women, is triggered by sustained low levels of systemic estrogen and is marked by increased microbial diversity, higher pH, and thinning of the epithelial layer in the vagina^47,48. It is also known as atrophic vaginitis, vulvovaginal atrophy, estrogen-deficient vaginitis, urogenital atrophy, or genitourinary syndrome of menopause, depending on the specialty of the researchers. Symptoms experienced by women include dryness, soreness, burning, decreased arousal, pain during intercourse, and incontinence⁴⁹. Our analysis of switch-like genes in the vagina provides insights into the development of vaginal atrophy.

Specifically, we overlapped a previously published list⁵⁰ of the top 40 differentially expressed genes in vaginal atrophy with our list of bimodally expressed genes in the vagina. We found that the genes ALOX12, DSG1, KRT1, SPINK7, CRISP3, SBSN, and CRCT1 are both bimodally expressed in the vagina (Fig. 8A) and among the top transcriptionally downregulated genes in women with vaginal atrophy. We refer to these genes as “atrophy-linked switch-like genes.” Indeed, these seven genes are either all switched on, or all switched off concordantly in 73% (113/156) of the vaginal samples we studied. The pairwise concordance rates (percentage of individuals with both genes switched on or both genes switched off) for these genes are shown in Fig. 8B. Among postmenopausal women with this concordant gene expression, 50% are in the “off” state – a fraction that closely matches the prevalence of vaginal atrophy in postmenopausal women^47,51. Therefore, our data suggest that estrogen-dependent transcription underlies concordant expression of atrophy-linked switch-like genes, with the “off” state of these genes associated with vaginal atrophy.

**Fig. 8: Atrophy-linked genes being switched off in the vagina are associated with vaginal atrophy.**

For background, the vaginal epithelial layers are differentiated from the inside out. The basal and parabasal layers of the epithelium consist of mitotic progenitor cells with differentiation potential, while the outermost layer comprises the most differentiated cells^52,53. When basal and parabasal cells stop proliferating, the death of mature cells leads to a thin epithelium, causing vaginal atrophy. Given this background, atrophy-linked switch-like genes may either be a cause or a consequence of vaginal atrophy. In particular, if an atrophy-linked switch-like gene encodes a protein necessary for the continued proliferation and differentiation of basal and parabasal cells, we call it a “driver” gene. In the absence of the driver gene’s protein, cell differentiation ceases, and the outer layer gradually disappears, resulting in vaginal atrophy (Fig. 8C). On the other hand, if the product of an atrophy-linked switch-like gene is not required for basal and parabasal cell proliferation, we refer to it as a “passenger” gene, borrowing the terminology from cancer literature⁵⁴. In healthy vaginas with a thick epithelium, there are more cells in which passenger genes would be expressed. By contrast, in atrophic vaginas, the epithelium thins, resulting in fewer cells where these genes can be expressed. This contrast would lead to the bimodal expression of passenger genes across vagina samples in whole-tissue RNA-sequencing datasets. We hypothesize that at least some of the atrophy-linked switch-like genes are driver genes.

Two key findings allowed us to construct this hypothesis. Firstly, switch-like genes in the vagina show an 8-fold ontological enrichment for the skin development (FDR = 0.016) and a 13-fold enrichment for keratinocyte differentiation (FDR = 0.013), both related to epithelial thickness and differentiation. Notably, two atrophy-linked switch-like genes in the vagina that we identified, DSG1 and KRT1, are involved in the differentiation of epithelial cells in the vagina^55,56. Protein stainings available through Human Protein Atlas⁵⁷ show that all seven atrophy-linked switch-like genes are expressed at the protein level, predominantly in the vaginal epithelium. Secondly, administering 17β-estradiol (a type of estrogen) to postmenopausal women with vaginal atrophy leads to the upregulation of five (ALOX12, DSG1, KRT1, SPINK7, and CRISP3) of these seven genes, causing symptoms to subside⁵⁸. According to our hypothesis, administering estrogen activates the expression of the driver switch-like genes in the vagina, resuming the proliferation of basal and parabasal cells in the epithelium. This process leads to the reformation of a thick and healthy vaginal mucosa, thereby alleviating the symptoms of vaginal atrophy.

Thus, it is essential to distinguish driver genes from passenger genes to understand the etiology of vaginal atrophy. However, we expect driver and passenger genes to show the same expression patterns in healthy versus atrophic vaginas using bulk RNA-sequencing data. In order to make this distinction, we need comparative expression data, specifically from the basal and parabasal epithelium from healthy versus atrophic vaginas. We expect driver genes to be differentially expressed in the basal and parabasal layers of the epithelium. By contrast, we expect passenger genes to show no differential expression in the basal and parabasal layers between healthy and atrophic vaginas.

To look at the expression levels in the basal and parabasal layers of the epithelium, we arbitrarily chose ALOX12 from the seven atrophy-linked switch-like genes for immunohistochemical staining of its protein product in the vaginal mucosa (which includes the epithelium and the underlying connective tissue). We found that the ALOX12 protein is present in the epithelial cells, and its abundance directly correlates with epithelial thickness, as expected from our transcriptomic analysis. However, we found no significant difference in the staining of the ALOX12 protein in the basal or parabasal epithelial layers between healthy and atrophic samples (Fig. 8D). This suggests that the gene is not differentially expressed in the basal or parabasal layers of the vaginal epithelium between healthy and atrophic vaginas. Therefore, ALOX12 is a passenger gene for vaginal atrophy. Comparative immunohistochemical staining of the protein product of the other switch-like genes in the vagina, especially the remaining six atrophy-linked switch-like genes, may identify the driver gene in the future. Indeed, the KRT1 protein is recognized as a marker of basal cell differentiation in mouse vaginas⁵⁹, a finding that may also be true for humans. Overall, our results open up several paths for potential pre-menopausal risk assessment and intervention frameworks targeting cell differentiation pathways in the clinical setting.

Discussion

In this study, we investigated factors underlying switch-like gene expression and its functional consequences by integrating genomic, methylomic, and transcriptomic data from 943 individuals. Our systematic analysis revealed 473 switch-like genes, with 6.8% of them bimodally expressed in all tissues, suggesting a genetic basis, while the majority display tissue-specific bimodality. Notably, tissue-specific switch-like genes tend to be concordantly switched on or off within individuals, implicating tissue-specific master regulators such as hormonal signals. Our methylation analysis suggests that hormonally driven epigenetic modification is a key source for achieving breast-specific bimodal expression. In contrast, genetically driven epigenetic silencing leads to universally bimodal expression for genes unlinked to structural variation. We also found that a drop in systemic estrogen levels in the vagina triggers the coordinated silencing of seven vagina-specific switch-like genes, resulting in vaginal atrophy. More broadly, our findings connect switch-like genes to cancers, infections, endocrine disorders, and skin-related conditions. Furthermore, this study has paved two major paths forward toward early medical interventions, as we discuss below.

First, we emphasize that bimodal expression that is correlated across all tissues is driven by genetic polymorphisms. However, the genetic bases for 13/23 universally switch-like genes remain elusive. We propose that the underlying genetic bases for these universally switch-like genes are structural variants, which are not easily captured by short-read DNA sequencing. These structural variants may be discovered in the future as population-level long-read sequencing becomes more common. The first biomedical path forward is to use long-read DNA sequencing to pinpoint the genetic polymorphisms responsible for the bimodal expression of disease-related genes. Of particular interest are the genes CYP4F24P and GPX1P1, both long non-coding RNAs, which are implicated in nasopharyngeal cancer. While we have identified the genetic basis for the bimodal expression of CYP4F24P, that of GPX1P1 remains unknown. CYP4F24P is significantly downregulated in nasopharyngeal cancer tissues⁶⁰, while GPX1P1 is significantly upregulated in nasopharyngeal carcinomas treated with the potential anticancer drug THZ1⁶¹. Investigating whether individuals with naturally switched-off GPX1P1 and CYP4F24P are at a higher risk of nasopharyngeal cancer will enable genotyping to identify individuals at elevated risk for nasopharyngeal cancer, facilitating early interventions and improving patient outcomes.

Secondly, switch-like genes present a promising avenue for exploring gene-environment interactions, an area of growing interest. Recent studies indicate that environmental factors can significantly modulate genetic associations^62,63. Polymorphisms that result in switch-like gene expression have already been linked to several diseases within specific environmental contexts⁶⁴. For instance, the deletion of GSTM1 has been associated with an increased risk of childhood asthma, but only in cases where the mother smoked during pregnancy⁶⁵. Even more critically, switch-like genes can create different cellular environments depending on whether they are switched on or off. These environments may, in turn, influence how genetic variants affect disease risk. Thus, much like current gene-environment association studies that control for factors such as birthplace, geography, and behaviors like smoking, it is conceivable that controlling for switch-like gene expression states could enhance the power of such studies. By cataloging these switch-like genes and developing a framework to classify them as “on” or “off” in various samples, our work lays the groundwork for more robust association studies in future research.

While our findings regarding switch-like genes provide exciting topics for exploration, we caution that we base our results on expression at the RNA level. The bimodal expression of genes across individuals at the RNA level may not necessarily lead to bimodal expression at the protein level. For example, the universally switch-like expression of RPS26 at the RNA level can be explained by a single-nucleotide variant (rs1131017) in the gene’s 5’-untranslated region (UTR). In particular, RPS26 has three transcription states based on the SNV genotypes. The ancestral homozygote C/C corresponds to a high transcription state, the heterozygote C/G to a medium state, and the derived homozygote G/G to a low state. Remarkably, this pattern is reversed at the translation level⁶⁶: Messenger RNA carrying the derived G allele produces significantly more protein. This reversal may be due to a SNV in the 5’-UTR that can abolish a translation-initiation codon⁶⁷. This finding demonstrates how the same SNV can regulate a gene’s expression level in opposite directions during transcription and translation. This multi-level regulation in opposite directions likely serves to dampen protein expression variability. It has been shown previously that RNA variability is greater than protein variability in primates^68,69; the presence of dampening variants discussed here may be one reason behind these findings. Such compensatory mechanisms for gene expression remain fascinating areas for future research.

In summary, our study has significant implications for understanding the fundamental biology of gene expression regulation and the biomedical impact of switch-like genes. Specifically, it contributes to the growing repertoire of methods for determining individual susceptibility to diseases, facilitating early therapeutic interventions. By providing an approach to studying gene expression states, our study will enhance the predictive accuracy of disease susceptibility and improve patient outcomes.

Methods

Ethics statement

This study complies with all relevant ethical regulations. We obtained the RNA-seq data from GTEx, which obtained institutional review board approval for all collected samples. All GTEx samples from the original study were collected with informed consent (from next-of-kin) under protocols approved by the GTEx consortium, as detailed in the original project documentation. Vaginal tissue sections reused in this work were collected and stained in a previous study⁷⁰ under approval from the Regional Ethical Review Board, Uppsala, Sweden (record-/diary-number 2008/076). No new human samples were obtained for the present study; only previously approved, de-identified material was reanalyzed. The research conformed to the criteria set by the Declaration of Helsinki.

Data

The Genotype-Tissue Expression (GTEx) project is an ongoing effort to build a comprehensive public resource to study tissue-specific gene expression and regulation. The data we use are transcripts per million (TPM) obtained from human samples across 54 tissues and 56,200 genes (as of December 1st, 2023). We excluded laboratory-grown cell lines from our analysis. Since we need a reasonable number of individuals from each tissue, we excluded tissues with fewer than 50 individuals for our calculations. Of the remaining tissues, there were instances of multiple tissues from the same organ. In such cases, we arbitrarily chose one tissue per organ. We thus focus our analysis on 27 tissues (Fig. 1). Additionally, we retained only those genes for which the mean TPM across individuals was greater than 10 in at least one of the 27 focal tissues. This filter was applied because the analysis of lowly expressed genes may lead to false-positive calls for bimodal expression and, as a result, to assigning biological significance to cases where there is none. After these filtering steps, we are left with TPM data from 19,121 genes in each of the 27 tissues. We will refer to this set of 19,121 genes as G in our equations and the rest of the methods. We note that each tissue contains data from a different number of samples (individuals), totaling 943 across tissues: 312 females and 631 males.

We accessed GTEx protected data from dbGaP (phs000424.v10.p2) and the anvil platform (https://anvil.terra.bio/#workspaces/anvil-datastorage/AnVIL_GTEx_V9_hg38). For DNA methylation (DNAm) data, we used the file GTEx_v9_EPIC_data_noob_final_BMIQ_all_tissues_987.txt. This data, containing CpG site IDs and beta values for methylation, had undergone normal-exponential out-of-band normalization to correct for background fluorescence noise. Additionally, it had also undergone beta mixture quantile normalization to correct for the bias between Type I and Type II probes on Illumina methylation arrays.

Notations

We denote by ${S}_{i}$ the number of samples (individuals) available for tissue $i$. We also denote by ${x}_{g,i,s}$ the TPM value for gene g in tissue i, for sample $s$ ∈ {1, …, ${S}_{i}$} and $g$ $\in$ $G$. According to convention, we log-transform the TPM, specifically by log(${x}_{g,i,s}$ + 1)⁷¹ to suppress the effect of outliers; TPM is extremely large for some samples. Note that log(${x}_{g,i,s}$ + 1) conveniently maps ${x}_{g,i,s}\,$= 0 to 0.

Correcting expression levels for confounders

To account for confounding variables, we adopted an approach wherein we use confounders to predict expression values via regression and then treat the residuals as the corrected expression. Regression models may perform poorly when predictors are highly correlated (see Supplementary Note 1 and Supplementary Fig. 1), a phenomenon known as multicollinearity. Using principal component analysis (PCA) to transform a set of intercorrelated confounders into an orthogonal set of principal components (PCs) ensures that the predictors used in the regression model are uncorrelated⁷². This orthogonality leads to more stable coefficient estimates, which in turn supports a more reliable calculation of the fitted values and thus the residuals. We combined up to six confounder vectors (corresponding to DTHHRDY, SMRIN, SMTSISCH, SMTSPAX, SMNABTCHT, and COHORT; the number varying by tissue) to form a confounder matrix for each tissue. COHORT and DTHHRDY are characteristics of individuals as opposed to that of tissue-individual pairs. Thus, for a given individual s, the values of COHORT and DTHHRDY were identical for every tissue-individual pair ($i,s$). Since only 14 individuals (12 males and 2 females) belonged to the surgical cohort, we removed them from all downstream analysis to prevent complications in confounder-correction. Then, we performed PCA on these matrices. The obtained PC score PC_k,i,s represents the value of the k^th PC in tissue $i$ for individual $s$. We then fitted the following generalized additive model (GAM) to regress out the effects of the confounders from the expression data:

$$\log \left({x}_{g,i,s}+1\right)={\alpha }_{g,i}+\mathop{\sum }\limits_{k=1}^{{K}_{i}}{f}_{k,g,i}\left({\mathrm{PC}}_{k,i,s}\right)+{\varepsilon }_{g,i,s},$$

(1)

where ${\alpha }_{g,i}$ is the intercept term for gene g in tissue i representing the expected log-expression when the effects of all confounders are zero; ${f}_{k,g,i}$(⋅) is a smooth spline function for gene g in tissue $i$ applied to k^th principal component; K_i is the number of confounders considered for tissue i. ${\varepsilon }_{g,i,s}$ represents the random error term for gene $g$ in tissue $i$ for sample $s$. The fitted values provided by the model are

$$\overline{\log \left({x}_{g,i,s}+1\right)}={\hat{\alpha} }_{g,i}+\mathop{\sum }\limits_{k=1}^{{K}_{i}}{\hat{f}}_{k,g,i}\left({{{\rm{PC}}}}_{k,i,s}\right).$$

(2)

We define the residual as

$${\hat{\varepsilon }}_{g,i,s}=\log \left({x}_{g,i,s}+1\right)-\overline{\log \left({x}_{g,i,s}+1\right)}.$$

(3)

Here, ${\hat{\varepsilon }}_{g,i,s}$ represents the confounder-corrected expression value for gene $g$ in tissue $i$ for individual $s$; that is, it reflects the component of the log-transformed expression that remains after removing the systematic effects of the technical confounders.

Dip test

We sought to determine whether a given gene–tissue pair exhibits bimodal expression across individuals. Although several tests of bimodality exist^16,73, we employed the dip test via the R function dip.test() from the “diptest” package. Under the null hypothesis of unimodality, a well-calibrated dip test should yield $p$-values that are uniformly distributed over the interval [0,1].

To assess whether the test is well-calibrated, we examined a set of housekeeping genes, which are expected to exhibit unimodal expression, in each tissue. For housekeeping gene $g$ in tissue $i$, we applied the dip test to the distribution of confounder-corrected expression values ${\hat{\varepsilon }}_{g,i,s}$ across the ${S}_{i}$ samples. We observed that the resulting $p$-values were heavily skewed toward 1, indicating that the dip test is more conservative than implied by the raw $p$-values—a finding also confirmed using simulated unimodal distributions (Supplementary Note 2; Supplementary Fig. 2). To address this bias, we constructed an empirical null mapping from raw to calibrated $p$-values (calibration curve) by rank ordering the raw housekeeping-gene $p$-values (Supplementary Fig. 2C). We denote by $n$ the number of housekeeping-gene $p$-values. We sorted the $n$ $p$-values in ascending order and assigned the $k$^th smallest $p$-value a recalibrated value of $k$/$n$. We applied this calibration curve to recalibrate the raw $p$-values for all dip tests performed in this study before multiple-hypothesis correction, ensuring that the reported false discovery rates are accurate.

Specifically, we applied a two-round dip test procedure to identify gene-tissue pairs exhibiting bimodal expression across individuals: one round on raw expression data and another on confounder-corrected expression data. This two-round approach was carried out to explicitly gauge the impact of technical confounders on bimodality detection (Supplementary Note 3; and Supplementary Fig. 3), rather than simply relying on corrected data for analysis.

In the first round, for each pair of gene g and tissue $i$, we carried out a dip test on the distribution of log(${x}_{g,i,s}$ + 1) across the ${S}_{i}$ samples with 10,000 bootstrap samples. The resulting $p$-values were recalibrated using the empirical mapping derived from the housekeeping genes. Following recalibration, we applied the Benjamini–Hochberg procedure at a 5% false discovery rate and imposed an effect-size threshold on the dip statistic $D$ to reduce false positives. We set this threshold based on the housekeeping null: specifically, we chose the threshold as the mean $D$ plus three standard deviations from the dip test on housekeeping gene-tissue pairs, which yielded 0.033. This threshold ensures that the effect size is greater than that expected under unimodality. In addition, to penalize gene–tissue pairs with low mean expression, we defined our threshold as

$$D > {\max \left(0.033,\frac{0.033}{\log \left({\bar{x}}_{g,i}+1\right)}\right)},$$

(4)

where

$${\bar{x}}_{g,i}=\frac{1}{{S}_{i}}\mathop{\sum }\limits_{s=1}^{{S}_{i}}{x}_{g,i,s}.$$

(5)

We applied this penalization to eliminate cases of noise-driven bimodality since, in cases with low mean expression levels, noise tends to drown out the true biological signal.

In the second round, we reassessed these candidate genes using the confounder-corrected expression values ${\hat{\varepsilon }}_{g,i,s}$ in place of the raw log(${x}_{g,i,s}\,$+ 1) data. In this round, we applied the same dip-test procedure with 10,000 bootstrap iterations, followed by empirical $p$-value recalibration and Benjamini–Hochberg correction (5% FDR). However, because the residuals ${\hat{\varepsilon }}_{g,i,s}$ are centered around zero, the additional penalization for effect size applied in the first round of dip test was not appropriate. Therefore, we used a simplified criterion of $D$ > 0.033. The genes that were thus identified as having bimodal distribution in at least one tissue are referred to as “switch-like genes”. We note that, to display gene expression levels in the figures, instead of using ${\hat{\varepsilon }}_{g,i,s}$, we use:

$${\log \left({x}_{g,i,s}+1\right)}_{{\mathrm{C}}}=\left\{\begin{array}{c}{\hat{\varepsilon }}_{g,i,s}+{\bar{x}}_{g,i},\quad{\hat{\varepsilon }}_{g,i,s}+{\bar{x}}_{g,i} > 0,\\ 0,\quad\quad {\mbox{otherwise.}}\end{array}\right.$$

(6)

The quantity on the left-hand side of this equation is denoted as log(TPM + 1)_C in the figures. Here the subscript ${{\rm{C}}}$ stands for “corrected.” We perform this transformation to ensure that 1) the average corrected levels roughly match the average raw expression levels, and 2) the corrected expression levels are not negative. The values of log(TPM + 1)_C for all 473 switch-like genes in all tissue-individual pairs used in this study are available in Supplementary Data 10.

Functional enrichment among switch-like genes

We performed two enrichment analyses to investigate the biological roles and disease associations of switch-like genes. For Gene Ontology (GO) enrichment, we used the Genomic Regions Enrichment of Annotations Tool (GREAT)¹⁹ to evaluate overrepresentation of various biological processes among switch-like genes. GREAT provides both effect sizes and $q$-values for enriched GO terms.

For disease enrichment, we leveraged the Disease Ontology resource²⁰ using the R package “ontologyIndex” to parse the disease ontology file. This file organizes diseases into a hierarchical structure with top-level (primary) categories and their corresponding daughter (secondary) categories. Our analysis focused on diseases in the seven top-level categories: (1) Disease by infectious agent; (2) Disease of anatomical entity; (3) Disease of cellular proliferation; (4) Disease of mental health; (5) Disease of metabolism; (6) Genetic disorder; and (7) Physical disorder. Each disease was further assigned to a secondary category based on its hierarchical structure (i.e., the daughter terms of the top-level categories). We then determined, for each secondary category, the number of genes in the complete set G and the number of switch-like genes associated with that category.

Fold enrichment for each secondary disease category among switch-like genes was calculated as the ratio of the proportion of switch-like genes (319 of the 473 switch-like genes were represented in this dataset) linked to that category to the corresponding proportion among all genes in $G$ (18,342 out of the 19,121 genes in $G$ were present in this dataset). To assess statistical significance, we generated 10,000 random gene sets of size 319 (which is the number of switch-like genes represented in the disease dataset) from 18,342 and computed the fold enrichment for each set. The $p$-value for a given secondary category was defined as the fraction of these random sets for which the fold enrichment was equal to or exceeded the value observed among the switch-like genes. Finally, we applied Benjamini–Hochberg correction to control the false discovery rate at 5%, and retained only those secondary categories for which at least 10 switch-like genes were linked.

Tissue-to-tissue co-expression of genes

We sought to identify switch-like genes exhibiting bimodal expression in all tissues. One seemingly straightforward approach is to count the number of tissues showing a bimodal distribution of expression levels for each gene. However, even if a gene genuinely exhibits bimodal expression across all tissues, our methodology may fail to recognize it as such if the mean expression levels (${\bar{x}}_{g,i}$) of the gene are low in some tissues. This is because our effect size threshold in the first round of dip tests penalizes gene-tissue pairs with low ${\bar{x}}_{g,i}$. Moreover, if gene expression follows a bimodal distribution across all tissues, then it does so likely due to a genetic polymorphism affecting expression. Thus, the expression of such genes would be highly correlated between tissues. Given this insight, discovering universally bimodal genes is more tractable using tissue-to-tissue co-expression of each gene.

For each gene $g$, we construct the co-expression matrix ${{{\bf{R}}}}_{g}$, wherein each element is the Pearson's correlation coefficient of the expression levels of the gene between two tissues. To calculate the co-expression between a pair of tissues, we need to use the samples for which the TPM is measured for both tissues⁷⁴. In general, even if the number of samples is large for both tissues, it does not imply that there are sufficiently many common samples. Therefore, using the sample information described in GTEx_Analysis_v8_Annotations_SampleAttributesDD.xlsx in the GTEx data portal, we counted the number of samples shared by each tissue pair and excluded the 41 tissue pairs that share fewer than 40 samples. We calculated the co-expression for each of the remaining 27 × 26/2 − 41 = 310 tissue pairs. We denote by ${S}_{i,j}$ the number of samples shared by the two tissues $i$ and $j$. Then, we calculated the Pearson correlation coefficient between ${\hat{\varepsilon }}_{g,i,s}$ and ${\hat{\varepsilon }}_{g,j,s}$ across the ${S}_{i,j}$ samples. Specifically, we calculate

$${r}_{g}\left(i,j\right)=\frac{{\sum }_{s=1}^{{S}_{i,j}}\left[{\hat{\varepsilon }}_{g,i,s}-{m}_{g,i}\right]\left[{\hat{\varepsilon }}_{g,j,s}-{m}_{g,j}\right]}{\sqrt{{\sum }_{s=1}^{{S}_{i,j}}{\left[{\hat{\varepsilon }}_{g,i,s}-{m}_{g,i}\right]}^{2}{\sum }_{s=1}^{{S}_{i,j}}{\left[{\hat{\varepsilon }}_{g,j,s}-{m}_{g,j}\right]}^{2}}},$$

(7)

where

$${m}_{g,i}=\frac{1}{{S}_{i,j}}\mathop{\sum }\limits_{s=1}^{{S}_{i,j}}{\hat{\varepsilon }}_{g,i,s} $$

(8)

and

$${m}_{g,j}=\frac{1}{{S}_{i,j}}\mathop{\sum }\limits_{s=1}^{{S}_{i,j}}{\hat{\varepsilon }}_{g,j,s}.$$

(9)

We note that both ${m}_{g,i}$ and ${m}_{g,j}$ are close to 0 since they represent means of residuals from the generalized additive models. For each gene g, we define matrix ${{{\bf{R}}}}_{g}$ as a 27 × 27 tissue-to-tissue co-expression matrix with the $(i,j)$-th element equal to ${r}_{g}\left(i,j\right)$ if ${S}_{i,j}$ ≥ 40; otherwise, that entry is recorded as missing (NA). We then vectorize the correlation matrix, ${{{\bf{R}}}}_{g}$, into the vector ${{{\bf{r}}}}_{g}$. Since the correlation matrix is symmetric, we only vectorize the strict upper-triangular part of ${{{\bf{R}}}}_{g}$. This region contains all unique off-diagonal correlations (351 in total). 41 tissue pairs have ${S}_{i,j}$ < 40, so their entries are discarded. Thus, ${{{\bf{r}}}}_{g}$ is a 310-dimensional vector. A gene, g, was removed from the downstream analysis if ${\hat{\varepsilon }}_{g,i,s}$ or ${\hat{\varepsilon }}_{g,j,s}$ was 0 across all ${S}_{i,j}$ samples for any of the 310 tissue pairs. In this process, one out of 473 switch-like genes was removed. We ran a PCA, using the prcomp() function in R, on vectors ${{{\bf{r}}}}_{g}$, for all genes for which we could calculate ${r}_{g}\left(i,j\right)$ for all 310 tissue pairs. In parallel, we ran PCA on only the set of vectors (genes) characterizing only the 472 ( = 473 − 1) switch-like genes.

In the space spanned by the first two principal components, we calculated the pairwise distance between genes using the dist() function in R with method = “euclidean”. We then performed hierarchical clustering using the hclust() function with method = “complete”. Finally, we used the cuttree() function with $k$=2 and $k$=3 to obtain two and three clusters, respectively.

Identifying the genetic basis of universal bimodality

To identify the genetic basis of bimodality for switch-like genes in cluster 2 A, we obtained the coordinates of the genes for both hg19 and hg38 using their Ensembl IDs as keys through Ensembl BioMart. We obtained coordinates of common structural variants using both the 1000 genomes project (hg19)⁷⁵ and the HGSV2 dataset (hg38)⁷⁶. We performed an overlap analysis using BedTools⁷⁷ to identify polymorphic deletions or insertions into these genes. Additionally, we obtained the GTEx dataset for the expression quantitative trait loci (eQTL). We identified genes in cluster 2 A that had at least one eQTL, which was consistently associated with either increased or decreased expression of a given gene across all 27 tissues analyzed.

Gene-to-gene co-expression within tissues

We performed gene-to-gene co-expression analysis within the breast, colon, intestine, kidney and vagina tissues. In a given tissue $i$, we denote the set of cluster-1 (tissue-specific switch-like) genes by ${C}_{i}$. Then, for $i$ ∈ {breast, colon, intestine, kidney, vagina}, we calculated the Pearson correlation, across the ${S}_{i}$ samples, between ${\hat{\varepsilon }}_{g,i,s}$ and ${\hat{\varepsilon }}_{h,i,s}$ for every $g$, $h$ ∈ ${C}_{i}$ where $g$ $\ne$ $h$.

Quantifying the effect of sex, age, and BMI on cluster-1 gene expression

For every gene-tissue pair ($g,i$), where $g$ is a switch-like gene, and $i$ is a tissue common to both sexes, we tested the hypothesis that the distribution of ${\hat{\varepsilon }}_{g,i,s}$ across male samples differed from that across female samples using the Wilcoxon rank-sum test. We applied the Benjamini-Hochberg procedure of multiple hypotheses correction with FDR = 5%. We quantified the effect size of the sex bias using Cohen’s d, using the threshold ${|d|}$ > 0.2⁷⁸. Additionally, for every gene-tissue pair ($g,i$), where $g$ is a switch-like gene, and $i$ can be any tissue, we tested the two hypotheses that ${\hat{\varepsilon }}_{g,i,s}$ is correlated with age and that ${\hat{\varepsilon }}_{g,i,s}$ is correlated with BMI. We applied the Benjamini-Hochberg procedure of multiple hypotheses correction with FDR = 5% to results from both analyses separately. We classified gene-tissue pairs as correlated with age or BMI using thresholds FDR < 5% and |$r$| > 0.3, where $r$ is the Pearson correlation coefficient.

Analysis of methylation levels

Methylation data were annotated using the IlluminaHumanMethylationEPICanno.ilm10b4.hg19 library in R⁷⁹. We retrieved CpG annotation information by invoking the getAnnotation() function, which provided genomic coordinates, probe IDs, and regulatory feature information for each CpG site for which we had methylation data available from GTEx. The extracted annotation included fields such as Name, chromosome (chr), genomic position (pos), rsID of SNPs at the CpG site (CpG_rs), UCSC_RefGene_Name, UCSC_RefGene_Group, and various enhancer and regulatory feature annotations.

We then performed correlation analyses between switch-like gene expression and methylation levels within a 50 kb window surrounding each gene, separately for cluster-1 and cluster-2 genes. Our analysis focused on six tissues (prostate, lung, muscle, ovary, colon, and breast) with at least 30 individuals having both gene expression and methylation data. For each analysis, we computed the Pearson correlation coefficient between the gene’s confounder-corrected expression values and the methylation levels at nearby CpG sites. We applied stringent thresholds (FDR < 5% and |$r$| > 0.5). For the analysis of cluster-2A genes in particular, we further required that a gene show a significant correlation with methylation in at least two tissues before concluding that methylation contributes to its multi-tissue bimodality.

Enrichment of switch-like genes among disease-linked genes

We performed enrichment analysis to test whether the top 40 genes differentially expressed in vaginal atrophy are overrepresented among cluster-1 vagina-specific switch-like genes. We denote the set of genes differentially expressed in vaginal atrophy as $V$ and set of cluster-1 switch-like genes bimodally expressed in the vagina as $C$_vagina. We calculated the fold enrichment of genes downregulated in vaginal atrophy among cluster-1 genes in the vagina by $\frac{|{C}_{{{\rm{vagina}}}}\, \cap \, {V|}/|{C}_{{{\rm{vagina}}}}|}{{|G}\, \cap \,{V|}/{|G|}}$.

To calculate the $p$-values associated with these enrichments, we obtained 10,000 uniformly random samples (with replacement) of size |$C$_vagina| from $G$. The $p$-value for the enrichment of genes linked to vaginal atrophy among cluster-1 switch-like genes in the vagina is then given by the fraction of random samples among the 10,000 samples for which $|{q}_{j}\cap {V|}\ge |{C}_{{\mathrm{vagina}}}\cap {V|}$. Here, ${q}_{j}$ is the set of genes in random sample $j$ where $j$ ∈ {1, …, 10000}.

Discretizing expression levels

We performed kernel density estimation using the density() function in R on the distributions of $\log (1+{x}_{g,{\mathrm{vagina}},s})$ across the ${S}_{{\mathrm{vagina}}}$ samples for $g$ ∈ ${C}_{{\mathrm{vagina}}}$ $\cap$ $V$.

We used the minimum of the estimated density as the switching threshold; if an individual had an expression level above the threshold in a given tissue, the gene was considered “on” in the individual in that tissue. The gene was considered “off” otherwise. We then calculate the concordance of expression among genes in the set ${C}_{{\mathrm{vagina}}}$ $\cap$ $V$ in the vagina as follows:

$$\frac{1}{S_{\rm{vagina}}} \displaystyle\sum_{s=1}^{S_{\mathrm{vagina}}} {\left[\displaystyle\prod_{g \, \in \, C_{\mathrm{vagina}} \cap V} {{\bf{1}}}_{\left(g\,{\rm{is}}\; {\mbox{`}}{\mbox{`}}{\rm{on}}{\mbox{'}}{\mbox{'}}\; {\rm{in}}\, {\rm{sample}}\, {\rm{s}}\, {\rm{in}}\, {\rm{vagina}}\right)}+\prod _{g\,\in \,{C}_{\rm{vagina}}\cap V}{{\bf{1}}}_{(g\,{\rm{is}} \; {\mbox{`}}{\mbox{`}}{\rm{off}}{\mbox{'}}{\mbox{'}}\; {\rm{in}}\, {\rm{sample}}\, {\rm{s}}\, {\rm{in}} \, {\rm{vagina}})}\right]},$$

where 1_(⋅) is the indicator function.

Immunohistochemistry

Vaginal biopsies, taken by use of punch biopsies from postmenopausal women, were fixed and stained by use of anti-ALOX12 antibodies^70,80. This polyclonal rabbit anti-ALOX12 antibody (catalog no. HPA010691; Prestige Antibodies® powered by Atlas Antibodies, Sigma-Aldrich) was diluted 1:50 in 0.1% bovine serum albumin. Sections were incubated overnight at 4 °C. Negative controls, prepared by omitting the primary antibody, were included for all samples.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Bulk RNA-seq expression data (TPM, GTEx v8) is available from the GTEx open-access portal (https://gtexportal.org/home/downloads/adult-gtex/bulk_tissue_expression). Controlled-access GTEx data, including full metadata, can be requested through dbGaP accession phs000424.v10.p2 at [https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000424.v10.p2]. DNA-methylation data were obtained from GTEx_v9_EPIC_data_noob_final_BMIQ_all_tissues_987.txt in the AnVIL workspace AnVIL_GTEx_V9_hg38 [https://anvil.terra.bio/#workspaces/anvil-datastorage/AnVIL_GTEx_V9_hg38]. All results generated for this study are provided as Supplementary Data 1–10 and archived on Figshare [https://doi.org/10.6084/m9.figshare.28848356], where they are freely available without restriction⁸¹. Source data are provided with this paper.

Code availability

The code used to generate the results in this study is publicly accessible on GitHub [https://github.com/AlberAqil/Switch_like_gene_expression_modulates_disease_risk_2025/]. A permanent version of all code is deposited on Zenodo [https://doi.org/10.5281/zenodo.15270596]⁸².

References

Jacob, F. & Monod, J. Genetic regulatory mechanisms in the synthesis of proteins. J. Mol. Biol. 3, 318–356 (1961).
Article CAS PubMed Google Scholar
Banerji, J., Rusconi, S. & Schaffner, W. Expression of a beta-globin gene is enhanced by remote SV40 DNA sequences. Cell 27, 299–308 (1981).
Article CAS PubMed Google Scholar
Gillies, S. D., Morrison, S. L., Oi, V. T. & Tonegawa, S. A tissue-specific transcription enhancer element is located in the major intron of a rearranged immunoglobulin heavy chain gene. Cell 33, 717–728 (1983).
Article CAS PubMed Google Scholar
Serfling, E., Jasin, M. & Schaffner, W. Enhancers and eukaryotic gene transcription. Trends Genet 1, 224–230 (1985).
Article CAS Google Scholar
Allfrey, V. G., Faulkner, R. & Mirsky, A. E. Acetylation and methylation of histones and their possible role in the regulation of RNA Synthesis. Proc. Natl. Acad. Sci. USA 51, 786–794 (1964).
Article CAS PubMed PubMed Central ADS Google Scholar
Riggs, A. D. & Jones, P. A. 5-methylcytosine, gene regulation, and cancer. Adv. Cancer Res. 40, 1–30 (1983).
Article CAS PubMed Google Scholar
Bestor, T., Laudano, A., Mattaliano, R. & Ingram, V. Cloning and sequencing of a cDNA encoding DNA methyltransferase of mouse cells. The carboxyl-terminal domain of the mammalian enzymes is related to bacterial restriction methyltransferases. J. Mol. Biol. 203, 971–983 (1988).
Article CAS PubMed Google Scholar
Jenuwein, T. & Allis, C. D. Translating the histone code. Science 293, 1074–1080 (2001).
Article CAS PubMed Google Scholar
Levine, M. & Tjian, R. Transcription regulation and animal diversity. Nature 424, 147–151 (2003).
Article CAS PubMed ADS Google Scholar
Muntoni, F., Torelli, S. & Ferlini, A. Dystrophin and mutations: one gene, several proteins, multiple phenotypes. Lancet Neurol. 2, 731–740 (2003).
Article CAS PubMed Google Scholar
Sakai, T. et al. Allele-specific hypermethylation of the retinoblastoma tumor-suppressor gene. Am. J. Hum. Genet. 48, 880–888 (1991).
CAS PubMed PubMed Central Google Scholar
Cutting, G. R. Cystic fibrosis genetics: from molecular understanding to clinical application. Nat. Rev. Genet. 16, 45–56 (2015).
Article CAS PubMed Google Scholar
Ertel, A. Bimodal gene expression and biomarker discovery. Cancer Inf. 9, 11–14 (2010).
ADS Google Scholar
Bessarabova, M. et al. Bimodal gene expression patterns in breast cancer. BMC Genomics 11, S8 (2010).
Article PubMed PubMed Central Google Scholar
Justino, J. R. et al. An integrated approach to identify bimodal genes associated with prognosis in câncer. Genet. Mol. Biol. 44, e20210109 (2021).
Article CAS PubMed PubMed Central Google Scholar
Moody, L., Mantha, S., Chen, H. & Pan, Y.-X. Computational methods to identify bimodal gene expression and facilitate personalized treatment in cancer patients. J. Biomed. Inform. 100S, 100001 (2019).
Article PubMed Google Scholar
Mason, C. C. et al. Bimodal distribution of RNA expression levels in human skeletal muscle tissue. BMC Genomics 12, 98 (2011).
Article CAS PubMed PubMed Central Google Scholar
Hartigan, J. A. & Hartigan, P. M. The dip test of unimodality. Ann. Stat. 13, 70–84 (1985).
Article MathSciNet MATH Google Scholar
McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).
Article CAS PubMed PubMed Central Google Scholar
Schriml, L. M. et al. Human Disease Ontology 2018 update: classification, content and workflow expansion. Nucleic Acids Res 47, D955–D962 (2019).
Article CAS PubMed Google Scholar
Jangravi, Z. et al. A fresh look at the male-specific region of the human Y chromosome. J. Proteome Res. 12, 6–22 (2013).
Article CAS PubMed Google Scholar
Cheung, S., Parrella, A., Rosenwaks, Z. & Palermo, G. D. Genetic and epigenetic profiling of the infertile male. PLoS One 14, e0214275 (2019).
Article CAS PubMed PubMed Central Google Scholar
Turjya, R. R., Khan, M. A.-A. -K. & Mir Md Khademul Islam, A. B. Perversely expressed long noncoding RNAs can alter host response and viral proliferation in SARS-CoV-2 infection. Future Virol. 15, 577–593 (2020).
Article CAS PubMed PubMed Central Google Scholar
Talotta, R., Bahrami, S. & Laska, M. J. Sequence complementarity between human noncoding RNAs and SARS-CoV-2 genes: What are the implications for human health?. Biochim. Biophys. Acta Mol. Basis Dis. 1868, 166291 (2022).
Article CAS PubMed Google Scholar
Arman, K., Dalloul, Z. & Bozgeyik, E. Emerging role of microRNAs and long non-coding RNAs in COVID-19 with implications to therapeutics. Gene 861, 147232 (2023).
Article CAS PubMed PubMed Central Google Scholar
Chia, D. J. Minireview: mechanisms of growth hormone-mediated gene regulation. Mol. Endocrinol. 28, 1012–1025 (2014).
Article PubMed PubMed Central Google Scholar
Mayne, B. T. et al. Large scale gene expression meta-analysis reveals tissue-specific, sex-biased gene expression in humans. Front. Genet. 7, 183 (2016).
Article PubMed PubMed Central Google Scholar
McEwen, B. S. & Milner, T. A. Understanding the broad influence of sex hormones and sex differences in the brain. J. Neurosci. Res. 95, 24–39 (2017).
Article CAS PubMed PubMed Central Google Scholar
Goel, N., Workman, J. L., Lee, T. T., Innala, L. & Viau, V. Sex differences in the HPA axis. Compr. Physiol. 4, 1121–1155 (2014).
Article PubMed Google Scholar
Fink, A. L. & Klein, S. L. The evolution of greater humoral immunity in females than males: implications for vaccine efficacy. Curr. Opin. Physiol. 6, 16–20 (2018).
Article PubMed PubMed Central Google Scholar
Laffont, S. & Guéry, J.-C. Deconstructing the sex bias in allergy and autoimmunity: from sex hormones and beyond. Adv. Immunol. 142, 35–64 (2019).
Article CAS PubMed Google Scholar
Carrel, L. & Willard, H. F. X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature 434, 400–404 (2005).
Article CAS PubMed ADS Google Scholar
Tukiainen, T. et al. Landscape of X chromosome inactivation across human tissues. Nature 550, 244–248 (2017).
Article PubMed PubMed Central ADS Google Scholar
Wu, Q. et al. Comprehensive analysis of the expression and prognostic value of SPINT1/2 in breast carcinoma. Front. Endocrinol. 12, 665666 (2021).
Article Google Scholar
Takan, I., Karakülah, G., Louka, A. & Pavlopoulou, A. In the light of evolution:” Keratins as exceptional tumor biomarkers. PeerJ 11, e15099 (2023).
Article PubMed PubMed Central Google Scholar
Behtaji, S. et al. Identification of oxytocin-related lncRNAs and assessment of their expression in breast cancer. Sci. Rep. 11, 6471 (2021).
Article CAS PubMed PubMed Central ADS Google Scholar
Fiaz, T. et al. Peripheral mRNA expression and prognostic significance of emotional stress biomarkers in metastatic breast cancer patients. Int. J. Mol. Sci. 23, 14097 (2022).
Zhang, H., Zhang, C. & Zhang, S. Single-cell RNA transcriptome of the human endometrium reveals epithelial characterizations associated with recurrent implantation failure. Adv. Biol. (Weinh.) 8, e2300110 (2024).
Article PubMed Google Scholar
Pathare, A. D. S. et al. Endometrial receptivity in women of advanced age: an underrated factor in infertility. Hum. Reprod. Update 29, 773–793 (2023).
Article CAS PubMed PubMed Central Google Scholar
Yeivin, A. & Razin, A. Gene methylation patterns and expression. EXS 64, 523–568 (1993).
CAS PubMed Google Scholar
Naveh-Many, T. & Cedar, H. Active gene sequences are undermethylated. Proc. Natl. Acad. Sci. USA 78, 4246–4250 (1981).
Article CAS PubMed PubMed Central ADS Google Scholar
Bird, A. P. CpG-rich islands and the function of DNA methylation. Nature 321, 209–213 (1986).
Article CAS PubMed ADS Google Scholar
Oliva, M. et al. DNA methylation QTL mapping across diverse human tissues provides molecular links between genetic variation and complex traits. Nat. Genet. 55, 112–122 (2023).
Article CAS PubMed Google Scholar
Zhou, W., Triche, T., Laird, P. & Shen, H. SeSAMe: reducing artifactual detection of DNA methylation by Infinium BeadChips in genomic deletions. Nucleic Acids Res 46, e123 (2018).
PubMed PubMed Central Google Scholar
Hawe, J. S. et al. Genetic variation influencing DNA methylation provides insights into molecular mechanisms regulating genomic function. Nat. Genet. 54, 18–29 (2022).
Article CAS PubMed PubMed Central Google Scholar
Chundru, V. K. et al. Rare genetic variants underlie outlying levels of DNA methylation and gene-expression. Hum. Mol. Genet. 32, 1912–1921 (2023).
Article CAS PubMed PubMed Central Google Scholar
Goldstein, I., Dicks, B., Kim, N. N. & Hartzell, R. Multidisciplinary overview of vaginal atrophy and associated genitourinary symptoms in postmenopausal women. Sex. Med. Today 1, 44–53 (2013).
Article Google Scholar
Szymański, J. K., Słabuszewska-Jóźwiak, A. & Jakiel, G. Vaginal aging—what we know and what we do not know. Int. J. Environ. Res. Public Health 18, 4935 (2021).
Article PubMed PubMed Central Google Scholar
Kim, H.-K., Kang, S.-Y., Chung, Y.-J., Kim, J.-H. & Kim, M.-R. The recent review of the genitourinary syndrome of menopause. J. Menopausa Med 21, 65–71 (2015).
Article CAS Google Scholar
Hummelen, R. et al. Vaginal microbiome and epithelial gene array in post-menopausal women with moderate to severe dryness. PLoS One 6, e26602 (2011).
Article CAS PubMed PubMed Central ADS Google Scholar
Faubion, S. S., Sood, R. & Kapoor, E. Genitourinary syndrome of menopause: management strategies for the clinician. Mayo Clin. Proc. 92, 1842–1849 (2017).
Article PubMed Google Scholar
Buchanan, D. L. et al. Role of stromal and epithelial estrogen receptors in vaginal epithelial proliferation, stratification, and cornification. Endocrinology 139, 4345–4352 (1998).
Article CAS PubMed Google Scholar
Anderson, D. J., Marathe, J. & Pudney, J. The structure of the human vaginal stratum corneum and its role in immune defense. Am. J. Reprod. Immunol. 71, 618–623 (2014).
Article CAS PubMed PubMed Central Google Scholar
Greenman, C. et al. Patterns of somatic mutation in human cancer genomes. Nature 446, 153–158 (2007).
Article CAS PubMed PubMed Central ADS Google Scholar
Ho, M. et al. Update of the keratin gene family: evolution, tissue-specific expression patterns, and relevance to clinical disorders. Hum. Genomics 16, 1 (2022).
Article CAS PubMed PubMed Central Google Scholar
Szegedi, A. et al. Protein kinase C isoenzymes differentially regulate the differentiation-dependent expression of adhesion molecules in human epidermal keratinocytes. Exp. Dermatol. 18, 122–129 (2009).
Article CAS PubMed Google Scholar
Pontén, F., Jirström, K. & Uhlen, M. The human protein Atlas-a tool for pathology. J. Pathol. 216, 387–393 (2008).
Article PubMed Google Scholar
Cotreau, M. M. et al. A study of 17β-estradiol-regulated genes in the vagina of postmenopausal women with vaginal atrophy. Maturitas 58, 366–376 (2007).
Article CAS PubMed Google Scholar
Miyagawa, S. & Iguchi, T. Epithelial estrogen receptor 1 intrinsically mediates squamous differentiation in the mouse vagina. Proc. Natl. Acad. Sci. USA. 112, 12986–12991 (2015).
Article CAS PubMed PubMed Central ADS Google Scholar
Zhang, X. et al. Identification of key pseudogenes in nasopharyngeal carcinoma based on RNA-Seq analysis. BMC Cancer 21, 483 (2021).
Article PubMed PubMed Central Google Scholar
Gao, L. et al. Gene expression profile of THZ1-treated nasopharyngeal carcinoma cell lines indicates its involvement in the inhibition of the cell cycle. Transl. Cancer Res. 10, 445–460 (2021).
Article CAS PubMed PubMed Central Google Scholar
Dudbridge, F. & Fletcher, O. Gene-environment dependence creates spurious gene-environment interaction. Am. J. Hum. Genet. 95, 301–307 (2014).
Article CAS PubMed PubMed Central Google Scholar
Vetr, N. G., Gay, N. R., MoTrPAC Study Group & Montgomery, S. B. The impact of exercise on gene regulation in association with complex trait genetics. Nat. Commun. 15, 3346 (2024).
Article CAS PubMed PubMed Central ADS Google Scholar
Thomas, D. Gene–Environment-wide association studies: emerging approaches. Nat. Rev. Genet. 11, 259–272 (2010).
Article CAS PubMed PubMed Central Google Scholar
Gilliland, F. D. et al. Effects of glutathione S-transferase M1, maternal smoking during pregnancy, and environmental tobacco smoke on asthma and wheezing in children. Am. J. Respir. Crit. Care Med. 166, 457–463 (2002).
Article PubMed Google Scholar
Li, Q. et al. Genome-wide search for exonic variants affecting translational efficiency. Nat. Commun. 4, 2260 (2013).
Article PubMed ADS Google Scholar
Liu, L. et al. Mutation of the CDKN2A 5’ UTR creates an aberrant initiation codon and predisposes to melanoma. Nat. Genet. 21, 128–132 (1999).
Article PubMed Google Scholar
Khan, Z. et al. Primate transcript and protein expression levels evolve under compensatory selection pressures. Science 342, 1100–1104 (2013).
Article CAS PubMed PubMed Central ADS Google Scholar
Wang, S. H., Hsiao, C. J., Khan, Z. & Pritchard, J. K. Post-translational buffering leads to convergent protein expression levels between primates. Genome Biol. 19, 83 (2018).
Article PubMed PubMed Central Google Scholar
Kallak, T. K. et al. Vaginal gene expression during treatment with aromatase inhibitors. Clin. Breast Cancer 15, 527–535.e2 (2015).
Article CAS PubMed Google Scholar
Shalek, A. K. et al. Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature 498, 236–240 (2013).
Article CAS PubMed PubMed Central ADS Google Scholar
Perez, L. V. et al. Principal Component Analysis To Address Multicollinearity. Whitman College: Walla Walla, WA, USA (2017).
Hellwig, B. et al. Comparison of scores for bimodality of gene expression distributions and genome-wide evaluation of the prognostic relevance of high-scoring genes. BMC Bioinforma. 11, 276 (2010).
Article Google Scholar
Dobrin, R. et al. Multi-tissue coexpression networks reveal unexpected subnetworks associated with disease. Genome Biol. 10, R55 (2009).
Article PubMed PubMed Central Google Scholar
1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Article Google Scholar
Ebert, P. et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, eabf7117 (2021).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Article CAS PubMed PubMed Central Google Scholar
Cohen, J. et al. Statistical Power Analysis for the Behavioral Sciences. (Routledge, London, England, 1988).
Hansen, K. D. IlluminaHumanMethylationEPICanno.ilm10b4.hg19: annotation for Illumina’s EPIC methylation arrays. R package version 0.6.0, https://bitbucket.com/kasperdanielhansen/Illumina_EPIC (2017).
Kallak, T. K. et al. Aromatase inhibitors affect vaginal proliferation and steroid hormone receptors. Menopause 21, 383–390 (2014).
Article PubMed Google Scholar
Aqil, A. Switch-like genes modulate disease risk (supplementary data), FigShare. https://doi.org/10.6084/m9.figshare.28848356 (2025).
Aqil, A. Switch-like gene expression modulates disease risk (code), Zenodo. https://doi.org/10.5281/zenodo.15270596 (2025).

Download references

Acknowledgements

We thank Dr Vincent Lynch, Dr Eric Strobel, and Dr Charikleia Karageorgiou for carefully reading this manuscript. We also thank Hiba Hassan and Fatima Azeem for their help in creating schematics and selecting reader-friendly color schemes. O.G. and N.M. acknowledge support from the National Institute of General Medical Sciences (under grant no. 1R01GM148973-01). O.G. also acknowledges support from the National Science Foundation (under grant nos. 2049947 and 2123284) and National Institute of General Medical Sciences (R35-GM156519). N.M. also acknowledges support from the Japan Science and Technology Agency (JST) Moonshot R&D (under grant no JPMJMS2021), the National Science Foundation (under grant no. 2052720), and JSPS KAKENHI (under grant no. 24K14840). The funders had no role in study design, data collection and analysis, publication decision, or manuscript preparation.

Author information

Authors and Affiliations

Department of Biological Sciences, State University of New York at Buffalo, Buffalo, NY, USA
Alber Aqil & Omer Gokcumen
Department of Mathematics, State University of New York at Buffalo, Buffalo, NY, USA
Yanyan Li, Zhiliang Wang, Madison Russell & Naoki Masuda
Institute for Artificial Intelligence and Data Science, State University of New York at Buffalo, Buffalo, NY, USA
Saiful Islam & Naoki Masuda
Department of Women’s and Children’s Health, Uppsala University, Uppsala, Sweden
Theodora Kunovac Kallak
Faculty of Biosciences, Norwegian University of Life Sciences, Aas, Norway
Marie Saitou

Authors

Alber Aqil
View author publications
Search author on:PubMed Google Scholar
Yanyan Li
View author publications
Search author on:PubMed Google Scholar
Zhiliang Wang
View author publications
Search author on:PubMed Google Scholar
Saiful Islam
View author publications
Search author on:PubMed Google Scholar
Madison Russell
View author publications
Search author on:PubMed Google Scholar
Theodora Kunovac Kallak
View author publications
Search author on:PubMed Google Scholar
Marie Saitou
View author publications
Search author on:PubMed Google Scholar
Omer Gokcumen
View author publications
Search author on:PubMed Google Scholar
Naoki Masuda
View author publications
Search author on:PubMed Google Scholar

Contributions

O.G. and N.M. co-supervised the project. M.S., O.G., and N.M. conceived the project. Y.L. performed the initial computations with help from Z.W., S.I., M.R., and N.M. A.A. created the figures and performed the final computational analysis. T.K.K. performed the protein staining experiments on vaginal tissues. AA, OG, and NM wrote the manuscript.

Corresponding authors

Correspondence to Omer Gokcumen or Naoki Masuda.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Natalie Telis and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Supplementary Data 5

Supplementary Data 6

Supplementary Data 7

Supplementary Data 8

Supplementary Data 9

Supplementary Data 10

Reporting Summary

Transparent Peer Review file

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Aqil, A., Li, Y., Wang, Z. et al. Switch-like gene expression modulates disease risk. Nat Commun 16, 5323 (2025). https://doi.org/10.1038/s41467-025-60513-x

Download citation

Received: 25 August 2024
Accepted: 22 May 2025
Published: 18 June 2025
DOI: https://doi.org/10.1038/s41467-025-60513-x