Introduction

Comorbidity and multimorbidity, defined as the presence of more than one medical condition in individuals1,2, have been investigated using cohort, case-control, and nested case-control study designs. These studies have shown, for example, that patients with type II diabetes (T2D) exhibit a higher prevalence of dementia and cancer, with men showing a particularly elevated risk of Alzheimer’s disease3,4. The development of digital medical record systems5 and the implementation of the Veterans Health Information System and Technology Architecture6 enabled large-scale collection of clinical information in the form of electronic health records (EHR). Since then, multiple studies have systematically analyzed comorbidity relations and constructed networks based on epidemiological evidence. For example, Hidalgo et al. generated a comorbidity network using Medicare data from more than 30 million patients in the United States7. They found insightful sex-specific differences in comorbidity patterns—for example, a higher risk of nephropathies in women and acute myocardial infarction in men with T2D. Similarly, Jensen et al. used Danish EHRs to construct temporal disease trajectories by linking statistically significant disease co-occurrences, thereby identifying key conditions that drive disease progression and increase mortality8. In the same population, Westergaard et al. reported that women exhibit a greater number of comorbidities than men, as well as more frequent disease co-occurrences over longer time spans9. Together, these findings indicate that men and women differ substantially in their comorbidity profiles, underscoring the importance of understanding these differences from a molecular perspective.

Since the publication of the first human disease network in 2007 by Goh et al., in which diseases were connected if they shared at least one altered gene (the diseasome)10, numerous studies have explored molecular similarities between diseases, including efforts to better understand comorbidities. Lee et al. demonstrated that diseases involving coupled metabolic reactions co-occur three to seven times more frequently than those without such connections11. Likewise, studies measuring distances between disease modules in the protein interactome have shown that diseases with overlapping modules tend to co-occur more often than expected by chance12,13. Transcriptomic similarities between diseases have also been found to reflect epidemiologically observed co-occurrences14,15. More recently, Dong et al. integrated EHR and genome-wide association studies (GWAS) data from the UK Biobank to recapitulate 46% of observed multimorbidities16, while Murrin et al. found that most pairs of chronic conditions with shared genetic features co-occur in the primary care setting17.

We have previously highlighted the lack of studies examining biological differences between women and men to better understand sex-specific comorbidity patterns18. This gap is notable given that 37% of all genes exhibit sex-biased expression in at least one tissue19. Liu et al. found sex-specific disease-associated polymorphisms in GWAS20, and Lopes-Ramos et al. constructed sample-specific gene regulatory networks from healthy human tissues to reveal that many transcription factors have sex-dependent regulatory targets. Interestingly, these differentially targeted genes are enriched for tissue-related functions and diseases. For example, genes associated with Alzheimer’s disease are regulated by distinct transcription factors depending on the sex of the sample21.

To address the knowledge gap in the molecular bases of sex-specific comorbidities, we have generated disease transcriptomic similarity networks separately for men and women. To maintain consistent terminology, we use the gender terms women and men to refer to the binary categories in both transcriptomic data (females/males) and epidemiological data (women/men), and refer to them collectively as sexes without implying alignment between sex and gender. The resulting networks recover a representative set of comorbidities previously described for women and men9. By analyzing pathways altered in the same direction in comorbid diseases, we propose hypotheses explaining differences in disease co-occurrence between sexes. Moreover, we find that disease pairs may co-occur more frequently than expected by chance in both sexes, but through distinct biological processes. Finally, we extend these findings to potentially related drugs, emphasizing the scientific and clinical relevance of studying sex-specific molecular differences in disease and comorbidity. In summary, this study provides molecular hypotheses to explain sex differences in comorbidity relationships and explores the potential roles of drugs within these relationships.

Methods

Gene expression analysis

Raw gene expression data were obtained from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) and ArrayExpress repositories (https://www.ebi.ac.uk/arrayexpress). Studies conducted using the Affymetrix HG U133Plus 2 microarray platform were selected for their cost-effectiveness, reproducibility, and translational potential compared to other assays. This selection also ensured sufficiently large sample sizes for robust disease-disease association analyses, while minimizing biases from platform heterogeneity. After excluding low-quality samples (GNUSE median values >1.2522), we retained 128 diseases with at least three case and three control samples (Supplementary Data 1). Because sex information was available for only 52.23% (6685/12,797) of samples, we used the massiR package to infer sample sex, correctly recovering sex in 94.24% of annotated samples23. This method classifies samples as male or female by analyzing the expression levels of probes corresponding to Y-chromosome genes. To harmonize our transcriptomic similarity networks with previously published epidemiological networks9, disease names were mapped to three-digit codes of the World Health Organization’s International Classification of Diseases, version 10 (ICD-10), grouping specific conditions under each code. This resulted in 76 ICD-10 codes in women (2465 cases and 1370 controls) and 77 in men (3200 cases and 1871 controls), with 59 codes common to both sexes (see Supplementary Fig. 1). Overall, women represented 43% of cases and 42% of controls, with case samples comprising ~63–64% of total samples in both sexes. Lowly expressed genes (detection p-value < 0.05) were identified and removed using the MAS 5.0 algorithm24. Background correction, normalization, and summarization were performed using the frozen Robust Multiarray Analysis (fRMA) preprocessing algorithm with default parameters25. 
Differential expression analyses comparing case (disease) versus control (disease-free) samples were conducted using the modeling framework implemented in the LIMMA package26. Analyses were performed both separately by sex and jointly across all samples, adjusting for potential confounders such as study of origin and sex. Genes with a false discovery rate (FDR) ≤0.05 and log fold change (logFC) <0 or >0 were classified as significantly down- or up-regulated, respectively. All R packages used for analysis and visualization are listed at https://github.com/jonsv89/SHDC27.
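
The classification rule above can be illustrated with a minimal sketch. It assumes Benjamini-Hochberg adjustment, since the text does not name the FDR procedure, and the function names and toy values are hypothetical rather than taken from the study's code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def classify_genes(log_fc, p_values, fdr_cutoff=0.05):
    """Label each gene 'up', 'down', or 'ns' (not significant)."""
    fdr = benjamini_hochberg(p_values)
    labels = []
    for lfc, q in zip(log_fc, fdr):
        if q <= fdr_cutoff and lfc > 0:
            labels.append("up")
        elif q <= fdr_cutoff and lfc < 0:
            labels.append("down")
        else:
            labels.append("ns")
    return labels


# Toy input: four hypothetical genes (assumed values, not study data).
toy_log_fc = [1.2, -0.8, 0.3, -0.1]
toy_p = [0.001, 0.004, 0.20, 0.90]
toy_labels = classify_genes(toy_log_fc, toy_p)
```

In this toy case the first two genes pass the FDR cutoff and are labeled by the sign of their logFC, while the remaining two are left unclassified.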

Gene set enrichment analysis

Functional enrichment was performed using gene set enrichment analysis (GSEA)28, applied to the full list of genes ranked by logFC from the differential expression results. Gene sets from Reactome, Gene Ontology (GO), and KEGG databases were used for GSEA. Disease clustering was based specifically on Reactome pathways, whose hierarchical organization enabled the identification of 29 lowest-level pathway categories. Pairwise Euclidean distances were computed between diseases based on their Reactome pathway enrichment profiles (using normalized enrichment scores from GSEA). Hierarchical clustering was performed using the Ward2 linkage method29, and significant clusters (p value ≤0.05) were identified via bootstrap resampling using the pvclust R package30. Resulting dendrograms for women and men were compared using tanglegrams generated with the dendextend R package31.
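
As an illustration of the distance step only, the following sketch computes pairwise Euclidean distances between normalized enrichment score (NES) profiles. The profiles are invented toy vectors, the helper names are hypothetical, and the Ward linkage and bootstrap steps performed with pvclust are not reproduced here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(profiles):
    """Map each unordered pair of disease codes to its Euclidean distance."""
    names = sorted(profiles)
    return {
        (a, b): euclidean(profiles[a], profiles[b])
        for a in names
        for b in names
        if a < b
    }


# Hypothetical NES profiles over three pathways (not study data).
toy_nes = {
    "G20": [1.5, -2.0, 0.0],
    "G30": [1.5, -1.0, 0.0],
    "F20": [-1.5, 2.0, 0.0],
}
toy_dist = distance_matrix(toy_nes)
# With singleton clusters, the closest pair is the first merge under Ward linkage.
closest_pair = min(toy_dist, key=toy_dist.get)
```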

Network construction

Transcriptional similarities between diseases were calculated using three gene sets: (i) all annotated genes, (ii) the union of genes with significant differential expression (sDEGs), and (iii) the intersection of sDEGs, following previous studies on the molecular bases of comorbidities14,15 (see Fig. 1). Six similarity metrics were computed: Pearson and Spearman correlation coefficients, cosine similarity, and Euclidean, Canberra, and Manhattan distances. Empirical p values were obtained from 10,000 random permutations for cosine, Euclidean, Canberra, and Manhattan metrics, with Bonferroni correction applied; similarities with FDR ≤0.05 were considered significant. For distance metrics (Euclidean, Canberra, Manhattan) observed distances were compared to the mean of random distances, yielding positive (greater) or negative (lesser) similarity values relative to expectation (see Supplementary Fig. 2). Similarity values were then binarized: coefficients >0 were set to +1, and those <0 to −1. The resulting disease transcriptomic similarity networks (DTSNs) generated from the different metrics were largely consistent, particularly those based on Pearson, Spearman, cosine, and Euclidean measures (see Supplementary Fig. 3). For clarity, results shown in the main text correspond to Euclidean-based DTSNs, which recovered the highest number of comorbidity relationships (see Supplementary Table 1). As the number of sDEGs strongly influences similarity detection (correlations = 0.75–0.86 for sDEG sets vs. 0.34 using all genes; see Supplementary Fig. 4), analyses were focused on the complete list of genes. All DTSNs are available at the disease-perception portal (https://disease-perception.bsc.es/shdc/). The network backbone was extracted following the method described by ref. 32.
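
The permutation scheme for distance metrics can be sketched as follows. This is an assumption-laden illustration: the null model (shuffling the gene labels of one profile), the one-sided empirical p values, and all names are hypothetical choices, and multiple-testing correction across disease pairs is omitted. The sign convention follows the text, with +1 denoting a smaller-than-expected distance.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def permutation_similarity(profile_a, profile_b, n_perm=1000, seed=0):
    """Return (sign, empirical p value) for one disease pair.

    sign is +1 when the observed distance is below the mean of the
    permuted distances, and -1 otherwise.
    """
    rng = random.Random(seed)
    observed = euclidean(profile_a, profile_b)
    shuffled = list(profile_b)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # assumed null: break the gene-to-gene pairing
        null.append(euclidean(profile_a, shuffled))
    mean_null = sum(null) / n_perm
    sign = 1 if observed < mean_null else -1
    if sign == 1:
        extreme = sum(d <= observed for d in null)
    else:
        extreme = sum(d >= observed for d in null)
    # Add-one correction so the empirical p value is never exactly zero.
    p_value = (1 + extreme) / (n_perm + 1)
    return sign, p_value


# Hypothetical logFC profiles over eight genes (not study data).
toy_a = [2.0, 1.5, -1.0, -2.0, 0.5, 0.0, -0.5, 1.0]
toy_same = list(toy_a)               # identical profile
toy_opposite = [-x for x in toy_a]   # mirrored profile
sign_same, p_same = permutation_similarity(toy_a, toy_same)
sign_opp, p_opp = permutation_similarity(toy_a, toy_opposite)
```

An identical profile yields a positive edge and a mirrored profile a negative one, matching the binarization to +1 and −1 described above.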

Fig. 1: Study roadmap.

Schematic representation of the main questions raised in the study. Squares, diamonds, triangles and circles denote diseases. Arrows indicate a higher-than-expected risk of developing a disease when suffering from a different one.

Overlap with epidemiology

To evaluate the extent to which DTSNs capture known comorbidity patterns, we used the epidemiological network published by ref. 9 (Supplementary Note 1). Overlap analyses were conducted on the shared set of diseases present in both transcriptomic and epidemiological networks. Positive and negative transcriptomic similarities were compared separately with epidemiological associations. All comparisons were stratified by sex (women-women, men-men). Statistical significance was assessed using Fisher’s exact test and a randomization approach, generating 10,000 degree-preserving random networks to establish null distributions.
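
The Fisher's exact test on the overlap can be sketched with the hypergeometric distribution. The one-sided alternative, the function name, and the toy counts below are assumptions for illustration. The degree-preserving randomization is not covered by this sketch.

```python
from math import comb


def fisher_exact_greater(overlap, n_transcriptomic, n_epidemiological, n_pairs):
    """One-sided Fisher's exact p value for an overlap at least as large as observed.

    n_pairs is the number of disease pairs that could be linked in both
    networks; the other arguments count linked pairs in each network and
    the pairs linked in both.
    """
    max_overlap = min(n_transcriptomic, n_epidemiological)
    total = comb(n_pairs, n_transcriptomic)
    tail = sum(
        comb(n_epidemiological, k)
        * comb(n_pairs - n_epidemiological, n_transcriptomic - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 10 possible pairs, 4 linked in each network, 3 shared.
toy_p_overlap = fisher_exact_greater(3, 4, 4, 10)
```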

Disease–drug associations

To explore potential sex-specific drug influences on comorbidities, drug target data were retrieved from the DrugBank33. Because the number of direct targets per drug is relatively small for enrichment analyses, we expanded target sets using the experimentally validated human protein-protein interaction network from IID34. First-neighbor interactors of each drug’s primary targets were included to increase pathway coverage. We then performed GSEA28 to associate drugs targeting proteins encoded by up- or down-regulated genes with corresponding diseases, analyzed separately by sex. Disease–drug associations were obtained from the SIDER database, which compiles drug indications and adverse reactions extracted from the package inserts using natural language processing35. Disease names were converted to ICD-10 codes via the Unified Medical Language System36, and DrugBank identifiers were mapped to standardized drug names.
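
The first-neighbor expansion of drug targets can be sketched as a simple graph operation. The protein identifiers and the edge list are placeholders, and the function is a hypothetical helper rather than the study's implementation.

```python
def expand_targets(primary_targets, ppi_edges):
    """Return the primary targets plus their direct interactors in the PPI network."""
    neighbors = {}
    for a, b in ppi_edges:
        # Treat interactions as undirected.
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    expanded = set(primary_targets)
    for target in primary_targets:
        expanded |= neighbors.get(target, set())
    return expanded


# Placeholder interaction list (not taken from IID).
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
toy_expanded = expand_targets({"P1"}, toy_edges)
```

Only direct interactors are added, so in this toy network P3, which is two steps away from P1, is not included.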

Statistics and reproducibility

The five sections, “Gene expression analysis”, “Gene set enrichment analysis”, “Network construction”, “Overlap with epidemiology”, and “Disease–drug associations” describe how the statistical analyses of the data were conducted.

Ethics

All analyses were performed on publicly available, de-identified gene expression datasets obtained from the Gene Expression Omnibus (GEO) and ArrayExpress repositories. According to GEO submission requirements, data submitters must ensure that deposited datasets comply with institutional and national ethical regulations, including that the information “does not compromise participant privacy and is in accord with the original consent,” and that non-NIH-funded studies have “appropriate consent/permission to submit the data to a public database.” Similarly, ArrayExpress accepts high-throughput functional genomics data only when accompanied by appropriate study metadata, sample annotations, and protocols, in accordance with community guidelines for ethically conducted studies.

Ethical approval and informed consent for each original dataset were therefore obtained by the investigators who generated the data, as documented in the corresponding primary publications. No new data were collected for the present study, and all datasets were fully de-identified prior to public release. Because this work involves secondary analysis of publicly available, non-identifiable data, it does not constitute human subjects research, and no additional IRB/Ethics approval was required at our institution.

Results

Sex-associated differences in gene expression

To obtain an initial overview of disease-related differences between women and men, we performed hierarchical clustering on the normalized enrichment scores of their enriched pathways based on GSEA (see Methods). On average, clusters were larger and more heterogeneous in women than in men (66% of clusters in women and 25% in men included diseases from different categories, see Fig. 2). Overall, the overlap between men and women clusters was limited. Parkinson’s and Alzheimer’s diseases (G20 and G30), which have been previously linked at the molecular level37, clustered significantly in men. These two diseases shared 112 pathways enriched in differentially expressed genes, primarily related to signal transduction, the immune system, and neuronal function. In contrast, in women, Alzheimer’s disease clustered with schizophrenia (F20), sharing 116 enriched pathways. These included metabolic pathways (glucose metabolism and respiratory electron transport)38,39, protein metabolism (protein ubiquitination)40,41, and neuronal system pathways (such as the serotonin neurotransmitter release cycle and GABA synthesis, release, reuptake, and degradation)42,43, all previously associated with both diseases. Interestingly, dementia (including Alzheimer’s disease) has been reported to co-occur significantly in patients with schizophrenia, with a relative risk (RR) of 2.29, and the risk is higher among women with schizophrenia44.

Fig. 2: Reactome pathways are significantly altered in disease in women and men.

The diseases are represented in rows, and the pathways are in columns. The colors of the diseases refer to the category of the ICD-10 to which they belong. The color bar in the columns represents the Reactome parents to which each pathway belongs. The red (blue) lines represent pathways that are under- (over-) expressed in each disease. The diseases have been clustered by calculating Euclidean distances between them and using the ward.D2 method29 on the GSEA’s normalized enrichment scores. The gray lines connect diseases in women and men. Black lines near ICD codes indicate significant clusters after bootstrapping.

Only two pairs of diseases clustered together significantly in both sexes: pancreatic cancer and gastric cancer (C25 and C16), and irritable bowel syndrome (IBS) and ulcerative colitis (K58 and K51). Although both tumors clustered together in men and women, the dominant molecular similarities differed: cell cycle-related processes predominated in men, while immune system-related processes predominated in women. Notably, interleukin signaling pathways were overactivated in women, supporting their potential as candidates for developing immunomodulatory therapeutic strategies45. Considerable sex-specific divergence in tumor immune responses has been documented, which may have implications for sex differences in immunotherapy efficacy46. The two digestive system diseases (K58 and K51) also clustered with oral cavity cancer (C14) in women, a known comorbidity. The risk of oral cavity cancer is higher in women (standardized incidence ratio of 12.07 in women vs. 8.49 in men)47. The increasing prevalence of HPV may further amplify this risk in IBS patients, potentially enhanced by immunosuppressive therapies48, underscoring the influence of treatments in shaping comorbidity relationships. In this women-specific cluster, 37 pathways were significantly altered: 28 upregulated (mainly signal transduction pathways) and nine downregulated (primarily metabolic pathways). Among them, NOTCH4 signaling was overactivated, a pathway proposed as an oral cancer marker49. NOTCH4 expression is induced in macrophages following activation by Toll-like receptors (TLRs) and interferon-γ (IFN-γ), both extensively studied in the context of IBS50. Together, these findings reveal substantial molecular differences between women and men across diseases from distinct categories, raising the question of whether such molecular disparities contribute to differences in comorbidity patterns.

Disease transcriptomic similarity networks

After observing sex-specific differences in disease clustering based on pathway enrichment, we constructed disease transcriptomic similarity networks (DTSNs) to evaluate pairwise disease relationships. Similarities between disease pairs were calculated adjusting for sex differences, and separately for women (wDTSN) and men (mDTSN, see Methods).

Between 45 and 48% of all edges in the wDTSN and mDTSN were positive, indicating smaller-than-expected distances between diseases. In contrast, 25–27% of the edges were negative, reflecting larger-than-expected distances. Consistent with epidemiological observations9, significantly more positive interactions (i.e., direct comorbidities) were detected between diseases belonging to the same category than between those from different categories in both women and men (odds ratio (OR) = 2.81 in women vs. OR = 1.83 in men; Supplementary Table 2). This finding suggests that transcriptional similarity may provide insight into why diseases affecting the same system are more likely to co-occur than those affecting different systems. The ORs increased further when considering only the connections that preserved the most similar comorbidity trajectories, that is, the network backbone32 (4.4 in women and 3.2 in men), highlighting the stronger relevance of intra-category connections in the flow of disease information. This pattern is also consistent in epidemiology (see Supplementary Note 2). When analyzing specific disease categories within the mDTSN and wDTSN, we found that digestive and musculoskeletal system diseases in men and skin diseases in women each exhibited a clustering coefficient of 1, indicating full interconnection within the category. Mental illnesses followed, with a network density of 0.83 in women compared with 0.4 in men, reflecting the higher comorbidity burden of mental disorders among women51 (see Supplementary Fig. 5). When focusing on diseases common to both sexes, we observed significantly fewer positive interactions in women than in men between digestive system and nervous system diseases (OR = 0.21) and between digestive system and blood diseases (OR = 0.27, see Supplementary Table 3). These findings align with the higher risk of co-occurrence between IBS and neurological disorders such as multiple sclerosis52, Parkinson’s disease53, or dementia54.
Notably, no positive interactions were found between IBS and multiple sclerosis in women. Conversely, women displayed significantly more negative interactions between digestive and nervous system diseases (OR = 10.81, see Supplementary Table 4). Comparing the networks generated jointly (both sexes combined) with those generated separately, 16.22 and 13.49% of positive interactions were exclusive to women and men, respectively—proportions roughly consistent with epidemiological estimates (9% in women and 4% in men)9 (see Supplementary Fig. 6A, B). These findings demonstrate that, as in epidemiology, sex-specific transcriptional relationships can be obscured when analyzing data from women and men together.
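
The odds ratios and network densities reported in this section follow standard definitions, sketched below. The 2×2 counts and the network size are invented for illustration and do not correspond to the values in the Supplementary Tables.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    Example layout (assumed): rows are same-category vs. different-category
    disease pairs, columns are positive edge vs. no positive edge.
    """
    return (a * d) / (b * c)


def network_density(n_nodes, n_edges):
    """Fraction of possible undirected edges that are present."""
    possible = n_nodes * (n_nodes - 1) / 2
    return n_edges / possible


# Hypothetical counts (not study data).
toy_or = odds_ratio(30, 20, 40, 80)
toy_density = network_density(4, 5)
```

An odds ratio above 1 in this layout means that positive edges are more common within a category than across categories, and a density of 1 means that every pair of diseases in the category is connected.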

Biological clues to differences in disease co-occurrences in men and women

Previous studies have demonstrated that transcriptional similarity can significantly recover comorbidity relationships14,15, even across diverse populations, underscoring the robustness of this approach (see Supplementary Note 3). However, as discussed in the Introduction, this phenomenon has not yet been evaluated separately for women and men18. In our analysis, 53.12 and 60.68% of the disease co-occurrences reported in women and men by ref. 9 were also captured in the wDTSN and mDTSN, respectively (see Fig. 3, Supplementary Fig. 6C, and Supplementary Table 1).

Fig. 3: Comorbidities explained by the disease transcriptomic similarity network.

Network of comorbidities recovered based on transcriptomic similarities. Each node represents a disease, colored based on the disease category it belongs to (ICD-10). Red, blue, and green edges represent comorbidities (retrieved by analyzing the epidemiology separately for women and men and adjusting for sex) recovered by calculating similarities at the gene expression level between diseases in the same way (separately for women and men and adjusting for sex differences). The dashed edges represent comorbidities described in women and men that are recovered by analyzing transcriptomic similarities separately for women and men. The network’s visualizations have been generated using Cytoscape software112.

The sex-specific DTSNs revealed altered biological processes in comorbid diseases within both the same and different disease categories. For example, schizophrenia (F20) and chronic obstructive pulmonary disease (COPD, J44) were connected in the wDTSN and shared enrichment for mitochondrial processes (mitochondrial translation, mitochondrial RNA metabolic process, and mitochondrial gene expression) and immune-related pathways (interleukin 10 signaling, neutrophil degranulation, macrophage cytokine production, and antimicrobial peptides) (see Supplementary Data 3). Interestingly, patients with schizophrenia have been reported to exhibit an elevated risk of COPD55, even when compared to smoking-matched controls56. Impaired lung function is often observed in schizophrenia57 and shared eQTL variants affecting both lung function and neuropsychiatric traits58 may contribute to this relationship. Clinical data also show sex differences in schizophrenia prevalence, symptomatology, and treatment response59, as well as in comorbidity patterns—for instance, the risk of developing COPD appears to be higher among women60 and is statistically significant in women but not men in the Danish population9. The enrichment of mitochondrial and immune pathways is consistent with reports of stronger mitochondrial signatures in women with schizophrenia61 and COPD62 and with known sex-related immune differences63,64. These findings support the hypothesis that sex-specific molecular variations may contribute to distinct comorbidity patterns between women and men.

A comparable pattern emerged for smoking (F17) and irritable bowel syndrome (IBS, K58), which were linked in the wDTSN through shared enrichment of mitochondrial respiratory chain assembly, electron transport, complex I biogenesis, detoxification of reactive oxygen species, and immune processes such as neutrophil activation and immunoregulatory interactions between lymphoid and non-lymphoid cells. Epidemiological data indicate an elevated risk of IBS among smokers65, particularly in women, consistent with the greater physiological impact of smoking in women66 and the higher prevalence of IBS in women67, possibly influenced by hormonal factors68. Prior molecular studies have independently reported mitochondrial and immune alterations in both conditions69,70,71. Additional disease pairs that co-occur more frequently in women—but not in men—and were uniquely connected in the wDTSN included type I diabetes (T1D, E10) with myocardial infarction (I21)72, bipolar disorder (F31) with uremia (N19), and COPD (J44) with chronic lung allograft dysfunction (B44, see Fig. 3).

In contrast, the mDTSN highlighted a men-specific association between T1D (E10) and liver cancer (C22), in line with the higher-than-expected risk of developing liver cancer in T1D patients73. Notably, T1D prevalence is greater in prepubertal girls but becomes approximately twice as common in men after puberty74. In men, most biological processes altered in the same direction in both diseases were associated with metabolism (metabolism of amino acids and derivatives, metabolism of vitamins and cofactors, glutathione metabolic process, reactive oxygen species metabolic process, and response to starvation) and immune regulation (humoral immune response, positive regulation of immune effector process, and regulation of T helper 1 type immune response). Additional examples are detailed in Supplementary Note 4.

Finally, we calculated disease similarities separately within each Reactome parent category (see Supplementary Note 5). Gene expression and immune system pathways recovered the highest proportions of known comorbidities (41–47%), whereas drug ADME pathways recovered the fewest (4–5%) (see Supplementary Table 5). Despite variation in the number of comorbidities captured across categories, all overlaps with epidemiological data were statistically significant (see Supplementary Fig. 7A, B and Supplementary Table 5). Collectively, these findings provide strong evidence for sex-dependent differences in the biological processes underlying diseases and suggest that such differences may contribute to the distinct patterns of disease co-occurrence observed between women and men. Among the most relevant processes are those related to the immune system, metabolism, and mitochondrial function—mechanisms previously reported to drive biological differences between sexes75,76. All generated networks are available for interactive visualization at https://disease-perception.bsc.es/shdc/.

Comorbidities occur through different mechanisms in women and men

After confirming that sex-specific differences in transcriptional similarities between diseases can explain differences in comorbidity patterns, we next investigated whether the underlying mechanisms driving disease co-occurrence might differ between women and men. Although mechanistic sex differences have been described for several biological processes—such as mechanical pain hypersensitivity, which is mediated by distinct immune cells in male (microglia) and female (T-cell) mice77—such distinctions have not been explored in the context of comorbidities. As illustrated in Fig. 4, 29 disease pairs were found to co-occur more often than expected by chance in both women and men (12 of which belong to different disease categories), and these comorbidities were consistently recovered in both the wDTSN and mDTSN. Nevertheless, the associated biological alterations displayed clear sex-specific patterns (see Supplementary Data 4).

Fig. 4: Pathways shared between comorbid diseases in women or men only.

Heatmap with disease pairs in rows and Reactome categories in columns, representing the percentages of pathways in each category that are up- (blue) or down-regulated (red) in both diseases. The color of the squares following the ICD-10 codes denotes the category of disease to which they belong (ICD-10); rows with two different colors reflect comorbidities between diseases from different categories. Each square is divided into two triangles; the one on the top (bottom) refers to women (men). For specific pathways enriched in comorbid diseases in women and men, see https://disease-perception.bsc.es/shdc/.

A well-established example is smoking (F17) and COPD (J44), which co-occur in both women and men78. In women, shared alterations between smokers and COPD patients involved pathways related to the cell cycle, response to stimuli, metabolism, immune system, and developmental biology, whereas these pathways were not observed in men (see Fig. 4). Conversely, in men, shared alterations were primarily found in DNA repair, protein metabolism, and RNA metabolism pathways, which were not observed in women. These differences align with prior studies suggesting that cigarette metabolism may differ between sexes due to variations in cytochrome P450 enzyme expression and activity79, a process found to be overexpressed in women with smoking exposure and COPD but not in men. Mitochondrial functional pathways were altered in women but not men, potentially representing key drivers of sex-specific lung disease pathophysiology62, whereas processes such as sumoylation of transcription cofactors, processing of capped intron-containing pre-mRNA, and base excision repair were downregulated in men but not women.

Another illustrative example is the significant co-occurrence between pancreatic cancer (C25) and T2D (E11)80. Interestingly, men and women with pancreatic cancer shared a high number of altered pathways in the same direction (Jaccard indices of 0.78 and 0.58 for up- and down-regulated pathways, respectively), whereas the overlaps for T2D were minimal (0.03 and 0, see Supplementary Table 6). These findings indicate that sex differences are more pronounced in T2D than in pancreatic cancer, suggesting that the mechanisms leading to pancreatic cancer may differ between sexes81. In men, T2D and pancreatic cancer shared alterations in extracellular matrix organization pathways—including collagen biosynthesis and modifying enzymes, laminin interactions, and ECM proteoglycans—as well as signal transduction pathways altered in the same direction, none of which were shared in women. Conversely, in women, the two diseases shared immune system pathways (e.g., neutrophil degranulation, antiviral mechanism by IFN-stimulated genes) and metabolic pathways (including sphingolipid de novo biosynthesis and regulation of cholesterol biosynthesis by SREBP (SREBF)). These findings underscore the value of our approach in identifying biological processes that are differentially altered between women and men and may underlie sex-specific patterns of disease development and comorbidity. Nonetheless, substantial knowledge gaps remain regarding the biological differences between the sexes in disease pathogenesis—gaps that must be addressed to fully understand the mechanisms driving sex-specific comorbidity profiles.
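
The Jaccard indices quoted above measure the overlap between the sets of pathways altered in women and in men. A minimal sketch follows, with placeholder pathway names and a hypothetical convention of returning 0 when both sets are empty.

```python
def jaccard_index(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    union = set_a | set_b
    if not union:
        return 0.0  # assumed convention for two empty sets
    return len(set_a & set_b) / len(union)


# Placeholder pathway sets (not study data).
toy_women = {"pathway_a", "pathway_b", "pathway_c"}
toy_men = {"pathway_b", "pathway_c", "pathway_d"}
toy_jaccard = jaccard_index(toy_women, toy_men)
```

Values near 1 indicate that nearly the same pathways are altered in both sexes, whereas values near 0, as reported for T2D, indicate almost no shared alterations.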

Sex differences in drug effects

In previous work, we identified drugs that may influence disease co-occurrence, either by increasing or decreasing comorbidity risk, highlighting this strategy as a potential avenue for drug repositioning82. Building on this, we then identified patient subgroups in which specific drug associations might contribute to the elevated risk of developing secondary diseases14. In the present study, we investigated how drug-disease associations may differ between women and men, potentially explaining sex-specific comorbidity patterns. Given the scarcity of studies examining sex-specific gene expression changes following drug exposure, we extracted drug targets from DrugBank33, expanded these associations using a protein-protein interaction network34, and performed enrichment analyses28 to identify drugs whose expanded targets were significantly over- or under-expressed across diseases in a sex-dependent manner. In total, 3878 DrugBank IDs were significantly enriched in at least one disease. Of these, 616 were linked through 3997 associations to 568 ICD-10 codes in the SIDER database35. Focusing on diseases with sufficient sample sizes in both sexes, 43 of 59 diseases had at least one enriched drug associated with them in SIDER. Among the top 10 drugs associated with the largest number of diseases in women and men, only three were shared: metformin, clofarabine, and irinotecan, which are used to treat diabetes, acute lymphoblastic leukemia, and metastatic cancers, respectively. Other highly connected drugs included arginine, bortezomib, and carfilzomib in women, and dexrazoxane, etoposide, and idarubicin in men, all used to treat high blood pressure, cardiomyopathies, or cancer. Seven diseases showed no overlap in drug associations between sexes: Aspergillus colonization of lung allograft, T1D, amyotrophic lateral sclerosis, interstitial lung disease, rosacea, connective tissue disorders, and axial spondyloarthropathy (see Supplementary Fig. 8). In contrast, eight diseases exhibited a Jaccard index >0.5: pancreatic, colon, and kidney cancers; neoplasms of uncertain behavior of lymphoid, hematopoietic and related tissues; oral dysplasia; Job’s syndrome; thalassemia; and IBS. These findings highlight substantial heterogeneity in the extent of drug overlap between sexes across diseases. Overall, women had more drug associations in 19 diseases, whereas men had more in 20. These differences were not explained by disparities in sample size between sexes (correlation p value = 0.64). For instance, in schizophrenia, 3.13 times more drugs were identified in women than in men, despite there being 2.7 times more samples from men than from women. Analysis of drugs enriched across diseases in women and men revealed notable sex-specific differences in drug-disease relationships (see Fig. 5 and Supplementary Fig. 9).
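
The target-expansion step described above can be sketched as follows, assuming that expansion means adding the direct interaction partners of each drug target. The exact expansion rule is not stated in this section, so the first-neighbor choice, the function names, and the placeholder protein identifiers are all hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def expand_targets(targets, adjacency):
    """Return the drug targets plus their direct interaction partners.

    Assumption: only first neighbors are added; the study may use another rule.
    """
    expanded = set(targets)
    for target in targets:
        expanded |= adjacency.get(target, set())
    return expanded


# Placeholder interaction network and drug target (not real data).
ppi_edges = [("TARGET_1", "PROT_A"), ("TARGET_1", "PROT_B"), ("PROT_B", "PROT_C")]
adjacency = build_adjacency(ppi_edges)
expanded = expand_targets({"TARGET_1"}, adjacency)
```

The expanded set would then serve as the gene set tested for enrichment in each sex-specific differential expression profile.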

Fig. 5: Sex-specific disease–drug associations.

The nodes represent diseases, colored according to the disease category (ICD-10) to which they belong. An edge connects two diseases if the drug indicated to treat one disease (source node) is enriched in the differential expression profile of another disease (sink node). Only associations between comorbid diseases9 from different ICD-10 categories are shown. The name of the drug connecting both diseases is indicated below the edge. Only associations detected exclusively in one sex are shown, with red (blue) edges denoting associations detected only in females (males). Solid (dashed) lines indicate that the normalized enrichment score of the GSEA28 is positive (negative). For all other associations, see https://disease-perception.bsc.es/shdc/ and Supplementary Fig. 9. The network visualizations were generated using Cytoscape software112.

Drug-mediated disease associations may arise for several reasons: (i) the two diseases share symptoms, (ii) patients with one disease also have the other, (iii) treatment for one disease contributes to the onset of the other, or (iv) the drug used to treat one disease could be used to treat the other. In the first scenario, IBS (K58) and major depression (F33), two comorbid conditions with shared pathophysiological mechanisms83, were linked by lubiprostone, a drug used to treat constipation, a common manifestation of both diseases84 (see Fig. 5 and Supplementary Data 5). Lubiprostone is indicated for IBS and was enriched in the differential expression profile of major depression in women but not in men. Notably, constipation occurs more frequently in women with depression than in men85. In the second scenario, the association between T2D (E11) and schizophrenia (F20), where diabetes medications such as metformin and gliclazide were enriched in women with schizophrenia, may suggest that these patients were taking these drugs or had coexisting T2D. This interpretation is supported by the high representation of elderly schizophrenia patients in our dataset, consistent with previous reports that late-life schizophrenia is associated with a higher prevalence in women than in men (35% vs. 21.53%)86. In the third scenario, patients with essential thrombocythemia are known to have an elevated risk of progression to myelofibrosis, particularly when treated with anagrelide87, an effect more pronounced in women. Consistent with this, anagrelide (indicated for essential thrombocythemia (D47)) was enriched in women with myelofibrosis (D75). However, the absence of detailed metadata prevents further confirmation of this hypothesis. In the fourth scenario, patients with T2D have been observed to be at greater risk of developing liver cancer, with the risk being higher in men88. One hypothesis to explain this sex difference involves higher circulating levels of adiponectin in women89. Notably, both diseases are linked through metformin, which is indicated for the treatment of T2D and whose targets are significantly upregulated in liver cancer. Interestingly, metformin has been described as exerting a protective effect against the development of liver cancer, with this effect being stronger in men90,91,92. A possible explanation for the sex difference in metformin’s protective effect could lie in the hormonal context: higher baseline adiponectin levels in women may already confer protection, whereas in men, metformin partly compensates for their lower adiponectin levels and higher baseline risk. These observations underscore the importance of considering hormonal context when evaluating potential drug repositioning strategies.

Additional pharmacologically supported comorbidity relationships not reported by ref. 9 included the association between T1D and asthma93, where the risk is higher in boys than in girls94. In men, we identified 15 drugs indicated for asthma that were enriched in T1D, including salbutamol and salmeterol; none of these were observed in women. Notably, salbutamol use has been linked to elevated blood glucose levels95, emphasizing the importance of considering both sex and age when prescribing treatment. Similarly, two drugs, risperidone and citalopram, were significantly enriched in women with schizophrenia and have also been associated with Alzheimer’s disease. Risperidone is an atypical antipsychotic used to treat schizophrenia96 and behavioral symptoms in Alzheimer’s disease patients97, whereas citalopram is an antidepressant prescribed for depression and negative symptoms in schizophrenia98 as well as agitation in Alzheimer’s disease99. Notably, as previously mentioned, schizophrenia patients face a higher risk of developing Alzheimer’s disease, particularly among women44, and citalopram response has also been shown to differ by sex100. Together, these findings support the hypothesis that drug-disease associations vary between women and men and may contribute to sex-specific comorbidity patterns. Nevertheless, additional studies are needed to directly investigate molecular-level differences in drug effects as a function of patient sex.

Discussion

Calculating similarities between diseases using molecular data is a well-established approach for understanding disease co-occurrence and for identifying opportunities for drug repositioning13,18,101. Among various molecular data types, transcriptomics has emerged as a particularly promising source, given its strong capacity to reveal biological mechanisms underlying comorbidity relationships. Transcriptomic analyses have proven useful for elucidating both direct and inverse comorbidity relationships, identifying candidate drugs for repurposing, and detecting disease subtype-specific molecular similarities14,15,82,102,103. However, despite the clear physical and physiological differences between women and men—and their evident impact on disease development and comorbidity—sex differences in transcriptional similarities between diseases have not been systematically investigated. Although public transcriptomic databases have historically lacked reliable sex annotations104, advances in computational methods now enable accurate inference of sample sex, making such analyses feasible.

In this study, we generated disease networks separately for women and men and observed that diseases cluster differently by sex based on their differential expression profiles—consistent with previous PheWAS-based findings105. Notably, the clusters in women were more heterogeneous across disease categories, suggesting differences in multimorbidity patterns between the sexes. The relevance of these observations is supported by the strong concordance between sex-specific transcriptional similarities and epidemiologically observed disease co-occurrences. For example, type 1 diabetes and liver cancer co-occur frequently in men, while schizophrenia and COPD do so in women. In both cases, these patterns correlate with sex-specific alterations in metabolic and immune system-related biological processes.

Collectively, our findings highlight the importance of analyzing comorbidity patterns separately in women and men and investigating their underlying molecular mechanisms—an approach that may ultimately inform more effective treatments.

Historically, biomedical research has predominantly focused on men, contributing to diagnostic biases and suboptimal therapeutic strategies for women. Moreover, women remain underrepresented in clinical trials, resulting in poorer therapeutic optimization and health outcomes106. Combined with the limited inclusion of comorbid and multimorbid patients in clinical trials107, this underrepresentation may contribute to future challenges in the safe and effective use of medications. Therefore, incorporating sex-based molecular and comorbidity differences into research and clinical guidelines is essential for developing safer and more personalized medical practices108.

While our study is currently limited by the availability and quality of sex-specific information across public databases, ongoing efforts to improve data annotation and increase the representation of both sexes in transcriptomic studies will progressively strengthen future analyses. Likewise, as more population-level studies systematically investigate comorbidity relationships by sex, it will become increasingly feasible to validate and integrate our findings within DTSNs, enhancing their robustness and translational relevance.

Although a few studies have explored sex-based comorbidity differences7, many of these datasets, such as the extensive US medical claims-based network describing fourfold more comorbidities than ref. 9, are no longer publicly available (see Supplementary Note 5). Furthermore, most transcriptomic studies focus on individual diseases and lack key contextual metadata such as comorbidities, medication use, and patient age, all of which may influence molecular profiles. Other relevant variables, such as ethnicity or socioeconomic status, which are known to significantly affect disease development and comorbidity patterns109,110, could not be incorporated due to the absence of such metadata. Additionally, for comparison with epidemiological networks, we standardized disease names to three-character ICD-10 codes, which was necessary for studying sex-related comorbidity patterns. Consequently, certain disease subtypes, such as lung cancer, non-small cell lung cancer, and basaloid lung cancer, were grouped under the same ICD-10 code (C34), potentially obscuring subtype-specific differences. Future research should therefore aim to investigate sex differences at the level of specific disease subtypes. Finally, greater availability of molecular data would enhance the statistical power of such analyses. The growing number of biobanks integrating molecular profiles with electronic health records may, in the future, enable more comprehensive studies of this kind18.

In summary, this study reinforces the marked differences between the sexes in the development of diseases and comorbidities previously described at the epidemiological level. It also generates multiple molecular hypotheses regarding sex-specific differences in comorbidity relationships, paving the way for future experimental validation. Future work should also explore, from a molecular perspective, how aging influences the development of comorbidities and their sex-specific differences, while integrating additional data sources to refine and expand the map of disease interrelationships generated here.