Introduction

Accounting for sex differences at the molecular level could lead to better personalized disease prediction, diagnosis, and treatment, as well as an improved understanding of the biological mechanisms driving these differences. Mounting evidence suggests differences between males and females in various aspects of health disorders, including the effects of risk factors, prevalence, and disease outcomes1,2,3,4,5,6,7,8,9,10,11,12,13. At the molecular level, sex-differentiated architectures and effects have been investigated at the genomic, transcriptomic, and epigenomic levels, providing insight into sex differences14,15,16,17,18,19,20,21,22,23,24,25.

In recent years, there has been growing evidence of sex differences in protein levels and the differential effects of genetic variants on protein levels26,27,28. For example, a study involving 1,277 European brains indicated differences in protein expression between sexes, as well as genetic variants affecting protein levels differently according to sex28. Another study involving 800 individuals from a Dutch population reported sex-dimorphic genetic regulation of inflammatory proteins, providing broad insights into sex differences in proteo-genetic architecture29. These studies have, however, been limited in terms of the number of traits and proteins studied as well as the number of individuals and ancestries represented.

The objectives of the present study were to identify sex-dimorphic protein quantitative trait loci (SD-pQTLs) and examine their association with sex differences in health disorders. To achieve this, we analyzed 2,922 proteins using data from 30,272 individuals of Caucasian ancestry from the UK Biobank. Next, we replicated the identified SD-pQTLs using five different datasets: (1) 2,886 and (2) 1,394 individuals of Japanese ancestry from the BioBank Japan and the Japan COVID-19 Task Force, respectively, (3) 1,990 individuals of Finnish ancestry from FinnGen, as well as (4) 630 individuals of South Asian ancestry (Indian, Pakistani, and Bangladeshi) and (5) 662 individuals of Black ancestry (African and Caribbean) from the UK Biobank. In addition, we assessed the sex-dimorphic effects of SD-pQTLs on health disorders by conducting sex-stratified GWAS for 30 long-term conditions using 338,568 individuals, distinct from those included in the proteomics analysis.

Results

Sex differences in proteome profiles and regulation

A total of 30,272 individuals were included in the proteome analysis. The mean age ± SD was 57.1 ± 7.9 years. 46.2% (13,974 individuals; mean age ± SD: 57.3 ± 8.1) were males and 53.8% (16,298; 57.0 ± 7.8) were females. Of the 2,922 proteins, 2,249 showed a significant association with sex (false discovery rate (FDR)-corrected p-value, termed q, < 0.05, Supplementary Table 1). To estimate the heritability of proteins in each sex and the genetic correlation of proteins between the sexes, we assessed the effect of variants on blood protein levels within each sex by performing sex-stratified genome-wide association study (GWAS) using the 30,272 individuals. As a result, 1,612 proteins showed significant heritability in both males and females. 194 and 348 proteins showed sex-specific significant heritability in males and females, respectively (q < 0.05, Supplementary Table 1). 1,818 proteins showed a significant genetic correlation differing from zero between males and females (q < 0.05, Supplementary Table 1).

Identification of sex-dimorphic pQTL

After selecting index variants from the sex-stratified GWAS using linkage disequilibrium clumping, 31,753 and 36,979 pQTLs were identified in males and females, respectively (q < 0.05, Supplementary Tables 2 and 3). By comparing the effects of variants between males and females across the genome, we derived 113 index pQTLs that exhibit significantly different effects on the levels of 65 proteins by sex, henceforth termed sex-dimorphic pQTLs, or SD-pQTLs (q < 0.05, Supplementary Table 4). 25 out of the 113 SD-pQTLs were not significant in the sex-combined GWAS, which was conducted on protein levels without sex stratification. 52 out of the 113 SD-pQTLs were significant in both sexes, while 42 and 14 were significant only in males and females, respectively (q < 0.05). The remaining five SD-pQTLs were sex-dimorphic but not significant in either sex. Among the 52 SD-pQTLs exhibiting a significant effect in both sexes, variant rs2270416, associated with CDH15, was a stop-gain variant in CDH15 and the only variant with a sex-antagonistic effect. rs2270416 was masked in the sex-agnostic analysis, as it was not significant in the sex-combined GWAS (sex-dimorphic test: q = 2.07E-27; beta in males: -0.23, q = 4.63E-11; beta in females: 0.26, q = 3.01E-17; sex-combined test q = 0.957).

Replication of the identified SD-pQTL

To assess the confidence of the identified SD-pQTLs, we conducted sex-stratified analyses using independent datasets of multiple ancestries. In the BioBank Japan dataset, only 76 out of the 113 SD-pQTLs could be tested because either the proteins were not available or the genetic variants were not polymorphic in BioBank Japan. Among the 76 SD-pQTLs, 12 exhibited significant sex-dimorphic (q-value < 0.05) effects on the protein levels. In the Japan COVID-19 Task Force dataset, 80 out of the 113 SD-pQTLs were testable. Among the 80 SD-pQTLs, three showed significant sex-dimorphic effects. In the FinnGen dataset, 13 of the 111 tested SD-pQTLs showed significant effects. In the UK Biobank South Asian samples, two out of 90 tested SD-pQTLs were significant. In the UK Biobank Black samples, none of the 113 tested SD-pQTLs were significant (Supplementary Table 4). One SD-pQTL (Protein PAEP: variant rs67944) was replicated in BioBank Japan, Japan COVID-19 Task Force, FinnGen, and UK Biobank South Asian samples. Two SD-pQTLs (EDDM3B: rs12890226 and LEFTY2: rs360076) were replicated in BioBank Japan and Japan COVID-19 Task Force. Seven SD-pQTLs (INSL3: rs1044303, KLK3: rs10993994, rs2569747, rs266849, rs266869, PLA2G2A: rs12044628, and PLB1: rs34590437) were replicated only in BioBank Japan, nine (DDR1: rs1264344, EDDM3B: rs4982354, KLK4: rs79486581, NCAM1: rs748631, PAEP: rs10858128, PZP: rs11048434, rs2277413, SUSD4: rs7526539, and TEX101: rs7259375, rs35033974, rs2355990) were replicated only in FinnGen, and one (PZP: rs11615443) was replicated only in UK Biobank South Asian samples. A meta-analysis encompassing all datasets resulted in 22 out of the 113 SD-pQTLs yielding higher confidence than the UK Biobank analysis alone, suggesting that these 22 SD-pQTLs are robust associations (Fig. 1). The sex-dimorphic effect of the variant rs2270416 on CDH15 did not show higher significance in the meta-analysis (p-value = 4.18E-06) than in the UK Biobank alone (p-value = 2.89E-35).

Fig. 1

The effects of variants on protein levels in males and females. Each point represents a variant, with lines indicating the 95% confidence interval. The beta values and confidence intervals correspond to the UK Biobank Caucasian ancestry analysis results. Points are highlighted in red when the sex-dimorphic p-value is more significant in the multi-ancestry meta-analysis than in the Caucasian-only UK Biobank analysis.

SD-pQTLs’ sex-dimorphic effect on health disorders

Genetic variants exhibiting a sex-dimorphic effect on protein levels might also influence health disorders in a sex-dimorphic manner, highlighting the importance of considering sex differences in genetic studies for improving precision medicine and understanding disease mechanisms. To investigate whether the 113 SD-pQTLs in this study also exhibit sex-dimorphic effects on health disorders, we first conducted sex-stratified GWAS on 30 long-term conditions using the 338,568 Caucasian individuals, independent of the samples used in the proteomic GWAS analyses (Supplementary Tables 5 and 6). We then derived results for the 113 SD-pQTLs. As a result, two out of the 113 SD-pQTLs exhibited a significant sex-dimorphic effect on health disorders (p-value < 1.47E-05, Bonferroni-corrected threshold accounting for the 113 SD-pQTLs and the 30 long-term conditions). SD-pQTLs of the proteins APOE (rs157581, chr19: 45,395,714) and SNAP25 (rs4420638, chr19: 45,422,946) exhibited a sex-dimorphic effect on dementia, suggesting potential sex-dimorphic pleiotropy involving these proteins and dementia (Fig. 2). Although it did not reach the significance threshold after correction for multiple testing, the variant rs2270416, which showed a sex-dimorphic effect on CDH15, also exhibited a suggestive sex-dimorphic effect on depression (p-value = 3.29E-02). Sex-specific survival analysis of measured plasma APOE and SNAP25 levels against prospective dementia showed a significant association for APOE both in males (hazard ratio (HR) = 0.68, p-value = 3.89E-10) and females (HR = 0.67, p-value = 5.64E-13), but with a non-significant sex-dimorphic effect (p-value = 0.94). SNAP25 showed a significant association with prospective dementia only in females (HR = 1.24, p-value = 3.17E-04) but not in males (HR = 1.01, p-value = 0.838).

Fig. 2

Sex-dimorphic pleiotropy between proteins and health disorders. Scatterplot of variants’ sex-dimorphic effect on proteins and health disorders. Each point represents a variant, with lines indicating the 95% confidence interval. The X-axis and Y-axis represent the difference between males and females in the effect on protein and health disorders, respectively, with a higher value indicating a higher effect in females. The label on each point describes the associated protein and health disorder for that variant.

We also asked whether proteins can causally influence disease risk differently depending on sex. This can shed light on possible sex-dimorphic disease mechanisms that are potentially responsible for sex disparities in drug effectiveness and safety. One such example is the higher risk of adverse drug reactions observed in females, which has been attributed to sex-agnostic drug prescription practices30. Mendelian randomisation (MR) can infer causal relationships between an exposure, such as protein level, and a disease outcome using genetic variants as instrumental variables, that is, genetic variants that are associated with the outcome only through their effect on the exposure. Rather than mere correlation, where no cause-and-effect relationship is implied, MR can estimate causality between exposure and outcome. This is achieved by leveraging the random allocation of genetic variants at birth, thus minimising confounding and reverse causation from outcome to exposure. In light of this, we investigated possible sex-dimorphic causal effects of proteins on health disorders by conducting sex-stratified MR analyses using the identified SD-pQTLs. Here, we conducted MR using seven proteins that have more than one valid SD-pQTL in both sexes and 30 long-term conditions (Fig. 3 and Supplementary Table 7). In contrast to our previous analysis of sex-dimorphic effects on disorders (Fig. 2), sex-stratified MR has the advantage of assessing causality and whether or not SD-pQTL effects on disorders are driven by pleiotropy. We identified four protein-disorder pairs where a causal relationship was observed in only one sex. For male-specific relationships, SUSD4-inflammatory bowel disease (Inverse variance weighted (IVW) fixed effects meta-analysis MR estimate q-value in males = 0.038, in females = 1.00) and NCAM1-dementia (q-value in males = 0.045, q-value in females = 1.00) pairs were identified.
For female-specific relationships, TSPAN8-asthma (beta in females = 0.84, q-value in females = 2.52E-04) and PZP-dementia (beta in females = 0.96, q-value in females = 0.040) pairs were identified. We did not observe protein-disorder pairs where the male and female-specific causal estimates differed significantly (t-test between male and female estimates q-value > 0.05). No protein-disorder pair showed evidence of heterogeneity or pleiotropy (Supplementary Tables 8 and 9). For the four protein-disorder pairs identified through sex-specific MR, we further fit sex-specific survival analyses. We did not find a significant association between protein levels and prospective disease onset of the tested health disorders in either males or females. Cox Proportional Hazard models were fit for SUSD4-inflammatory bowel disease (protein HR p-value in males = 0.52, in females = 0.35), NCAM1-dementia (HR p-value in males = 0.76, HR p-value in females = 0.61), TSPAN8-asthma (HR p-value in males = 0.37, HR p-value in females = 0.66), and PZP-dementia (HR p-value in males = 0.87, HR p-value in females = 0.41). In summary, we identified protein-disorder pairs in which the protein causally affected health disorder risk in a male-only or female-only way. No protein-disorder pairs showed a significant difference in their sex-stratified causal effects here. These findings may guide future research on the potential sex-specific roles of these proteins in disease pathogenesis.

Fig. 3

Sex specific protein-disorder causal relationships from SD-pQTLs. Scatterplot of sex-stratified MR for protein-disorder pairs. Each point represents a pair, with lines indicating the 95% confidence interval. The X-axis and Y-axis represent causal estimates in males and females, respectively. Blue-coloured pairs indicate male-specific relationships, while red coloured pairs indicate female-specific relationships. The label on each point provides the protein and health disorder of the pair.

Discussion

Our study investigated sex differences in the proteo-genetic architecture of blood proteins across ancestries, providing a comprehensive assessment at a depth and breadth not previously achieved. The findings provide insight into different relationships between genetic variants, protein levels, and health disorders depending on sex. The replication of the identified SD-pQTLs across independent datasets of various ancestries strengthens the validity of our findings.

Our analysis revealed SD-pQTLs that exhibited various differential effects on protein levels between males and females, such as sex-specific or sex-antagonistic effects, as reported in previous sex-stratified GWAS conducted on various human phenotypes16. This suggests that sex differences at the molecular level have a complex background comprising various mechanisms that cannot be explained solely by sex-specific or sex-antagonistic pathways. One notable finding from this study is variant rs2270416, which is significantly associated with CDH15 plasma protein level in both sexes, yet with opposite effect directions. This is the only such case in this study, and sex-specific or sex-dimorphic effects on CDH15 have not been clearly suggested to date. Given its lower significance in the meta-analysis, this antagonistic association may be specific to the individuals in the UK Biobank or may be a false positive. Hence, further investigation of the SD-pQTLs identified in this study is required to assess whether the sex-dimorphic effect of each variant is shared or differs across ancestries.

We also found that some SD-pQTLs have been identified as significant pQTLs from sex-agnostic GWAS. This implies that effect size estimates from sex-agnostic GWAS may be inaccurate, as they can represent an average between significantly different effects in males and females. Furthermore, significant associations in sex-agnostic GWAS may reflect strong effects in only one sex, which would go undetected without sex-stratified analysis. These findings highlight the importance of performing sex-stratified GWAS in uncovering sex-dimorphic genetic effects that may be missed in sex-agnostic studies.

In addition, our study provides insights into the sex-dimorphic effects of SD-pQTLs on disease risk. For instance, we identified SD-pQTLs associated with dementia that exhibit a sex-dimorphic effect on disease risk. Blood levels of APOE and SNAP25 have been reported to be associated with dementia and suggested as biomarkers for dementia31,32,33,34. Further investigation of the SD-pQTLs of APOE and SNAP25 identified in this study could provide insight into the pathophysiology of dementia and help improve its prediction, diagnosis, and treatment. Additionally, our Mendelian randomization analysis using SD-pQTLs revealed sex-specific causal relationships between proteins and health disorders. Specifically, the NCAM1 and PZP proteins showed sex-specific causal relationships with dementia in males and females, respectively. This finding may provide insight into sex-specific pathophysiology of dementia through further investigation of their biological pathways. Blood levels of these two proteins have previously been reported to be associated with dementia35,36,37. Here, we provide additional genetically informed support for these associations.

PZP is a protein initially described as a major pregnancy-associated protein, showing elevated levels during pregnancy and higher expression levels in females than in males38,39. It has been reported that only females showed elevated serum PZP levels prior to the onset of Alzheimer’s disease, in line with the female-specific causal relationship between PZP and dementia observed in the present study37. Plasma NCAM1 has also been found to mediate sex-related neurodegeneration differences in cognitively normal adults, potentially suggesting that NCAM1 exerts sex-different neuropathological effects at a pre-dementia stage as well40. A previous study of temporal lobe proteomes reported a significant upregulation of NCAM1 only in the brains of male dementia patients41. Although protein levels can vary between brain and blood tissues, the male-specific association between brain NCAM1 levels and dementia reported in the previous study of temporal lobe proteomes and the male-specific causal relationship between serum NCAM1 and dementia in the present study could provide illuminating insight into the sex-dimorphic pathophysiology of dementia. Additionally, NCAM1 has been implicated in the mechanism of action of antidepressants, particularly in response to duloxetine treatment for depression42,43. Studies of pooled data from seven randomised clinical trials found no significant sex differences in duloxetine’s efficacy, safety or tolerability in treating major depressive disorder44,45. However, these duloxetine studies did not evaluate response based on drug dosage or plasma concentrations, which could mask dose-dependent sex differences. Further study on a large cohort, accompanied by experimental analysis, would be necessary to dissect the detailed role of these proteins in dementia. 
Taken together, these findings highlight the potential role of sex-dimorphic genetic regulation in disease pathogenesis and its implication for sex-specific precision medicine in prediction, diagnosis, and treatment.

However, the results of this study should be interpreted with caution, as we utilized only a limited number of blood proteins currently available, which is just a subset of the human proteome. The SD-pQTLs and their associations with health disorders also require careful consideration, as the analysis was limited to Caucasian individuals from the UK Biobank, leveraging its large sample size. Additionally, disease prevalence differs between males and females, as observed in our dataset. Furthermore, the MR investigation of sex-dimorphic causal relationships should also be interpreted with caution, as this study relied solely on SD-pQTLs. Further studies using additional valid genetic instruments for each disease are needed to better dissect sex-dimorphic effects in causal inference. Another limitation of our MR analysis is that it was based on individuals of Caucasian ancestry from the UK Biobank, as SD-pQTLs from other ancestries would have limited power in an MR framework. Restricting MR analyses to ancestrally homogenous samples reduces the risk of population stratification that can lead to violation of the independence and exclusion restriction MR assumptions46. However, transferring MR results across ancestries is challenging due to differences in LD patterns and allele frequencies46, although methods to facilitate trans-ancestry MR have been proposed47. It should also be noted that aside from population stratification, MR analyses can be biased by assortative mating, dynastic events (the direct effect of one’s parents on a phenotype), or selection bias, such as participation bias in the UK Biobank. Such processes can confound the relationship between SD-pQTLs and disease outcomes, potentially violating the MR independence assumption48. Finally, our MR analysis does not account for potential time-varying effects of proteins on disease risk, which may be important when considering the timing of intervention46.
Additionally, the limited statistical support in the survival analysis may be due to a small sample size, specifically the low number of incident dementia events following the date the blood sample was collected for the protein level measurement (288 males and 302 females). Notably, the survival analyses of PZP and NCAM1 on dementia showed the same direction of effect as in the MR, suggesting the possibility of insufficient sample size. Therefore, larger studies with longer follow-up are needed to clarify the sex-dimorphic associations between protein levels and disease incidence observed in this study.

Conclusions

Our study provides comprehensive findings, resources, and insights into sex differences in the proteo-genetic architecture of proteins and their relationships with disease susceptibility. Further research is needed to investigate the underlying mechanisms of sex-dimorphic protein regulation, their links to diseases, and their potential for therapeutic applications.

Methods

Data sources

The UK Biobank is a prospective, population-based cohort study and research resource that includes comprehensive phenotype and genotype data from approximately 500,000 participants recruited in 2006–2010 residing in England, Scotland, and Wales (www.ukbiobank.ac.uk). This open-access resource was established to support investigations into the factors influencing various health outcomes49. We utilized genotyped data and recently released proteome data from the UK Biobank26. To ensure homogeneity, we limited analyses to unrelated individuals of Caucasian genetic ancestry (UK Biobank Data-Field 22006) with less than 10% missing genotypes and those with matched recorded sex (Data-Field 31) and genetically determined sex (Data-Field 22001). We used this matched sex information to define sex in subsequent analyses. The unrelated participants were identified and extracted using the KING software with the following options: --unrelated --degree 2 (version 2.28)50. Autosomal and X-chromosomal genotypes of the selected individuals were filtered using PLINK software (version 1.90b) with the following options: --geno 0.01, --hwe 1e-15, --maf 0.01, and --mind 0.1, retaining 539,158 variants51. These filtered variants were used in the subsequent analyses. To ensure homogeneity in proteome analysis, we extracted proteome data of randomly selected baseline participants from protein batches 0–6, which are highly representative of the UK Biobank overall26. Following these selections, the final dataset for proteome analysis consisted of 13,974 males and 16,298 females, totalling 30,272 individuals. The remaining individuals without proteome data, 156,581 males and 181,987 females, totalling 338,568 individuals, were retained and utilized in subsequent sex-stratified analyses on health disorders.

The BioBank Japan (https://biobankjp.org/en/) project (first cohort) is a hospital-based cohort study that recruited approximately 200,000 patients with at least one of the 47 complex diseases between 2003 and 2007 across 66 hospitals in Japan52. Proteomic profiling was performed on unrelated individuals of East Asian ancestry from two previous studies with whole genome sequencing datasets, using the Olink Explore 3072 panel following the manufacturer’s protocol53,54. Proteomic data processing and data quality control were conducted according to Olink protocols. The rank-based inverse normal transformation was applied to protein level measurements before association tests for males and females, respectively. Sex-stratified pQTL summary statistics of serum protein levels in the BioBank Japan project were obtained by meta-analysing results for each sex derived from each study separately using REGENIE v3.2.9 (adjusted for age, age², proteomic profiling batch, and the first 10 genetic principal components) and METAL (fixed-effect inverse variance weighted)55,56. In total, 2,886 participants (including 2,151 males and 735 females) were included in the sex-stratified pQTL analyses in BioBank Japan.

The Japan COVID-19 Task Force (JCTF) was established in early 2020 as a nationwide multicentre consortium to overcome the COVID-19 pandemic (https://www.covid19-taskforce.jp/en/home/). Plasma protein expression was measured using the Olink Explore 3072 platform. We bridge-normalized the Normalized Protein eXpression (NPX) values using the OlinkAnalyze R package with 16 intersecting samples as bridging samples. Samples with QC warning flags were removed. Genotyping was performed using Infinium Asian Screening Array (Illumina, CA, USA), and stringent sample and variant level quality control (QC) filters were applied (sample QC exclusions: sample call rate < 0.98, samples of estimated non-East Asian ancestry based on PCA with HapMap project samples; variant QC exclusions: variant call rate < 0.99, minor allele count < 5, p-value for Hardy-Weinberg equilibrium < 1e-10, and more than 5% allele frequency difference when compared with the representative reference panels of Japanese ancestry)57. We performed genome-wide genotype imputation using SHAPEIT4 software version 4.2.1 for haplotype phasing and Minimac4 software version 1.0.1 for genotype imputation58,59. For imputation, we used our in-house and Japanese-specific reference panel composed of N = 4,561 whole-genome sequence (WGS) data from multiple studies (e.g., N = 1,939 from the BBJ study and N = 141 WGS from the previous study)60,61. The final dataset consisted of 995 males and 399 females, totalling 1,394 individuals. The genomic coordinates of the variants are based on the genome build GRCh37 throughout this study.

FinnGen (https://www.finngen.fi/en) is a public-private research project, launched in 2017, that combines genome and digital healthcare data on ~500,000 Finns. The nation-wide research project is a pre-competitive partnership of Finnish biobanks and their background organisations (universities and university hospitals), international pharmaceutical industry partners, and the Finnish biobank cooperative (FINBB). FinnGen aims to provide novel medically and therapeutically relevant insight into human diseases and is described in detail in its flagship paper62. FinnGen partners are listed in full here: https://www.finngen.fi/en/partners.

Investigating sex different genetics in blood proteome

To investigate the effect of sex on proteins, multivariable regression was conducted accounting for age, age², body mass index, UK Biobank Centre, UK Biobank genetic array, time between blood sampling and measurement, and the first 20 genetic principal components. To investigate genetic effects on blood proteins by sex, we conducted sex-stratified GWAS on protein levels using the proteome data from the 30,272 selected Caucasian individuals and compared variants’ effects on proteins between males and females. For the sex-stratified GWAS, REGENIE software (version 3.2.2) was utilized with sex-stratified protein level normalization using the --apply-rint option55. Covariates considered in the GWAS included age, age², batch, UK Biobank Centre, UK Biobank genetic array, time between blood sampling and measurement, and the first 20 genetic principal components. Sex, sex × age, and sex × age² were added as covariates for the sex-combined GWAS. Protein GLIPR1 was excluded from the analysis due to an exceptionally high percentage of samples failing QC (99.40%) during the protein measurement26.
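The within-sex rank-based inverse normal transformation mentioned above can be illustrated with a minimal sketch. This is not REGENIE's implementation: the function names, the Blom offset (c = 3/8), and the average-rank handling of ties are assumptions made here for illustration, and the software's exact conventions may differ.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8.0):
    """Map values to normal quantiles by rank (ties get their average rank).

    The offset c = 3/8 (Blom) is an assumed convention, not taken from the paper.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


def rint_within_sex(values, sexes):
    """Apply the transform separately within each sex label (hypothetical helper)."""
    out = [None] * len(values)
    for sex in set(sexes):
        idx = [i for i, s in enumerate(sexes) if s == sex]
        transformed = rank_inverse_normal([values[i] for i in idx])
        for i, z in zip(idx, transformed):
            out[i] = z
    return out
```

Transforming within each sex removes mean and variance differences in protein levels between the sexes, so the sex-stratified effect sizes are expressed on a comparable standard-normal scale.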

We estimated variant-based heritability as the variance explained by the genetic variants’ effect using the LD-score regression63. To estimate sex difference in heritability, we applied t-statistics utilized in a previous sex stratified study16. We estimated genetic correlation of each protein between males and females using the High-Definition Likelihood method64. To account for multiple testing, we adjusted the p-values using the Benjamini-Hochberg method for each analysis65.
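For readers unfamiliar with the Benjamini-Hochberg adjustment used for the q-values reported throughout, the following is a minimal sketch of the standard step-up procedure. The function name is hypothetical and the example p-values are made up; the study's own computation was presumably done with standard statistical software.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Illustrative (made-up) p-values; tests with q < 0.05 would be called significant.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [value < 0.05 for value in q]
```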

Identification and replication of sex-dimorphic pQTL

To identify variants with different effects between males and females, we compared the effects of each variant across the genome for each protein using a two-tailed Student’s t-test, as applied in previous sex-stratified GWAS comparison studies16,66,67. To select index variants in each significant locus, we utilized PLINK software with the following options: --clump, --clump-p1 0.00000005, --clump-p2 0.001, --clump-r2 0.2, --clump-kb 10000, considering the sex-different p-value as the variant’s significance. Clumping was also applied to GWAS results from each sex51. To account for multiple testing, we adjusted the sex-different p-values using the Benjamini-Hochberg method, and index variants with false discovery rate < 0.05 were considered significant and presented as the identified SD-pQTLs in this study65. The functional consequences of variants were annotated using Variant Effect Predictor and Ensembl GRCh37 release 112 (ref. 68).
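The text does not spell out the test statistic, so the sketch below uses a form common in sex-stratified GWAS comparisons: the difference in effect estimates divided by its standard error. The correlation term r between the male and female estimates (defaulting to 0 for independent samples), the normal approximation to the t distribution (reasonable at these sample sizes), and the function name are all assumptions made for illustration.

```python
import math


def sex_difference_test(beta_m, se_m, beta_f, se_f, r=0.0):
    """Two-sided test for a difference between male and female effect sizes.

    r is an assumed correlation between the two estimates; 0 treats the
    male and female samples as independent.
    """
    denom = math.sqrt(se_m ** 2 + se_f ** 2 - 2.0 * r * se_m * se_f)
    t = (beta_m - beta_f) / denom
    # Normal approximation to the two-tailed p-value.
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p


# Betas are those reported for rs2270416 on CDH15; the standard errors are
# hypothetical placeholders, since they are not given in the text.
t_stat, p_value = sex_difference_test(-0.23, 0.03, 0.26, 0.03)
```

Because opposite-signed effects add in the numerator, a sex-antagonistic variant can give a very large statistic here even when the sex-combined estimate is close to zero.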

To ensure the confidence of the identified SD-pQTLs, we replicated the identified SD-pQTLs in each independent dataset, comprising 2,886 individuals (2,151 males and 735 females) of Japanese ancestry from the BioBank Japan, 1,394 individuals (995 males and 399 females) of Japanese ancestry from the Japan COVID-19 Task Force, 1,990 individuals (958 males and 1,032 females) of Finnish ancestry from FinnGen, as well as 630 individuals (336 males and 294 females) of South Asian (Indian, Pakistani, and Bangladeshi) and 662 individuals (290 males and 372 females) of Black ancestries (Caribbean and African) in the UK Biobank based on the self-reported ancestry (Data-Field 21000), using the protocols from the identification step in this study. For FinnGen, the 1,990 individuals had plasma proteomics measured using the Olink Explore 3072 panel. Quality control of the proteomics data was carried out in accordance with Olink recommendations by the core FinnGen analysis team. GWAS of plasma protein levels were carried out separately by sex using REGENIE55. NPX values were rank-based inverse normal transformed (--apply-rint) within sex. Age, batch, genotyping array and genetic PCs (1–5) were included as covariates. A detailed description of FinnGen genotype data, its quality control and pre-processing has been published previously62. FinnGen data freeze 12 was used for this analysis and sex-specific pQTL results were downloaded from the FinnGen sandbox under approved proposal F_2024_056. The replicated results were combined using a random-effect meta-analysis, conducted with the metafor R package and set to “REML”, to obtain a single estimate69. Variants were considered replicated SD-pQTLs if the p-value from the meta-analysis was more significant than that from the UK Biobank analysis alone, indicating a consistent trend in sex-different effect of the variant across independent datasets of multiple ancestries.

Sex dimorphic effect of SD-pQTL on health disorders

To investigate SD-pQTLs’ sex-dimorphic effect on health disorders, we first conducted sex-stratified GWAS on 30 predefined long-term conditions (Supplementary Table 5) using the 338,568 individuals retained during sample selection70. We then derived results for the 113 SD-pQTLs. For the sex-stratified GWAS, we used the REGENIE software with the following covariates: age, age², UK Biobank Centre, UK Biobank genetic array, and the first 20 genetic principal components55. Sex-different effects on health disorders were derived using the method applied in the SD-pQTL analysis in this study. SD-pQTLs were considered to have sex-different pleiotropy with health disorders if the SD-pQTL exhibited a p-value for sex-dimorphic effect on a health disorder lower than 1.47E-05, which is 0.05 divided by the product of the number of long-term conditions and the number of SD-pQTLs, accounting for multiple tests.
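The Bonferroni threshold quoted above follows directly from the counts given in the text (113 SD-pQTLs and 30 long-term conditions). The short sketch below reproduces the arithmetic; the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


n_sd_pqtls = 113    # SD-pQTLs identified in the study
n_conditions = 30   # long-term conditions tested
threshold = bonferroni_threshold(0.05, n_sd_pqtls * n_conditions)
print(f"{threshold:.2E}")  # prints 1.47E-05
```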

To investigate sex-dimorphic causal relationships between plasma protein levels and the 30 predefined health disorders, we employed a two-sample Mendelian randomisation (MR) approach separately for males and females71. MR analyses are based on the use of genetic variants as instrumental variables (IVs). IVs are variables associated with an exposure but not with the outcome of interest through any other pathway. Three assumptions are required for MR to be valid: (1) IVs are significantly associated with the exposure (the relevance assumption); (2) there are no confounders of the IVs and the outcome (the independence assumption); and (3) IVs do not affect the outcome other than through the exposure (the exclusion restriction assumption)71.

We used the SD-pQTLs identified in this study as instrumental variables. For each MR analysis between protein and health disorder by sex, SD-pQTLs that were significantly associated with each protein (FDR < 0.05) and were not significantly associated with each health disorder (p-value ≥ 5E-8) were selected. SD-pQTLs were used regardless of their genomic position relative to their associated protein’s gene location, i.e. both cis- and trans-pQTLs. Only proteins with more than one SD-pQTL after filtering were considered for the MR analysis. This was done to allow the use of sensitivity analyses testing for horizontal pleiotropy and heterogeneity46.
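The instrument selection rule described above can be sketched as a simple filter. The record layout (dictionary keys such as "protein_q" and "disorder_p"), the function name, and the example values are hypothetical; only the cutoffs (FDR < 0.05, p-value ≥ 5E-8, more than one instrument) come from the text.

```python
def select_instruments(candidates, fdr_cutoff=0.05, outcome_p_cutoff=5e-8):
    """Keep SD-pQTLs usable as instruments for one protein, disorder and sex.

    Each candidate is a dict with hypothetical keys:
      "protein_q"  - FDR-adjusted p-value for the protein association
      "disorder_p" - p-value for the direct association with the disorder
    Returns an empty list unless more than one instrument remains.
    """
    kept = [
        c for c in candidates
        if c["protein_q"] < fdr_cutoff and c["disorder_p"] >= outcome_p_cutoff
    ]
    return kept if len(kept) > 1 else []


# Made-up example: the third variant fails the relevance filter.
example = [
    {"rsid": "rsA", "protein_q": 1e-10, "disorder_p": 0.30},
    {"rsid": "rsB", "protein_q": 2e-04, "disorder_p": 0.01},
    {"rsid": "rsC", "protein_q": 0.20, "disorder_p": 0.50},
]
instruments = select_instruments(example)
```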

MR was conducted using the TwoSampleMR R package (version 0.5.11)72. The causal estimates were initially derived using the inverse-variance weighted (IVW) fixed effects meta-analysis method, accompanied by weighted median (WM) and MR-Egger methods73,74,75. WM and MR-Egger analyses were conducted for the proteins with more than 2 valid SD-pQTLs. Sensitivity analyses to test potential horizontal pleiotropy of the causal relationships and heterogeneity of the instrumental variables were conducted using the MR-Egger intercept test and the Cochran’s Q statistic test, respectively75,76,77. Sex-dimorphic causal relationships were identified using a two-tailed Student’s t-test and considered as sex-dimorphic if the FDR for the sex-dimorphic effect was below 0.05. Causal relationships with FDR < 0.05 for either the pleiotropy or heterogeneity tests were excluded.
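To make the IVW fixed-effects estimate and the Cochran's Q heterogeneity statistic concrete, the following is a minimal sketch based on the textbook formulas with first-order weights. It is an illustration under stated assumptions, not the TwoSampleMR code path; the function name and the example summary statistics are made up.

```python
import math


def ivw_fixed_effect(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted fixed-effects MR estimate with Cochran's Q.

    Each variant contributes a Wald ratio beta_outcome / beta_exposure,
    weighted by beta_exposure^2 / se_outcome^2 (first-order weights).
    Returns (estimate, standard error, Q, degrees of freedom).
    """
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [bx ** 2 / so ** 2 for bx, so in zip(beta_exposure, se_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    se = math.sqrt(1.0 / total)
    q_stat = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    return estimate, se, q_stat, len(ratios) - 1


# Made-up summary statistics for two instruments in one sex.
est, se, q_stat, df = ivw_fixed_effect([0.2, 0.4], [0.1, 0.2], [0.05, 0.05])
```

Running the function once per sex yields two estimates with standard errors, which can then be compared with a two-tailed test on their difference. A large Q relative to its degrees of freedom would indicate that the instruments disagree about the causal effect.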

For the protein-disorder pairs that were significant in the sex-stratified MR analysis, we further examined whether UK Biobank protein levels could predict incident health disorders in a sex-dimorphic way. For this purpose, we employed Cox Proportional Hazards models separately for each sex, using baseline protein levels as predictors and time to first incident diagnosis of the respective health disorder as the outcome. Survival times were derived based on up to 16 years of follow-up health records. Models were adjusted for age at protein measurement, as it is a confounding factor for many age-related health disorders. Individuals with a prevalent health disorder diagnosis before, or up to 1 year after, protein measurement were excluded. The “survival” R package (version 3.7-0) was used to fit the Cox Proportional Hazards models78. All methods have been carried out in accordance with relevant guidelines and regulations.