Abstract
There is insufficient understanding of the molecular basis of prostate cancer (PCa) across different populations. We perform a large-scale proteome-wide association study (PWAS) to identify proteins whose genetically regulated expression in plasma is associated with PCa risk across populations. We develop genetic prediction models for expression of 1578, 1993, 1218, and 1390 proteins for African (n = 450), European (n = 758), Asian (n = 289), and Hispanic/Latino (n = 474) males, respectively, and evaluate associations of genetically regulated protein expression with PCa risk in 19,391 PCa cases and 61,608 controls of African population, 122,188 cases and 604,640 controls of European population, 10,809 cases and 95,790 controls of Asian population, and 3931 cases and 26,405 controls of Hispanic/Latino population. We identify three, four, 15, and 73 PCa-associated proteins in African, Hispanic/Latino, Asian, and European populations, respectively, and 83 in trans-population meta-analysis. There are both pan-population and population-specific associations. Our findings provide valuable insights into the etiology of PCa.
Introduction
Prostate cancer (PCa) is the most frequently diagnosed cancer among men in 118 countries/territories and the leading cause of cancer death in 52 countries1. There is a critical need to better understand its etiology for developing improved therapeutic and risk assessment strategies. Genetic factors have been demonstrated to play an important role in PCa etiology. It is well established that there is a significantly increased risk of PCa among men with a family history of the disease2. Genome-wide association studies (GWAS) have contributed significantly to our understanding of the heritability and familial risk of PCa3,4. Numerous genomic loci associated with PCa risk have been identified in GWAS3,4, suggesting a multigenic model of prostate tumorigenesis. On the other hand, the underlying biological mechanisms for a majority of the GWAS-identified risk loci remain unclear. Given the constraints inherent in conventional single-marker-based GWAS5, there has been a significant shift towards elucidating the functional implications underlying risk loci in the post-GWAS era. This emphasis aims to accurately delineate the exact biological mechanisms, particularly in post-translational processes, through which the identified single nucleotide polymorphisms (SNPs) and target genes exert their effects6,7,8,9,10,11.
It is known that the proteome is commonly dysregulated during the development of diseases. Several proteins have been reported to be associated with PCa risk for their measured levels in blood, such as KLK11 (kallikrein related peptidase 11)12, IL-6 (interleukin 6)13, and EPCA (early PCa antigen)14. However, the findings are not always consistent15,16. Conventional epidemiological studies are subject to several limitations, such as selection bias, potential confounding, and reverse causation, which may account for some of the inconsistent results observed. One alternative strategy is to evaluate the associations between genetically predicted protein levels and PCa risk. To identify proteins potentially relevant to PCa, we have comprehensively evaluated the relationship between genetically predicted protein levels and PCa risk in the European population17,18 and identified a potential link between several protein biomarkers and PCa risk, including Laminin, IL-21, and HOXB13 (homeobox B13)18,19.
Racial and ethnic background is a well-established risk factor for PCa20,21. PCa incidence rates vary significantly by region, with the lowest rates observed in South Central Asia (6.4 age-standardized rate per 100,000), and markedly higher rates in Northern Europe (82.8 age-standardized rate per 100,000)1. Meanwhile, PCa mortality rates among men of African population are two to four times higher than those observed in men of other racial and ethnic groups1,22. Biological differences in PCa have also been observed across populations. For instance, the prostate-specific Ca2+-dependent chloride ion channel protein23, ANO7 (anoctamin 7), has been identified as a population-relevant protein-altering PCa-risk locus24,25. Two variants in the ANO7 gene, rs74804606 (p.Ile740Leu) and rs60985508 (p.Ser914*), have been identified to be associated with PCa in men of African ancestry25. In contrast, four European-specific PCa-risk variants in ANO7, rs77559646, rs2074840, p.Ala759Thr (rs76832527), and p.Glu226Lys/p.Glu226* (rs77482050), were largely excluded from analyses in African, East Asian, and Hispanic populations due to low frequency. Additionally, African-American men are known to exhibit higher expression of gene sets involved in the immune response, apoptosis, hypoxia, and reactive oxygen species production26. Despite these well-established differences, a comprehensive characterization of population-specific proteomic heterogeneity in PCa remains lacking, as most existing studies have focused primarily on men of European descent19,27,28. Although a previous study has examined population-specific exosomal proteins, its findings were limited by a very small sample size (n = 12 patients and nine controls) and a lack of adjustment for multiple comparisons29.
In this work, we leverage a very large and comprehensive reference dataset of 7367 plasma proteins measured across four diverse populations — 450 African, 758 European, 289 Asian, and 474 Hispanic/Latino men — to establish population-specific genetic prediction models. We further analyze GWAS summary statistics for PCa risk, involving men in the African population (19,391 cases and 61,608 controls), European population (122,188 cases and 604,640 controls), Asian population (10,809 cases and 95,790 controls), and Hispanic/Latino population (3931 cases and 26,405 controls) to identify proteins whose genetically predicted levels are associated with PCa risk. Our study demonstrates the value of large-scale, diverse omics data for understanding the etiology of PCa to reduce its health disparities.
Results
Building population-specific protein genetic prediction models
In this study, we first identified protein quantitative trait loci (pQTLs) for 450 African, 758 European, 289 Asian, and 474 Hispanic/Latino men residing in the USA in the Multi-Ethnic Study of Atherosclerosis (MESA) cohort. Among the 6999 proteins examined, we identified at least one cis-pQTL (within ±100 Kb of the transcription start site) associated at false discovery rate (FDR) < 0.05 and/or one trans-pQTL associated at P < 5 × 10−9 for 1609 proteins in the African population, 2058 in the European population, 1237 in the Asian population, and 1415 in the Hispanic/Latino population. We further established prediction models for 1578 proteins in the African population, 1993 in the European population, 1218 in the Asian population, and 1390 in the Hispanic/Latino population, each with an R2 ≥ 0.01, with 697 proteins overlapping across all genetic ancestries (Fig. 1a and Supplementary Data 1). Among the established prediction models, population-specific models were developed for 340 proteins in Africans, 515 in Europeans, 211 in Asians, and 156 in Hispanic/Latinos. The model performance R2 varies across populations, ranging from 0.01 to 0.74 in Africans, 0.01 to 0.79 in Europeans, 0.01 to 0.77 in Asians, and 0.01 to 0.77 in Hispanic/Latinos, with Asians exhibiting the highest mean R2 of 0.18 (Fig. 1b). Kruskal–Wallis test revealed a significant difference in R2 values across the four groups (χ² = 98.39, df = 3, P < 2.20 × 10−16). Post hoc Dunn’s tests with Bonferroni correction confirmed that R2 was significantly higher in Asians compared with Africans (P = 2.57 × 10−5), Europeans (P = 7.81 × 10−22), and Hispanic/Latinos (P = 1.34 × 10−5). It is worth noting that the Asian group was the smallest MESA population.
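The group comparison of R2 values described above can be illustrated with a minimal sketch of the Kruskal–Wallis H statistic. This is an illustration only, not the authors' analysis code: the function names `kruskal_wallis_h` and `chi2_sf_df3` and the example values are hypothetical, ties are assumed to be absent, and the P-value uses the closed-form chi-square tail for exactly 3 degrees of freedom (four groups). The post hoc Dunn's tests are not sketched.

```python
import math

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, assuming no tied values across groups."""
    pooled = sorted(v for g in groups for v in g)
    # Rank lookup is valid only because ties are assumed to be absent.
    rank_of = {v: i + 1 for i, v in enumerate(pooled)}
    n_total = len(pooled)
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n_total * (n_total + 1)) * rank_term - 3.0 * (n_total + 1)

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

# Hypothetical cross-validated R2 values for four groups (not study data).
r2_by_group = [
    [0.01, 0.02, 0.03],
    [0.04, 0.05, 0.06],
    [0.07, 0.08, 0.09],
    [0.10, 0.11, 0.12],
]
h_stat = kruskal_wallis_h(r2_by_group)
p_value = chi2_sf_df3(h_stat)  # df = number of groups - 1 = 3
```

With real model R2 values, ties are likely and a tie-corrected implementation from an established statistics package would be preferable.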
Fig. 1. a Venn plot of protein prediction models across populations. b Distribution and performance of protein prediction models in different populations. The pie charts show the proportion of modeled proteins categorized by the type of genetic variants used. The violin plot illustrates the distribution of R2 values for protein models. The horizontal line within the box plot represents the mean value of R2. The box boundaries define the interquartile range (IQR, 25th to 75th percentiles), and whiskers extend to the most extreme points within 1.5 × IQR. Source data are provided in the Source Data file.
We further validated the European population prediction models using INTERVAL data in an external validation30. Due to differences between Somalogic’s SomaScan platform versions, 844 proteins were available for analysis. Of these testable models, 584 (69.19%) exhibited a prediction performance of R2 ≥ 0.01 in the external validation (Supplementary Fig. 1).
Identification of proteins associated with PCa risk
To identify proteins associated with PCa risk in each population of interest, we applied our established prediction models to the large-scale PCa GWAS summary statistics generated from 156,319 PCa cases and 788,443 controls across diverse populations. We identified significant associations (FDR < 0.05) for three proteins in the African population, namely, microseminoprotein-β (encoded by MSMB), epidermal growth factor:extracellular domain (encoded by EGF), and ADP-ribosylation factor-like protein 3 (encoded by ARL3). In addition, 73 proteins in Europeans, 15 in Asians, and four in Hispanic/Latinos were significantly associated with PCa risk at FDR < 0.05 (Fig. 2 and Supplementary Data 2). Among them, microseminoprotein-β was detected across all four populations, and two proteins (ADP-ribosylation factor-like protein 3 and epidermal growth factor:extracellular domain) were identified in both African and European populations. Two unique proteins, prostate-specific antigen (PSA) and benign prostate-specific antigen (BPSA), both encoded by KLK3 (kallikrein related peptidase 3), were shared between Asian and Hispanic/Latino populations. A total of 60 proteins were unique to the European population (FDR < 0.05 in European and raw p > 0.05 in all other populations) and eight proteins tended to be Asian-specific (Supplementary Data 2).
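The FDR < 0.05 criterion applied above can be sketched as follows. The text does not state which FDR procedure was used; the Benjamini-Hochberg step-up procedure is assumed here, and the function name `bh_adjust` and the example P-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted P-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical per-protein association P-values (not study data).
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
qvals = bh_adjust(pvals)
significant = [q < 0.05 for q in qvals]
```

In this toy example only the two smallest P-values survive adjustment, although four of the five raw P-values are below 0.05.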
Fig. 2. Each dot represents the P-value for the association between genetically predicted protein abundances and PCa risk, plotted by genomic position on the x-axis. Red dashed lines indicate the significance threshold at False Discovery Rate (FDR) < 0.05 for each population: African (19,391 PCa cases and 61,608 controls), European (122,188 cases and 604,640 controls), Asian (10,809 cases and 95,790 controls), and Hispanic/Latino (3931 cases and 26,405 controls). Source data are provided in the Source Data file.
A trans-population meta-analysis was conducted to identify proteins associated with PCa risk. A total of 92 proteins encoded by 83 genes were significantly associated with PCa risk at FDR < 0.05. Among them, 45 demonstrated inverse protein-PCa associations and 47 showed associations between higher predicted levels and increased PCa risk. Eighteen proteins not detected in population-specific analyses were identified in this meta-analysis (Supplementary Data 2). Based on a heterogeneity p-value (HetPval) of <0.05 or I2 heterogeneity statistic (HetISq) > 75%, we found significant heterogeneity across populations for 10 unique proteins, including microseminoprotein-β, plasminogen, charged multivesicular body protein 2b, cathepsin S, epidermal growth factor:extracellular domain, tripartite motif-containing protein 40, trypsin-3, interleukin-36 alpha, PSA, Mth938 domain-containing protein, and UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 8 (Supplementary Data 2).
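To make the heterogeneity statistics concrete, the following is a minimal sketch of an inverse-variance fixed-effect meta-analysis with Cochran's Q and I2. The meta-analysis method is not specified in this section, so the fixed-effect model is an assumption, and the function name `fixed_effect_meta` and the per-population effect sizes are hypothetical. The heterogeneity P-value (HetPval) would be the chi-square upper tail of Q on k − 1 degrees of freedom and is not computed here.

```python
def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis with Cochran's Q and I2 (%)."""
    weights = [1.0 / se ** 2 for se in ses]
    weight_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / weight_sum
    se = weight_sum ** -0.5
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return beta, se, q, i2

# Hypothetical per-population effect estimates for one protein (not study data):
# African, European, Asian, Hispanic/Latino.
pop_betas = [-0.30, -0.12, -0.25, -0.20]
pop_ses = [0.06, 0.02, 0.05, 0.09]
meta_beta, meta_se, q_stat, i2_pct = fixed_effect_meta(pop_betas, pop_ses)
flag_by_i2 = i2_pct > 75.0  # the I2 criterion quoted in the text
```

Because the weights are inverse variances, the pooled estimate is dominated by the population with the smallest standard error, which in this setting would typically be the largest GWAS.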
Through population-specific proteome-wide association study (PWAS) and trans-population meta-analysis, we finally identified 96 protein-PCa risk associations, corresponding to 104 SOMAscan protein IDs. We then compared these 96 associated proteins with independent risk loci previously reported in GWAS4,24. Of these, 92 proteins were located within 500 kb of previously reported lead SNPs. The remaining four proteins (SLIT and NTRK-like protein 1, toll-like receptor 3, synaptotagmin-2, and complement factor H-related protein 1) were located more than 500 kb away from known loci (Supplementary Data 2).
Two-stage constrained maximum likelihood (2ScML) robustness analysis
To assess the robustness of our results, we applied the 2ScML method under relaxed instrumental variable assumptions to 95 population-specific associations spanning 79 unique proteins with 86 SOMAscan IDs (three in Africans, 73 in Europeans, 15 in Asians, and four in Hispanic/Latinos). As shown in Supplementary Data 3, approximately 44.21% (42 of 95) of the associations remained significant at FDR < 0.05 with consistent effect directions. Except for PSA, detected in both Asian and Hispanic/Latino populations, the remaining 40 proteins were population-specific. Of these, 36 proteins tended to be specific to the Europeans and four were uniquely linked to PCa risk in the Asian population (aldehyde dehydrogenase, mitochondrial; arachidonate 15-lipoxygenase B; insulin-like growth factor-binding protein 3; and SLIT and NTRK-like protein 1).
Tissue expression, functional enrichment, and network analysis of identified proteins
We evaluated the tissue-specific expression of 94 genes coding the 96 PWAS-identified proteins using RNA data from the Human Protein Atlas (HPA). Among these, three genes (KLK3, MSMB, and ALOX15B) exhibited prostate-enriched expression. These well-known PCa markers31,32,33 validate our gene selection strategy. An additional 56 genes showed higher expression in other tissues but remained detectable in the prostate, many previously implicated in PCa-related pathways, including growth factor signaling (e.g., EGF34, IGFBP335), extracellular matrix remodeling and metastasis (e.g., PRSS336, MMP737), and inflammatory regulation (e.g., SOCS338).
Our Gene Ontology (GO) enrichment analysis of the 94 candidate genes revealed significant overrepresentation in relevant biological processes, including insulin-like growth factor receptor signaling pathway (GO:0048009, P = 2.92 × 10−5) and extracellular matrix disassembly (GO:0022617, P = 2.39 × 10−4), both associated with tumor progression39,40 and metastatic potential41 in PCa. In the Molecular Function category, enriched terms included serine-type endopeptidase activity (GO:0004252, P = 2.70 × 10−5), serine-type peptidase activity (GO:0008236, P = 4.90 × 10−5), growth factor receptor binding (GO:0070851, P = 7.63 × 10−5), and cytokine activity (GO:0005125, P = 1.17 × 10−3), all relevant to tumor proliferation and microenvironmental signaling42,43,44. Cellular Component analysis indicated predominant localization to the collagen-containing extracellular matrix (GO:0062023, P = 3.43 × 10−5), highlighting roles in PCa development and metastasis45 (Supplementary Data 4). Protein-protein interaction (PPI) analysis identified IL-10 (Interleukin-10) with the highest node degree of 18, followed by Angiostatin with a node degree of 14 (Supplementary Data 5), suggesting both topological centrality and functional significance in PCa.
Subsequent Ingenuity Pathway Analysis (IPA) was performed to explore potential regulatory mechanisms underlying the genes encoding the identified PCa-associated proteins (Supplementary Data 6). Kallikrein-related peptidase 3 (KLK3), mitogen-activated protein kinase 3 (MAPK3), and ras-related protein (RALB) were significantly enriched in the PCa signaling pathway (P = 7.76 × 10−3). Additionally, several PCa-related biofunctions were identified, involving DPEP1 (dipeptidase 1), EGF, IGF2R (insulin-like growth factor 2 receptor), KLK3, LGALS3 (galectin-3), LHCGR (luteinizing hormone/chorionic gonadotropin receptor), MAPK3, MSMB, PLG (plasminogen), SOCS3 (suppressor of cytokine signaling 3), and TLR3 (toll-like receptor 3). These biofunctions encompassed proliferation (P = 8.42 × 10−5), cell cycle progression (P = 5.91 × 10−4), apoptosis (P = 6.16 × 10−4), binding (P = 1.27 × 10−3), PCa pathogenesis (P = 2.94 × 10−3), G0/G1 phase transition (P = 3.48 × 10−3), atrophy of prostate gland (P = 3.48 × 10−3), and hereditary PCa (P = 3.48 × 10−3) (Supplementary Data 6).
Candidate drugs targeting identified associated proteins
To explore potential drug repurposing opportunities for the 79 uniquely identified proteins, we curated the DrugBank database and found 31 proteins as targets of FDA-approved drugs for various human diseases (Supplementary Data 7). Of these, 25 proteins were further supported as relevant to PCa based on a positive overallAssociationScore from the OpenTargets platform (Supplementary Data 7). Notably, the lutropin-choriogonadotropic hormone receptor (LSHR) is targeted by goserelin (DrugBank ID: DB00014), a synthetic analog of luteinizing hormone-releasing hormone used clinically to treat both breast cancer and PCa by reducing the secretion of pituitary gonadotropins46,47. Additionally, aldehyde dehydrogenase, mitochondrial (ALDH-E2) is targeted by disulfiram (DrugBank ID: DB00822), an FDA-approved drug for treating chronic alcoholism. In preclinical studies, disulfiram, along with delivering copper, targets and eradicates copper-laden PCa cells in animal models, while leaving non-cancer cells intact48. Our findings suggest promising opportunities for repurposing these drugs as potential treatments for PCa (Supplementary Data 7).
Discussion
In this study, we developed comprehensive protein prediction models by integrating genetic and protein data collected from 1971 independent males representing diverse racial and ethnic backgrounds, including African, European, Asian, and Hispanic/Latino populations. Interestingly, the models for the Asian population exhibited the highest average cross-validated R² value of 0.18, despite this being the smallest subgroup within the MESA cohort. This suggests that, in this population, certain proteins are more strongly regulated by genetic variants, leading to improved predictive performance at specific loci. This hypothesis is further supported by the Kruskal–Wallis test, which revealed significant differences in heritability (H²) values across the four groups (χ² = 156.89, df = 3, P < 2.20 × 10−16). In post hoc Dunn’s tests, H² values were significantly higher in Asians compared with Africans (P = 0.02), Europeans (P = 4.38 × 10−29), and Hispanic/Latino participants (P = 2.69 × 10−3). While these results are encouraging, they should be interpreted with caution given the limited sample size. Future studies leveraging larger and more diverse Asian cohorts will be essential to validate and further investigate these observations.
These race- and ethnic-specific protein prediction models were then applied to GWAS summary statistics for PCa risk in each population of interest. Consequently, we identified three, 73, 15, and four proteins associated with PCa risk in African, European, Asian, and Hispanic/Latino populations, respectively. Furthermore, a trans-population meta-analysis detected 92 proteins that showed significant associations with PCa risk. To assess the robustness of causal inference, we applied the 2ScML method and found that over 40% of the identified population-specific associations remained statistically significant after accounting for potential pleiotropic effects of SNPs. Overall, the predicted abundances of a total of 104 proteins were associated with PCa risk in either population-specific or cross-population analyses. These proteins include targets such as cell adhesion molecules, protease inhibitors, receptors, enzymes, and cytokines, highlighting the multifaceted nature of the biological pathways involved in PCa development.
Among the proteins identified, several proteins and/or their encoding genes have been previously reported to play crucial roles in PCa progression. For example, glucosamine 6-phosphate N-acetyltransferase (GNPNAT1) has been implicated in the pathogenesis of castration-resistant PCa49 through the phosphatidylinositol 3-kinase/protein kinase B (PI3K/Akt) signaling pathway. Additionally, Shah et al. reported that enzalutamide-induced dysregulation of androgen receptor (AR) signaling increases exon-2 inclusion in PLA2G2A transcripts in PCa cells50. Moreover, PCa risk-associated SNPs rs9364554 and rs7629490 have been linked to the expression of nearby genes IGF2R and CHMP2B, respectively (±500 kb)51. We further compared our results to previously reported proteins associated with PCa risk in PWAS18,19,52,53,54 and found 16 overlapping proteins (Supplementary Data 8). Of these, eight encoded by ARL3, B3GNT8, CTSS, IGF2R, MSMB, MICB, PLG, and SPINT2 have been demonstrated to be associated with PCa risk in our earlier studies, which included over 140,000 cases and controls of European descent18,19. Among them, MSMB stands out as a well-established biomarker for PCa diagnosis and prognosis32. Two GWAS have reported the strongest association between PCa risk and rs10993994, located 2 base pairs upstream of the MSMB transcription start site55,56. This association has been validated in both European57,58,59 and Asian populations60,61. Furthermore, two proteins identified in this study, encoded by ATF6B and PFKM, were previously implicated as potential causal genes at PCa susceptibility loci in our earlier transcriptome-wide association study11.
In addition to proteins previously reported to be associated with PCa risk, our study also identified several additional proteins that may play important roles in PCa development. While some of these proteins have been linked to other cancer types, their potential involvement in PCa has not been previously characterized. For instance, LCTL encoding Lactase-like protein has been reported as a prognostic biomarker in glioma, where it modulates immune responses62. Moreover, SVEP1 expression is suppressed by miR-1269b, which activates the PI3K/Akt pathway and promotes recurrence and metastasis in hepatocellular carcinoma63. Our integrative analysis further identified PCa risk-associated proteins whose biological functions in prostate carcinogenesis remain largely unexplored. Functional annotation through IPA highlighted DPEP1 as a potential biomarker for PCa. DPEP1 has been shown to promote metastasis in colorectal cancer64, but inhibit invasiveness in pancreatic ductal adenocarcinoma cells65, suggesting a context-dependent role in tumor progression. These associations warrant further study to clarify their roles in PCa and may reveal new pathways or targets for biomarkers and therapy.
Through our drug repurposing analysis, we identified 31 proteins that are targets of approved drugs used to treat various human diseases, including PCa (treated by goserelin) (Supplementary Data 7). Among the implicated drugs, several that are primarily used for non-PCa indications (alcohol dependence, inflammation, leukemia) demonstrate significant anti-PCa activity in preclinical models and, in some cases, early clinical settings, highlighting their potential for repurposing in PCa treatment. For example, in vitro and in vivo studies demonstrate that disulfiram (DrugBank ID DB00822), which targets ALDH-E2, inhibits PCa cell growth, induces apoptosis, and demethylates DNA via DNMT1 inhibition66,67. Early-phase clinical trials in both recurrent68 and metastatic PCa69 have shown biological activity, although clinical benefit remains to be fully established. Additionally, mitogen-activated protein kinase 3 (ERK-1) is targeted by sulindac (DrugBank ID DB00605) and arsenic trioxide (DrugBank ID DB01169). Sulindac is a non-steroidal anti-inflammatory drug. Sulindac derivatives have demonstrated anti-proliferative and pro-apoptotic effects in PCa cell lines70,71. Arsenic trioxide, which is FDA-approved for the treatment of acute promyelocytic leukemia, has also shown promise in PCa. It induces apoptosis and cytotoxicity in PCa cell lines72 and has demonstrated significant antitumor activity in an in vivo model of androgen-independent PCa73. A phase II clinical trial reported PSA reductions or disease stabilization in some patients with hormone-refractory PCa74. These findings underscore the promise of these drugs for repurposing in PCa therapy and highlight the need for further clinical investigations to fully evaluate their efficacy and therapeutic value in this context.
One advantage of the current study is the use of genetic instruments rather than directly measured protein levels. Unlike traditional epidemiological approaches using measured protein levels for association estimation, our method minimized potential biases (including residual confounding, reverse causation, and selection bias) and also increased the statistical power. This improvement is achieved by applying comprehensive genetic prediction models to large-scale GWAS summary statistics of PCa risk, which involves a very large number of cases and controls. In our work, we also applied a design that leverages information from both cis- and trans-pQTLs to fully capture the genetically regulated components of protein levels. Although trans-regions exhibit more complex regulatory mechanisms than cis-regions, the majority of identified pQTLs are trans-acting instead of cis-acting30. Therefore, we integrated both cis- and trans-genetic instruments to develop our genetic prediction models for protein levels. The numbers of identified associations in African, Hispanic/Latino, and Asian populations were smaller than the number observed in the European population. Future studies that increase the sample sizes of underrepresented groups including African, Hispanic/Latino, and Asian groups will be critical for enhancing our understanding of PCa etiology in these understudied populations and for reducing health disparities24. Another major strength of our study is the development of population-specific protein prediction models for males in African, Asian, and Hispanic/Latino groups. This method helps mitigate potential bias and loss of statistical power caused by varying linkage disequilibrium (LD) patterns across populations29. We leveraged a large blood proteome reference dataset to establish prediction models for nearly 7,000 proteins to examine their potential relationship with PCa.
Our study has several limitations that need to be acknowledged to appropriately interpret our findings. Firstly, we were unable to conduct external validation of the established prediction models for populations other than European population30, due to a lack of data from additional well-designed studies. However, our external validation results focusing on the European population suggest that our modeling strategy is effective. As reference data of proteomic and genomic resources expand to include more globally underrepresented populations, this limitation can be well addressed in the near future. Secondly, the SomaScan platform has inherent limitations. Its aptamer-based detection method may be more susceptible to binding variability and off-target effects compared with antibody-based assays, such as Olink in the UK Biobank Pharma Proteomics Project (UKB-PPP)75. Moreover, SomaScan interrogates only a subset of plasma proteins, and may not accurately reflect protein levels in other tissues or specific cell types. Future work using more comprehensive proteome platforms, as they become available, and extending investigations to other disease-relevant solid tissues and/or cell types will be crucial to better characterize additional disease-relevant proteins. Thirdly, due to the scope of the current study, we were unable to functionally characterize the potential roles of the identified proteins in PCa development. Future functional studies are warranted to elucidate the specific function and mechanisms of these proteins, particularly those not previously reported in prostate tumorigenesis.
In conclusion, our comprehensive PWAS, conducted across diverse populations and encompassing a large number of proteins (~7000), identified multiple associations between genetically predicted circulating levels of proteins and PCa risk. By developing population-specific genetic prediction models for protein abundances, we were able to elucidate distinct protein-PCa risk associations in each population of interest. These findings underscore the necessity of multi-population investigations to comprehensively elucidate the etiology of PCa and reduce health disparities.
Methods
Study population
The overall design of this study is illustrated in Fig. 3. The Multi-Ethnic Study of Atherosclerosis (MESA) included approximately 6500 men and women without known clinical cardiovascular disease at baseline, aged 45–84 years, who were initially enrolled in 2000 (ref. 76). Baseline information and blood samples of the MESA participants were collected at their initial visit, which took place in six US states (New York, Maryland, Illinois, California, Minnesota, and North Carolina). Detailed information on the MESA study design can be found elsewhere76. The MESA study protocol was reviewed and approved by the Institutional Review Boards (IRBs) of all participating institutions as well as by the National Heart, Lung, and Blood Institute (NHLBI). All participants provided written informed consent to participate in the parent study and received financial compensation.
Fig. 3. Key steps are illustrated with arrows connecting panels that indicate data processing, modeling, and analysis stages. Symbols and colors indicate different types of data and analytical procedures.
In this study, we focused on 2013 male subjects who had no self-reported cancer diagnosis and no ICD-9 or ICD-10 cancer diagnoses in their hospitalization records or death certificates at baseline, including 453 African, 765 European, 296 Asian, and 499 Hispanic/Latino individuals residing in the USA. Population groups in our analysis were initially defined based on self-reported race/ethnicity as recorded in the original studies. All subjects had data available on blood protein levels (v1), genotype, and relevant covariates, including body mass index (BMI), sex, age, cigarette smoking status, and pack-years of cigarette smoking.
Genotype data processing and quality control
The genotype data utilized in this study was generated using Affymetrix SNP 6.0, which was obtained from the MESA SNP Health Association Resource (SHARe) study (phs000420.v6.p3) and imputed on the Michigan imputation server (Minimac4.v1.0.0) using the 1000 Genomes reference panel77. For each population of interest, we excluded subjects who were identified as related via identity-by-descent (IBD) analysis, employing an in-house script that considered independent (R2 < 0.2) and common (minor allele frequency (MAF) ≥ 5%) SNPs. This process resulted in a final sample of 1971 unrelated male subjects, including 450 of African background, 758 of European background, 289 of Asian background, and 474 of Hispanic/Latino background. Subsequently, SNPs were filtered based on pre-determined criteria including MAF > 0.05, genotyping missingness <5%, and adherence to Hardy-Weinberg equilibrium (HWE, P > 5 × 10−6) using PLINK v1.9 software. After filtering for variants present in the 1000 Genomes reference panel, a total of 8,011,863, 5,814,519, 5,308,357, and 6,058,753 high-quality SNPs were retained for African, European, Asian, and Hispanic/Latino populations, respectively. The LD-pruned (R2 < 0.2), common (MAF ≥ 5%), and genotyped variants within 200 base pair windows were used to calculate genetic principal components (PCs) using the EIGENSOFT software78.
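The three SNP-level filters described above (MAF > 0.05, missingness < 5%, HWE P > 5 × 10−6) can be sketched for a single variant as follows. This is an illustration rather than the PLINK implementation: the function name `passes_qc` and the genotype encoding are hypothetical, and the HWE P-value uses a 1-degree-of-freedom chi-square goodness-of-fit approximation, which is an assumption that may differ from the test PLINK applies.

```python
import math

def passes_qc(genotypes, maf_min=0.05, miss_max=0.05, hwe_p_min=5e-6):
    """Apply MAF, missingness, and HWE filters to one SNP.

    genotypes: alternate-allele counts (0, 1, or 2), with None for missing calls.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return False
    missing_rate = 1.0 - n / len(genotypes)
    alt_freq = sum(called) / (2.0 * n)
    maf = min(alt_freq, 1.0 - alt_freq)
    if maf == 0.0:
        return False  # monomorphic: HWE test undefined, and MAF filter fails
    p, q = alt_freq, 1.0 - alt_freq
    expected = [n * q * q, 2.0 * n * p * q, n * p * p]
    observed = [called.count(0), called.count(1), called.count(2)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Chi-square upper tail with 1 degree of freedom.
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))
    return maf > maf_min and missing_rate < miss_max and hwe_p > hwe_p_min

# Hypothetical genotype vectors (not study data).
balanced = [0] * 25 + [1] * 50 + [2] * 25      # exactly in HWE, MAF = 0.5
all_het = [1] * 100                            # strong HWE violation
rare = [0] * 98 + [1] * 2                      # MAF = 0.01
sparse = balanced + [None] * 10                # about 9% missing calls
results = [passes_qc(g) for g in (balanced, all_het, rare, sparse)]
```

Only the first vector passes all three filters; each of the others fails exactly one of them.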
Proteomic data processing
Proteomic profiling was conducted using the aptamer-based SomaScan assay, which quantified 7289 human proteins. We excluded 10 SOMAmers whose target protein-encoding genes lacked positional information in the BioMart79 database. An additional 280 SOMAmers targeting proteins encoded on sex chromosomes were removed to focus the analysis on plasma proteins or protein complexes encoded by autosomal genes. After these exclusions, 6999 proteins were retained for downstream analysis. The residuals of these 6999 protein abundances after adjusting for covariates were then transformed using a rank-based inverse normal transformation for model building.
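The rank-based inverse normal transformation mentioned above can be sketched as follows. The exact variant used is not stated; the Blom offset (c = 3/8) and average ranks for ties are assumptions, and the function name `rank_inverse_normal` is hypothetical. The covariate-adjustment step that produces the residuals is not shown.

```python
from statistics import NormalDist

def rank_inverse_normal(values, c=3.0 / 8.0):
    """Map values to normal quantiles of their ranks (ties get average ranks)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]

# Hypothetical covariate-adjusted residuals for one protein (not study data).
residuals = [3.2, 1.1, 2.5]
transformed = rank_inverse_normal(residuals)
```

The transformation preserves the ordering of the residuals while forcing an approximately standard normal marginal distribution, which limits the influence of outlying protein measurements on the prediction models.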
Building protein genetic prediction models
To identify informative SNPs for model building, we first conducted pQTL analyses adjusting for study site, BMI, sex, age, cigarette smoking status, pack-years of cigarette smoking, and top ten PCs. Significant cis-pQTLs were defined as SNPs within cis-regions associated with a protein at FDR < 0.05, while significant trans-pQTLs were defined as SNPs in trans-regions with P < 5 × 10−9. These significance thresholds were chosen to maximize the inclusion of potentially informative SNPs while minimizing excess noise80. We further extracted non-strand-ambiguous SNPs within 100 Kb of significant cis- and trans-pQTLs to serve as candidate predictors for each protein.
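The candidate-predictor step can be sketched as follows, assuming that "strand-ambiguous" refers to A/T and C/G SNPs and that the 100 Kb window is applied symmetrically around each significant pQTL. The record layout and function names are hypothetical.

```python
# Allele pairs that read the same on both strands (assumed definition).
AMBIGUOUS = {frozenset("AT"), frozenset("CG")}

def is_strand_ambiguous(a1, a2):
    """True for A/T and C/G SNPs, whose strand cannot be inferred from alleles."""
    return frozenset((a1.upper(), a2.upper())) in AMBIGUOUS

def candidate_predictors(snps, pqtls, window=100_000):
    """Select non-ambiguous SNPs lying within `window` bp of any significant pQTL.

    snps:  list of dicts with keys 'id', 'chrom', 'pos', 'a1', 'a2'
    pqtls: list of (chrom, pos) tuples for significant cis- or trans-pQTLs
    """
    selected = []
    for s in snps:
        if is_strand_ambiguous(s["a1"], s["a2"]):
            continue
        near = any(s["chrom"] == c and abs(s["pos"] - p) <= window
                   for c, p in pqtls)
        if near:
            selected.append(s["id"])
    return selected
```

For genome-scale data an interval index would replace the inner linear scan; the loop is kept simple here to make the selection rule explicit.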
We used the TWAS/FUSION framework81 to construct the genetic prediction models. Four methods were used for model construction: best linear unbiased predictor (BLUP), least absolute shrinkage and selection operator (LASSO), elastic net, and top SNP (top1). BLUP estimates the joint effect sizes of all SNPs using a single variance component82. LASSO is a penalized regression method that uses L1 regularization to produce sparse models83. As a generalization of the LASSO, elastic net linearly combines the L1 penalty of LASSO and the L2 penalty of ridge regression, which allows highly correlated variables to be selected together84. For each protein of interest, the prediction model with the most significant cross-validation P-value was retained. Only models with a cross-validation R2 > 0.01 (that is, models explaining more than 1% of the variance, corresponding to a correlation of at least ~0.1 between predicted and measured protein levels) were included in subsequent association analysis. This threshold is commonly applied in similar studies8,80,85,86,87,88. Cross-validation was performed using a five-fold scheme, in which the dataset was randomly divided into five equal parts. In each fold, models were trained on 80% of the data and tested on the remaining 20%, rotating such that each subset served as a test set once. The final cross-validation performance was calculated by regressing observed protein levels against the predicted values aggregated from all test folds. The resulting adjusted R2 accounts for model complexity and sample size, providing a conservative measure of the variance explained by the genetic predictors.
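The cross-validation scheme can be sketched under simplifying assumptions. The example below is not the TWAS/FUSION code: it uses only a top1-style single-SNP model, the function names are hypothetical, and the adjusted R2 is taken from a regression of observed on out-of-fold predicted values with one predictor.

```python
import random

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def fit_top1(X, y):
    """Pick the SNP most correlated with y and fit a simple linear regression.

    X: list of per-subject rows of SNP dosages. Returns (snp_index, intercept, slope).
    """
    cols = [[row[j] for row in X] for j in range(len(X[0]))]
    best = max(range(len(cols)), key=lambda j: abs(pearson(cols[j], y)))
    x = cols[best]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx > 0 else 0.0
    return best, my - slope * mx, slope

def cross_validated_r2(X, y, k=5, seed=0):
    """k-fold CV: aggregate out-of-fold predictions, then compute R2 and adjusted R2."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]     # k near-equal, disjoint test sets
    pred = [0.0] * n
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        j, b0, b1 = fit_top1([X[i] for i in train], [y[i] for i in train])
        for i in test:
            pred[i] = b0 + b1 * X[i][j]
    r2 = pearson(pred, y) ** 2                # R2 of observed regressed on predicted
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return r2, adj_r2
```

Because every prediction comes from a model that never saw that subject, the aggregated R2 estimates out-of-sample performance; a value of 0.01 corresponds to a correlation of √0.01 = 0.1.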
We compared R2 and H² across African, European, Asian, and Hispanic/Latino populations using the Kruskal–Wallis test, given the non-normal distribution of R2 and H² values. When significant, post hoc pairwise comparisons were performed using Dunn’s test with Bonferroni correction. Analyses were conducted in R (version 4.1.2) using the ‘FSA’ package.
Validation of protein genetic prediction models of the European population
We conducted external validation of the European population models using data from 1685 healthy European males of the INTERVAL study, which measured plasma concentrations of 3622 proteins30. Genotype data were obtained using the Affymetrix Axiom UK Biobank genotyping array, and variants were phased with SHAPEIT3 before being imputed with a reference panel comprising the 1000 Genomes Phase 3-UK10K dataset. Log-transformed protein levels were adjusted for age, sex, duration between blood draw and processing, and the top three PCs30. The rank-inverse normalized residuals from the linear regression were used to compare with the predicted protein abundances, which were generated by applying the established genetic prediction models to the INTERVAL genetic data. Models with a performance R2 value ≥ 0.01 (at least ~10% correlation between predicted and measured protein levels) were considered to have successfully passed external validation18,80,86,89.
Associations between genetically predicted circulating protein levels and PCa risk
We evaluated the associations between genetically predicted protein levels and PCa risk by leveraging the summary statistics of a large GWAS meta-analysis4. This meta-analysis included 156,319 PCa cases and 788,443 controls, comprising 19,391 PCa cases and 61,608 controls of African population, 122,188 cases and 604,640 controls of European population, 10,809 cases and 95,790 controls of Asian population, and 3931 cases and 26,405 controls of Hispanic/Latino population4. The TWAS/FUSION framework was used to evaluate the associations between predicted protein levels and PCa risk. A trans-population meta-analysis was then performed in METAL using an inverse-variance-weighted fixed-effects model90. Protein-PCa risk associations were considered statistically significant at FDR < 0.05. Heterogeneity of the associations across racial/ethnic groups was assessed using the I2 statistic in METAL, with high heterogeneity defined as I2 > 75% and a heterogeneity P-value (HetPVal) < 0.0591.
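The inverse-variance-weighted fixed-effects combination and the I2 statistic can be written out as a short sketch. It assumes per-population effect estimates and standard errors as input and is not the METAL implementation; the function name is hypothetical, and the heterogeneity P-value (a chi-square tail probability for Cochran's Q with k − 1 degrees of freedom) is omitted.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis.

    betas, ses: per-population effect estimates and their standard errors.
    Returns the pooled estimate, its SE, Z, two-sided P, Cochran's Q and I2 (%).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal P-value

    # Cochran's Q and I2 = max(0, (Q - df) / Q) expressed in percent.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, 100.0 * (q - df) / q) if q > 0 else 0.0
    return {"beta": beta, "se": se, "z": z, "p": p, "q": q, "i2": i2}
```

For example, two populations with estimates 0.0 and 0.4 and equal standard errors of 0.1 give a pooled estimate of 0.2 with Q = 8 and I2 = 87.5%, which would exceed the I2 > 75% threshold used above.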
Robustness test using 2ScML method
The protein prediction models in this study were built using the TWAS/FUSION software, which allows multiple correlated SNPs to be incorporated as predictors via methods such as LASSO, BLUP, or elastic net. In practice, the PWAS assumption that all SNPs used as predictors are valid instrumental variables may be violated. To assess the robustness of our main results, we applied the 2ScML method to infer the likely causal effects of the associated proteins on PCa risk. This method relies on a plurality condition, which posits that the largest cluster of SNPs sharing the same causal effect comprises valid instrumental variables, thereby permitting more than 50% of the SNPs to be invalid92. By accounting for the horizontal pleiotropic effects of the SNPs used in the protein prediction models, 2ScML identifies valid instruments through constrained maximum likelihood estimation. This method is applicable when at least three predictors are present, enabling the selection of potentially pleiotropic SNPs. A threshold of FDR < 0.05 was applied to determine significant associations using the 2ScML method.
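The text does not name the procedure behind the FDR < 0.05 thresholds. Assuming the commonly used Benjamini-Hochberg step-up adjustment, the computation can be sketched as follows (the function name is hypothetical):

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order.

    Assumption: BH is the FDR procedure; the source states only 'FDR < 0.05'.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

An association would then be declared significant when its adjusted P-value falls below 0.05.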
Tissue expression, functional enrichment, and network analysis
To evaluate tissue-specific expression, we cross-referenced 94 PCa-associated genes with RNA expression data from the Human Protein Atlas (HPA)93. Genes were classified based on HPA criteria as: (1) Elevated in prostate, (2) Elevated in other but expressed in prostate, (3) Low tissue specificity but expressed in prostate, (4) Not detected in prostate, (5) Not detected in any tissue. Classifications were confirmed using prostate-specific RNA expression levels, enabling assessment of prostate relevance and transcriptional activity of each gene.
To investigate the biological functions of 96 unique proteins encoded by 94 genes identified in the PWAS, we performed Gene Ontology (GO) enrichment analysis across three domains: Biological Process, Molecular Function, and Cellular Component. GO enrichment was conducted using the ‘clusterProfiler’ package94, with the human genome (org.Hs.eg.db) as background. Only terms with an adjusted P-value < 0.05 were considered significant.
Proteins operate within a complex network of intermolecular interactions rather than acting independently95. To understand the networks of PCa risk-associated proteins, we constructed a PPI network using the STRING database (https://string-db.org/, accessed on June 18, 2025). Additionally, we employed IPA (version 03-29-25) to perform an enrichment analysis of the genes encoding identified proteins, assessing their enrichment in canonical pathways, molecular and cellular functions, and networks. Detailed methodology for this tool has been described previously96.
Drug repurposing analysis
To explore potential drug repurposing opportunities, we queried the DrugBank database97 to investigate evidence of existing drugs targeting the identified associated proteins of interest. This analysis may reveal promising candidate drugs for further investigation of their potential efficacy in PCa treatment. We further assessed the potential relevance of these proteins to PCa using data from OpenTargets98. Specifically, proteins with a positive overallAssociationScore for PCa-related outcomes, including prostate-specific antigen levels, prostate carcinoma, prostate adenocarcinoma, and PCa, were retained for further consideration.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Specific genome, proteome, and covariate data of MESA76 have been deposited in the database of Genotypes and Phenotypes (dbGaP) under accession code phs000209.v13.p3 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000209.v13.p3). Additional MESA data are available through a concept proposal application via the MESA Genetics P and P Committee. These data are available under restricted access to protect participant privacy and comply with informed consent agreements. For data available through dbGaP, access is limited to qualified researchers who submit a Data Access Request (DAR) through dbGaP. Such requests must include a research use statement and a data use certification signed by the principal investigator and the institutional signing official. Requests will be reviewed by the relevant Data Access Committee (DAC) with a timeline developed by dbGaP. Once approved by dbGaP, access will be granted for one year; continued access beyond the initial 12-month period requires annual renewal via dbGaP. Individual-level genotype and proteomic data of the INTERVAL study30 are available under controlled access in the European Genome-phenome Archive (EGA) under accession number EGAS00001002555. Access is restricted to protect participant privacy and is granted to qualified researchers following review and approval by the EGA Data Access Committee. Researchers interested in accessing these data can submit a request via the EGA data access portal. The DAC aims to respond to all initial requests in less than 2 weeks, and the length of data access depends on the terms set by the DAC. The publicly available summary statistics of the multi-population PCa GWAS4 are available on the GWAS Catalog (https://www.ebi.ac.uk/gwas/).
These statistics are categorized by different racial/ethnicity groups with the following accession codes: European (GCST90274714, https://www.ebi.ac.uk/gwas/studies/GCST90274714)4, African (GCST90274715, https://www.ebi.ac.uk/gwas/studies/GCST90274715)4, Asian (GCST90274716, https://www.ebi.ac.uk/gwas/studies/GCST90274716)4, and Hispanic/Latino (GCST90274717, https://www.ebi.ac.uk/gwas/studies/GCST90274717)4. The remaining data are available within the Article, Supplementary Information or Source Data file. Source data are provided with this paper.
Code availability
The code used to perform the analyses in this study is available at https://github.com/HuaZ-bioinfomatics/MESA-7K-PWAS-PCa and on Zenodo (https://zenodo.org/records/17280225)99.
References
Bray, F. et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 74, 229–263 (2024).
Hemminki, K. Familial risk and familial survival in prostate cancer. World J. Urol. 30, 143–148 (2012).
Schumacher, F. R. et al. Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci. Nat. Genet. 50, 928–936 (2018).
Wang, A. et al. Characterizing prostate cancer risk through multi-ancestry genome-wide discovery of 187 novel risk variants. Nat. Genet. 55, 2065–2074 (2023).
Chimusa, E. R., Dalvie, S., Dandara, C., Wonkam, A. & Mazandu, G. K. Post genome-wide association analysis: dissecting computational pathway/network-based approaches. Brief. Bioinform. 20, 690–700 (2019).
Freedman, M. L. et al. Principles for the post-GWAS functional characterization of cancer risk loci. Nat. Genet. 43, 513–518 (2011).
Edwards, S. L., Beesley, J., French, J. D. & Dunning, A. M. Beyond GWASs: illuminating the dark road from association to function. Am. J. Hum. Genet. 93, 779–797 (2013).
Wu, L. et al. An integrative multi-omics analysis to identify candidate DNA methylation biomarkers related to prostate cancer risk. Nat. Commun. 11, 1–11 (2020).
Liu, D. et al. A transcriptome-wide association study identifies novel candidate susceptibility genes for prostate cancer risk. Int. J. Cancer 150, 80–90 (2022).
Zhong, H., Liu, S., Zhu, J. & Wu, L. Associations between genetically predicted levels of blood metabolites and pancreatic cancer risk. Int. J. Cancer 153, 103–110 (2023).
Wu, L. et al. Identification of novel susceptibility loci and genes for prostate cancer risk: a transcriptome-wide association study in over 140,000 European Descendants. Cancer Res. 79, 3192–3204 (2019).
Stephan, C. et al. Improved prostate cancer detection with a human kallikrein 11 and percentage free PSA-based artificial neural network. Biol. Chem. 387, 801–805 (2006).
Nakashima, J. et al. Serum interleukin 6 as a prognostic factor in patients with prostate cancer. Clin. Cancer Res. 6, 2702–2706 (2000).
Uetsuki, H. et al. Expression of a novel biomarker, EPCA, in adenocarcinomas and precancerous lesions in the prostate. J. Urol. 174, 514–518 (2005).
Burgess, S., Small, D. S. & Thompson, S. G. A review of instrumental variable estimators for Mendelian randomization. Stat. Methods Med. Res. 26, 2333–2355 (2017).
Balázs, K., Antal, L., Sáfrány, G. & Lumniczky, K. Blood-derived biomarkers of diagnosis, prognosis and therapy response in prostate cancer patients. J. Personal. Med. 11, 296 (2021).
Zhang, J. et al. Plasma proteome analyses in individuals of European and African ancestry identify cis-pQTLs and models for proteome-wide association studies. Nat. Genet. 54, 593–602 (2022).
Zhong, H. et al. Identification of blood protein biomarkers associated with prostate cancer risk using genetic prediction models: analysis of over 140,000 subjects. Hum. Mol. Genet. 32, 3181–3193 (2023).
Wu, L. et al. Analysis of Over 140,000 European Descendants identifies genetically predicted blood protein biomarkers associated with prostate cancer risk. Cancer Res. 79, 4592–4598 (2019).
Farashi, S., Kryza, T., Clements, J. & Batra, J. Post-GWAS in prostate cancer: from genetic association to biological contribution. Nat. Rev. Cancer 19, 46–59 (2019).
Wang, G., Zhao, D., Spring, D. J. & DePinho, R. A. Genetics and biology of prostate cancer. Genes Dev. 32, 1105–1140 (2018).
Siegel, R. L., Miller, K. D., Wagle, N. S. & Jemal, A. Cancer statistics, 2023. CA Cancer J. Clin. 73, 17–48 (2023).
Guo, J. et al. ANO7: Insights into topology, function, and potential applications as a biomarker and immunotherapy target. Tissue Cell 72, 101546 (2021).
Conti, D. V. et al. Trans-ancestry genome-wide association meta-analysis of prostate cancer identifies new susceptibility loci and informs genetic risk prediction. Nat. Genet. 53, 65–75 (2021).
Jiang, J. et al. ANO7 African-ancestral genomic diversity and advanced prostate cancer. Prostate Cancer Prostatic Dis. 27, 558–565 (2024).
Rayford, W. et al. Comparative analysis of 1152 African-American and European-American men with prostate cancer identifies distinct genomic and immunological differences. Commun. Biol. 4, 670 (2021).
Liu, D. et al. Associations between genetically predicted plasma N-glycans and prostate cancer risk: analysis of over 140,000 European descendants. Pharmgenom. Pers. Med. 14, 1211 (2021).
Brandes, N., Linial, N. & Linial, M. Genetic association studies of alterations in protein function expose recessive effects on cancer predisposition. Sci. Rep. 11, 1–16 (2021).
Turay, D. et al. Proteomic profiling of serum-derived exosomes from ethnically diverse prostate cancer patients. Cancer Investig. 34, 1–11 (2016).
Sun, B. B. et al. Genomic atlas of the human plasma proteome. Nature 558, 73–79 (2018).
Lilja, H., Ulmert, D. & Vickers, A. J. Prostate-specific antigen and prostate cancer: prediction, detection and monitoring. Nat. Rev. Cancer 8, 268–278 (2008).
Whitaker, H. C., Warren, A. Y., Eeles, R., Kote-Jarai, Z. & Neal, D. E. The potential value of microseminoprotein-β as a prostate cancer biomarker and therapeutic target. Prostate 70, 333–340 (2010).
Benatzy, Y., Palmer, M. A. & Brüne, B. Arachidonate 15-lipoxygenase type B: regulation, function, and its role in pathophysiology. Front. Pharmacol. 13, 1042420 (2022).
Gregg, J. & Fraizer, G. Transcriptional regulation of EGR1 by EGF and the ERK signaling pathway in prostate cancer cells. Genes Cancer 2, 900–909 (2011).
Mehta, H. H. et al. IGFBP-3 is a metastasis suppression gene in prostate cancer. Cancer Res. 71, 5154–5163 (2011).
Hockla, A. et al. PRSS3/mesotrypsin is a therapeutic target for metastatic prostate cancer. Mol. Cancer Res. 10, 1555–1566 (2012).
Tregunna, R. Serum MMP7 levels could guide metastatic therapy for prostate cancer. Nat. Rev. Urol. 17, 658–658 (2020).
Pierconti, F. et al. Epigenetic silencing of SOCS3 identifies a subset of prostate cancer with an aggressive behavior. Prostate 71, (2011).
Kojima, S., Inahara, M., Suzuki, H., Ichikawa, T. & Furuya, Y. Implications of insulin-like growth factor-I for prostate cancer therapies: Review Article. Int. J. Urol. 16, 161–167 (2009).
Wu, J. & Yu, E. Insulin-like growth factor receptor-1 (IGF-IR) as a target for prostate cancer therapy. Cancer Metastasis Rev. 33, 607–617 (2014).
Yang, H. et al. Matrix Metalloproteinase 11 Is a Potential Therapeutic Target in Lung Adenocarcinoma. Mol. Ther. Oncolytics 14, 82–93 (2019).
Tagirasa, R. & Yoo, E. Role of Serine Proteases at the Tumor-Stroma Interface. Front. Immunol. 13, 832418 (2022).
Barton, J., Blackledge, G. & Wakeling, A. Growth factors and their receptors: new targets for prostate cancer therapy. (2001).
Nguyen, D. P., Li, J. & Tewari, A. K. Inflammation and prostate cancer: The role of interleukin 6 (IL-6). BJU Int. 113, 986–992 (2014).
Stewart, D. A., Cooper, C. R. & Sikes, R. A. Changes in extracellular matrix (ECM) and ECM-associated proteins in the metastatic progression of prostate cancer. Reprod. Biol. Endocrinol. 2, 2 (2004).
Akaza, H. Future prospects for luteinizing hormone-releasing hormone analogues in prostate cancer treatment. Pharmacology 85, 110–120 (2010).
Huerta-Reyes, M. et al. Treatment of breast cancer with gonadotropin-releasing hormone analogs. Front. Oncol. 9, 943 (2019).
Prostate cancer cells tricked into self-destruction because of their hunger for copper. Pharm J. https://doi.org/10.1211/pj.2014.20066843 (2015).
Kaushik, A. K. et al. Inhibition of the hexosamine biosynthetic pathway promotes castration-resistant prostate cancer. Nat. Commun. 7, 11612 (2016).
Shah, K. et al. Androgen receptor signaling regulates the transcriptome of prostate cancer cells by modulating global alternative splicing. Oncogene 39, 6172–6189 (2020).
Penney, K. L. et al. Association of prostate cancer risk variants with gene expression in normal and tumor tissue. Cancer Epidemiol. Biomark. Prevent. 24, 255–260 (2015).
Desai, T. A. et al. Identifying proteomic risk factors for overall, aggressive, and early onset prostate cancer using Mendelian Randomisation and tumour spatial transcriptomics. EBioMedicine 105, 105168 (2024).
Ren, F., Jin, Q., Liu, T., Ren, X. & Zhan, Y. Proteome-wide mendelian randomization study implicates therapeutic targets in common cancers. J. Transl. Med. 21, 646 (2023).
Wu, J. et al. Proteome-wide Mendelian randomization identifies causal plasma proteins in prostate cancer development. Hum. Genom. 19, 17 (2025).
Thomas, G. et al. Multiple loci identified in a genome-wide association study of prostate cancer. Nat. Genet. 40, 310–315 (2008).
Eeles, R. A. et al. Multiple newly identified loci associated with prostate cancer susceptibility. Nat. Genet. 40, 316–321 (2008).
Lou, H. et al. Fine mapping and functional analysis of a common variant in MSMB on chromosome 10q11.2 associated with prostate cancer susceptibility. Proc. Natl. Acad. Sci. USA 106, 7933–7938 (2009).
Camp, N. J. et al. Replication of the 10q11 and Xp11 prostate cancer risk variants: Results from a Utah pedigree-based study. Cancer Epidemiol. Biomark. Prevent. 18, 1290–1294 (2009).
Chang, B. L. et al. Fine mapping association study and functional analysis implicate a SNP in MSMB at 10q11 as a causal variant for prostate cancer risk. Hum. Mol. Genet. 18, (2009).
Mhatre, D. R. et al. The rs10993994 in the proximal MSMB promoter region is a functional polymorphism in Asian Indian subjects. Springerplus 4, 380 (2015).
Xu, B. et al. A functional polymorphism in MSMB gene promoter is associated with prostate cancer risk and serum MSMB expression. Prostate 70, 1146–1152 (2010).
Su, J. et al. LCTL is a prognostic biomarker and correlates with stromal and immune infiltration in gliomas. Front. Oncol. 9, 1083 (2019).
Chen, L. et al. The novel miR-1269b-regulated protein SVEP1 induces hepatocellular carcinoma proliferation and metastasis likely through the PI3K/Akt pathway. Cell Death Dis. 11, 320 (2020).
Zeng, C. et al. DPEP1 promotes drug resistance in colon cancer cells by forming a positive feedback loop with ASCL2. Cancer Med. 12, 412–424 (2023).
Zhang, G. et al. DPEP1 inhibits tumor cell invasiveness, enhances chemosensitivity and predicts clinical outcome in pancreatic ductal adenocarcinoma. PLoS ONE 7, e31507 (2012).
Iljin, K. et al. High-throughput cell-based screening of 4910 known drugs and drug-like small molecules identifies disulfiram as an inhibitor of prostate cancer cell growth. Clin. Cancer Res. 15, 6070–6078 (2009).
Lin, J. et al. Disulfiram is a DNA demethylating agent and inhibits prostate cancer cell growth. Prostate 71, 333–343 (2011).
Schweizer, M. T. et al. Pharmacodynamic study of disulfiram in men with non-metastatic recurrent prostate cancer. Prostate Cancer Prostatic Dis. 16, 357–361 (2013).
Zhang, T. et al. Prospective clinical trial of disulfiram plus copper in men with metastatic castration-resistant prostate cancer. Prostate 82, (2022).
Lim, J. T. E. et al. Sulindac derivatives inhibit growth and induce apoptosis in human prostate cancer cell lines. Biochem. Pharmacol. 58, 1097–1107 (1999).
Nortcliffe, A. et al. Synthesis and biological evaluation of nitric oxide-donating analogues of sulindac for prostate cancer treatment. Bioorg. Med. Chem. 22, 756–761 (2014).
Uslu, R. et al. Arsenic trioxide-mediated cytotoxicity and apoptosis in prostate and ovarian carcinoma cell lines. Clin. Cancer Res. 6, 4957–4964 (2000).
Maeda, H. et al. Tumor growth inhibition by arsenic trioxide (As2O3) in the orthotopic metastasis model of androgen-independent prostate cancer. Cancer Res. 61, 5432–5440 (2001).
Berry, W., Dakhil, S., Gregurich, M. A. & Asmar, L. Phase II trial of single-agent weekly docetaxel in hormone-refractory, symptomatic, metastatic carcinoma of the prostate. Semin Oncol. 28, 8–15 (2001).
Sun, B. B. et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature 622, 329–338 (2023).
Bild, D. E. et al. Multi-ethnic study of atherosclerosis: objectives and design. Am. J. Epidemiol. 156, 871–881 (2002).
Mogil, L. S. et al. Genetic architecture of gene expression traits across diverse populations. PLoS Genet. 14, e1007586 (2018).
Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, 2074–2093 (2006).
Durinck, S. et al. BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics 21, 3439–3440 (2005).
Liu, S. et al. Identification of proteins associated with type 2 diabetes risk in diverse racial and ethnic populations. Diabetologia https://doi.org/10.1007/S00125-024-06277-3 (2024).
Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016).
Robinson, G. K. That BLUP is a good thing: the estimation of random effects. Stat. Sci. 6, 15–32 (1991).
Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996).
Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67, 301–320 (2005).
Liu, S., Zhong, H., Zhu, J. & Wu, L. Identification of blood metabolites associated with risk of Alzheimer’s disease by integrating genomics and metabolomics data. Mol. Psychiatry 29, 1153–1162 (2024).
Zhu, J. et al. Associations between genetically predicted plasma protein levels and Alzheimer’s disease risk: a study using genetic prediction models. Alzheimers Res. Ther. 16, 8 (2024).
He, J. et al. Enhancing disease risk gene discovery by integrating transcription factor-linked trans-variants into transcriptome-wide association analyses. Nucleic Acids Res. 53, gkae1035 (2025).
Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).
Zhu, J. et al. Proteome-wide association study and functional validation identify novel protein markers for pancreatic ductal adenocarcinoma. Gigascience 13, giae012 (2024).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Graff, M. et al. Discovery and fine-mapping of height loci via high-density imputation of GWASs in individuals of African ancestry. Am. J. Hum. Genet. 108, 564–582 (2021).
Xue, H., Shen, X. & Pan, W. Causal inference in transcriptome-wide association studies with invalid instruments and GWAS summary data. J. Am. Stat. Assoc. 118, 1525–1537 (2023).
Uhlén, M. et al. Tissue-based map of the human proteome. Science 347, 1260419 (2015).
Wu, T. et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation 2, 100141 (2021).
Lehne, B. & Schlitt, T. Protein-protein interaction databases: keeping up with growing interactomes. Hum. Genom. 3, 1–7 (2009).
Krämer, A., Green, J., Pollard, J. Jr & Tugendreich, S. Causal analysis approaches in ingenuity pathway analysis. Bioinformatics 30, 523–530 (2014).
Wishart, D. S. et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 34, D668–D672 (2006).
Koscielny, G. et al. Open Targets: a platform for therapeutic target identification and validation. Nucleic Acids Res. 45, D985–D994 (2017).
Hua, Z. Proteome-wide association study of prostate cancer risk across populations. Zenodo https://doi.org/10.5281/ZENODO.17280225 (2025).
Acknowledgements
The authors would like to thank all of the individuals for their participation in the parent studies and all the researchers, clinicians, technicians and administrative staff for their contribution to the studies. The contents of this paper are solely the responsibility of the authors and do not necessarily represent the official views of the NIH. Lang Wu is supported by National Human Genome Research Institute/National Institute on Minority Health and Health Disparities (NHGRI/NIMHD) U54 HG013243 and National Cancer Institute R01CA263494 and U01CA293883. Hua Zhong and Shuai Liu are partially supported under award number U24DK132746-01, UCLA LIFT-UP (Leveraging Institutional support for Talented, Upcoming Physicians and/or Scientists). Peter Ganz, Rajat Deo, and Ruth F. Dubin are supported by R01HL159081. The MESA projects are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with MESA investigators. Support for MESA is provided by contracts 75N92020D00001, HHSN268201500003I, N01-HC-95159, 75N92020D00005, N01-HC-95160, 75N92020D00002, N01-HC-95161, 75N92020D00003, N01-HC-95162, 75N92020D00006, N01-HC-95163, 75N92020D00004, N01-HC-95164, 75N92020D00007, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-000040, UL1-TR-001079, UL1-TR-001420, UL1TR001881, DK063491, and R01HL105756. The authors thank the other investigators, the staff, and the participants of the MESA study for their valuable contributions. A full list of participating MESA investigators and institutes can be found at http://www.mesa-nhlbi.org.
Author information
Contributions
L.W. conceived the study and supervised the work. J.Z. contributed to the study design and conducted drug repurposing analysis. S.L., H.Z., and J.Z. performed statistical analyses. H.Z. performed PPI and IPA analyses. H.Z., J.Z., S.L., and L.W. wrote the initial draft manuscript. C.W., L.W., S.P.W., C.H.M., M.J.B., P.D., X.G., C.W.J., H.J.L., K.D.T., R.P.T., R.Y., A.W.M., S.S.R., J.I.R., R.D., R.F.D., and P.G. conducted the experiment, contributed materials, and provided valuable feedback for revising the manuscript. All authors have reviewed and approved the final manuscript.
Ethics declarations
Competing interests
Lang Wu provided consulting services to Pupil Bio Inc., Techspert, and Galiher DeRobertis & Waxman LLP, and reviewed manuscripts for Gastroenterology Report, not related to this study, and received honorarium. No potential conflicts of interest were disclosed by the other authors.
Peer review
Peer review information
Nature Communications thanks Zhongming Zhao, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhong, H., Zhu, J., Liu, S. et al. Proteome-wide association study of prostate cancer risk across populations. Nat Commun 17, 3043 (2026). https://doi.org/10.1038/s41467-025-66250-5
DOI: https://doi.org/10.1038/s41467-025-66250-5





