Introduction

Lung cancer (LC), the leading cause of cancer-related deaths globally1, is categorized into small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC), with NSCLC being the more prevalent type. While smoking is the primary risk factor, non-smokers may also develop LC due to genetic predisposition, air pollution, or occupational exposure2. Symptoms of LC vary, but typically include persistent cough, hemoptysis, chest pain, shortness of breath, hoarseness, weight loss, and fatigue3. Early symptoms are often subtle, leading to high diagnostic costs, with most LC cases diagnosed at advanced stages4,5. Despite improvements in treatment modalities such as surgery, radiotherapy, chemotherapy, immunotherapy, and targeted therapies, the five-year survival rate remains low (15–20%)1,6. Targeted therapy has become the primary treatment for advanced LC, but challenges remain, including inefficacy in the absence of driver mutations and frequent acquired resistance7. Consequently, identifying new therapeutic targets for LC continues to be a critical area of research.

Expression quantitative trait loci (eQTLs), primarily single-nucleotide polymorphisms (SNPs), are genomic variants that regulate gene expression and can be classified as cis-eQTLs or trans-eQTLs, depending on their proximity to the target genes8,9. Similarly, protein quantitative trait loci (pQTLs) map genetic variants that influence protein levels, with cis-pQTLs located near coding regions10. eQTL analysis involves assessing SNP-gene expression correlations and physical distance, with databases like eQTLGen Phase II providing transcriptomic insights for complex traits derived from blood11. While numerous large-scale genome-wide association studies (GWAS) have identified SNPs associated with LC risk, most studies focus on eQTLs or pQTLs in isolation, with limited integration of multi-omics approaches. Moreover, many of these SNPs reside in non-coding regions or gene intervals, resulting in GWAS data that offer limited insights into pathogenic genes and drug targets. Therefore, this study aims to provide further theoretical support for identifying drug targets in LC.

Traditional observational studies are often hindered by confounding bias and reverse causality in causal inference. Mendelian randomization (MR) addresses these limitations by using genetic variants as instrumental variables (IVs) to estimate causal effects between exposures and outcomes, provided that key assumptions (relevance, independence, exclusion restriction) are met12,13. Gene expression levels are influenced by eQTLs, with cis-eQTLs commonly recognized as regulatory factors affecting gene expression in drug target MR analysis. The robustness of MR against unmeasured confounding has established it as a crucial method for identifying causal biomarkers and therapeutic targets across various complex diseases, including bipolar disorder, inflammatory bowel disease, coronary heart disease, sepsis, and aortic aneurysms14,15,16,17,18. In drug target MR studies, cis-eQTLs act as genetic proxies for gene expression modulation9. Recent MR analyses also assess potential side effects of drug targets and their adverse reactions. Notably, MR analysis of LC is gaining traction, with studies demonstrating potential causal relationships between specific gut microbiota and SCLC, as well as genetic predictors of schizophrenia and LC19,20. However, comprehensive genomic evidence supporting druggable targets for LC remains lacking.

This study identified causal drug targets for LC through MR analyses integrating multi-omics data. Additionally, co-localization analysis was performed to prioritize therapeutic candidates and validate their clinical potential.

Materials and methods

Data extraction

Data for LC, plasma genes, and plasma proteins were obtained from the publicly available Integrative Epidemiology Unit (IEU) Open GWAS database (https://gwas.mrcieu.ac.uk/) on December 21, 2023. The LC dataset included two GWAS IDs: ieu-a-987 (used as the training set), comprising 85,449 samples (29,863 LC and 55,586 controls) with a total of 10,439,018 SNPs, and ieu-a-966 (used as the testing set), comprising 27,209 samples (11,348 LC and 15,861 controls) with 8,945,893 SNPs. Additionally, the eQTL dataset included 19,942 genes, and the pQTL dataset included 3622 plasma proteins derived from 3301 healthy samples from the Open GWAS database, as well as 1531 plasma proteins from the literature21,22. In this study, the eQTL and pQTL of plasma were collectively referred to as epQTL. Furthermore, this study adhered to the STROBE-MR checklist.

Data pre-processing for MR study

In this MR analysis, epQTLs were treated as exposure factors, and LC was regarded as the outcome. The three main assumptions of classical MR analysis were satisfied: (i) the independence assumption (IVs are not associated with any confounders); (ii) the association assumption (IVs directly affect exposure); and (iii) the exclusivity assumption (IVs affect the outcome solely through the exposure and not via other pathways). To identify and screen IVs, the following steps were performed. First, the exposure factors were screened for IVs (\(p < 5 \times 10^{-8}\)) using the extract_instruments function from the “TwoSampleMR” R package (version 0.5.6)23. This stringent p-value threshold ensures that only highly significant genetic variants are selected as potential IVs, minimizing the risk of weak or spurious associations. Simultaneously, SNPs in linkage disequilibrium (LD) were removed (clumping threshold \(r^{2} = 0.001\), window = 10,000 kb), eliminating redundant genetic variants that are highly correlated and reducing multicollinearity, thus improving the independence of the selected IVs. Next, the strength of each IV was assessed using F-statistics (F > 10), which ensures that the IVs are sufficiently strong, reducing the potential for weak instrument bias and enhancing the reliability of causal inference. F-statistics were calculated using the formula:

$$F = \beta^{2} /se^{2}$$

where “se” indicates the standard error. Subsequently, SNPs associated with the outcome were removed, and effect alleles and effect sizes were harmonized. After eliminating duplicates, the remaining SNPs were used for the subsequent MR analysis.
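The p-value and F-statistic filters described above can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used in the study; the `Snp` record, the function names, and the summary statistics are hypothetical, and LD clumping is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    rsid: str
    beta: float  # SNP-exposure effect estimate
    se: float    # standard error of beta
    pval: float  # SNP-exposure association p-value


def f_statistic(snp):
    """Per-SNP instrument strength, F = beta^2 / se^2."""
    return (snp.beta / snp.se) ** 2


def screen_instruments(snps, p_threshold=5e-8, f_threshold=10.0):
    """Keep SNPs passing both the p-value and the F-statistic filters."""
    return [
        s for s in snps
        if s.pval < p_threshold and f_statistic(s) > f_threshold
    ]


# Hypothetical summary statistics, for illustration only.
candidates = [
    Snp("rs_example_1", beta=0.20, se=0.02, pval=1e-20),
    Snp("rs_example_2", beta=0.05, se=0.02, pval=1e-2),  # fails the p-value filter
    Snp("rs_example_3", beta=0.12, se=0.02, pval=2e-9),
]
kept = screen_instruments(candidates)
print([s.rsid for s in kept])
```

Because F is the squared z-score, any SNP with p < 5 × 10^−8 already has F well above 10, so the F filter acts as an explicit safeguard rather than an additional restriction.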

MR study

In the MR analysis, five algorithms were employed using the ‘mr’ function: Inverse variance weighted (IVW)24, MR Egger25, Weighted median26, Simple mode27, and Weighted mode26. Among these, IVW was regarded as the most crucial method. The weighting formula for IVW was calculated as follows:

$$W_{i} = \frac{1}{se^{2}\left(\hat{\theta}_{i}\right)} = \frac{\beta_{i}^{2}}{se^{2}\left(\gamma_{i}\right)}$$

For the ith IV, \(\beta_{i}\) represents the effect estimate of the IV on the exposure, \(\gamma_{i}\) indicates the effect estimate of the IV on the outcome, \(\hat{\theta}_{i} = \gamma_{i}/\beta_{i}\) is the corresponding Wald ratio estimate, and se denotes the standard error. Missing values were handled based on their quantity: if missing values were minimal, observations with missing data were excluded using the na.omit function; otherwise, multiple imputation via the mice package was employed to address larger amounts of missing data.
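As a rough illustration of how the IVW estimate is assembled from per-SNP summary statistics, the sketch below combines Wald ratios \(\gamma_{i}/\beta_{i}\) with first-order weights \(\beta_{i}^{2}/se^{2}(\gamma_{i})\), that is, the inverse variance of each ratio. The choice of first-order weights, the function name, and the input values are assumptions made for illustration; the study itself relied on the mr function of TwoSampleMR.

```python
import math


def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effects IVW estimate from per-SNP summary statistics.

    theta_i = gamma_i / beta_i is the Wald ratio for SNP i, and
    W_i = beta_i^2 / se(gamma_i)^2 is its first-order inverse-variance weight.
    """
    ratios = [g / b for b, g in zip(beta_exp, beta_out)]
    weights = [(b / s) ** 2 for b, s in zip(beta_exp, se_out)]
    total_weight = sum(weights)
    theta = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return theta, se


# Hypothetical SNP-exposure and SNP-outcome effects (log-odds scale).
beta_exp = [0.10, 0.20, 0.15]
beta_out = [0.020, 0.050, 0.030]
se_out = [0.010, 0.010, 0.010]

theta, se = ivw_estimate(beta_exp, beta_out, se_out)
print(f"IVW estimate = {theta:.4f}, SE = {se:.4f}")
```

SNPs with larger effects on the exposure or more precise outcome estimates receive more weight, which is why strong instruments dominate the pooled estimate.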

The inclusion criteria for MR results were \(P_{\text{IVW}} < 0.05\) and a minimum of three SNPs (SNP ≥ 3). The odds ratio (OR) quantifies the likelihood of one event relative to another. An OR greater than 1 suggests an increased likelihood of the event due to the risk factor, while an OR less than 1 indicates a protective effect. Scatter plots were generated to assess the correlation between SNPs of the exposure and the outcome, and forest plots were created to illustrate the diagnostic impact of exposure on the outcome. A funnel plot was also constructed to evaluate the symmetry of causal effects, aiding in the detection of publication bias, assessment of small-study effects, and ensuring the overall reliability of the results.
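The conversion from an MR estimate on the log-odds scale to an OR with a 95% CI, together with the inclusion criteria, can be sketched as below. The normal approximation (1.96 standard errors, two-sided p-value from the z-score), the function names, and the input values are assumptions for illustration only.

```python
import math


def odds_ratio_summary(log_or, se):
    """Return OR, 95% CI bounds, and a two-sided normal p-value."""
    odds_ratio = math.exp(log_or)
    ci_low = math.exp(log_or - 1.96 * se)
    ci_high = math.exp(log_or + 1.96 * se)
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return odds_ratio, ci_low, ci_high, p_value


def passes_inclusion(p_ivw, n_snps):
    """Inclusion criteria stated in the text: P_IVW < 0.05 and at least 3 SNPs."""
    return p_ivw < 0.05 and n_snps >= 3


# Hypothetical IVW estimate on the log-odds scale.
odds_ratio, ci_low, ci_high, p_value = odds_ratio_summary(log_or=-0.06, se=0.014)
label = "protective" if odds_ratio < 1 else "risk"
print(f"OR = {odds_ratio:.3f} ({ci_low:.3f} to {ci_high:.3f}), {label}")
print(passes_inclusion(p_value, n_snps=3))
```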

A series of sensitivity analyses were conducted to assess the robustness of the MR results. Heterogeneity was tested using the mr_heterogeneity function28 based on Cochran’s Q test. When P > 0.05, the fixed-effects IVW method was applied; if P < 0.05, the random-effects IVW method was used. Additionally, the MR-Egger method from the “TwoSampleMR” R package (version 0.5.6) was employed to test for horizontal pleiotropy, with a P-value greater than 0.05 indicating the absence of horizontal pleiotropy. The MR-PRESSO method from the “MRPRESSO” R package (version 1.0)29 was also used to identify potential confounders. A P-value greater than 0.05 suggested no confounding factors. Furthermore, a leave-one-out (LOO) analysis was performed using the mr_leaveoneout function30.
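Two of these checks, Cochran's Q and the leave-one-out analysis, can be sketched from Wald ratios and their inverse-variance weights. This is a simplified illustration with hypothetical inputs and function names, not the TwoSampleMR implementation; the p-value for Q (a chi-square test with n − 1 degrees of freedom) is omitted because it needs a chi-square distribution function outside the Python standard library.

```python
def ivw(ratios, weights):
    """Inverse-variance weighted mean of per-SNP Wald ratios."""
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)


def cochran_q(ratios, weights):
    """Cochran's Q: weighted squared deviations from the IVW estimate."""
    theta = ivw(ratios, weights)
    return sum(w * (r - theta) ** 2 for w, r in zip(weights, ratios))


def leave_one_out(ratios, weights):
    """Recompute the IVW estimate with each SNP removed in turn."""
    estimates = []
    for i in range(len(ratios)):
        kept_ratios = ratios[:i] + ratios[i + 1:]
        kept_weights = weights[:i] + weights[i + 1:]
        estimates.append(ivw(kept_ratios, kept_weights))
    return estimates


# Hypothetical Wald ratios and inverse-variance weights for three SNPs.
ratios = [0.20, 0.25, 0.20]
weights = [100.0, 400.0, 225.0]

q = cochran_q(ratios, weights)
loo = leave_one_out(ratios, weights)
print(f"Q = {q:.4f} on {len(ratios) - 1} degrees of freedom")
print([round(estimate, 4) for estimate in loo])
```

A Q value close to its degrees of freedom is consistent with homogeneous SNP-specific estimates, and leave-one-out estimates that stay close to the full estimate indicate that no single SNP drives the result.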

Steiger analysis

To establish the causal relationship, the Steiger analysis was applied using “TwoSampleMR,” with the criteria for passing the Steiger test being a correct causal direction (value = 1) and a Steiger test adjusted P-value < 0.05.
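The logic of the Steiger test can be sketched as follows: the instrument should explain more variance in the exposure than in the outcome, and the difference between the two correlations is tested with Fisher's z transformation. This is a simplified sketch under stated assumptions: variance explained is approximated by \(r^{2} = t^{2}/(t^{2} + n - 2)\), which suits continuous traits, whereas a binary outcome such as LC would normally be handled on the liability scale. Function names and input values are hypothetical.

```python
import math


def r2_from_beta_se(beta, se, n):
    """Variance explained by one SNP, using r^2 = t^2 / (t^2 + n - 2)."""
    t_squared = (beta / se) ** 2
    return t_squared / (t_squared + n - 2)


def steiger_test(r2_exp, n_exp, r2_out, n_out):
    """Return (direction_is_correct, two-sided p-value)."""
    z_exp = math.atanh(math.sqrt(r2_exp))  # Fisher z of the exposure correlation
    z_out = math.atanh(math.sqrt(r2_out))  # Fisher z of the outcome correlation
    se_diff = math.sqrt(1.0 / (n_exp - 3) + 1.0 / (n_out - 3))
    z = (z_exp - z_out) / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return r2_exp > r2_out, p_value


# Hypothetical single-SNP summary statistics and sample sizes.
r2_exposure = r2_from_beta_se(beta=0.20, se=0.02, n=3301)
r2_outcome = r2_from_beta_se(beta=0.05, se=0.01, n=85449)

correct_direction, p_value = steiger_test(r2_exposure, 3301, r2_outcome, 85449)
print(correct_direction, p_value < 0.05)
```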

Colocalization analysis

To identify potential drug targets for LC, epQTL-GWAS colocalization analysis was performed using the “coloc” R package (version 5.2.2)31. Five hypotheses were tested in the colocalization analysis: Hypothesis 0 (H0) indicating no association with GWAS and epQTL, Hypotheses 1 and 2 (H1/H2) indicating association with either GWAS or epQTL, Hypothesis 3 (H3) indicating that both GWAS and epQTL are associated but with distinct causal variants, and Hypothesis 4 (H4) indicating shared causal variants for both traits. Posterior probabilities (PPs) were calculated for each hypothesis. A PP greater than 0.6 for H4 suggests that colocalization analysis is valid, indicating that these epQTLs could serve as potential drug targets for LC.
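The posterior probabilities for H0 to H4 can be illustrated with a compact sketch of the approximate Bayes factor approach that underlies coloc. The sketch assumes a single causal variant per trait, Wakefield approximate Bayes factors, and prior probabilities of 1e-4, 1e-4, and 1e-5 for association with trait 1 only, trait 2 only, and both traits; the prior effect-size standard deviations (0.15 and 0.2), the function names, and the input values are likewise illustrative assumptions rather than the settings of this study.

```python
import math


def log_sum(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_diff(a, b):
    """log(exp(a) - exp(b)), valid for a > b."""
    return a + math.log1p(-math.exp(b - a))


def wakefield_labf(beta, se, prior_sd):
    """Log approximate Bayes factor for one SNP."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for one region."""
    both = [a + b for a, b in zip(labf1, labf2)]
    s1, s2, s12 = log_sum(labf1), log_sum(labf2), log_sum(both)
    log_h = [
        0.0,                                                   # H0: no association
        math.log(p1) + s1,                                     # H1: trait 1 only
        math.log(p2) + s2,                                     # H2: trait 2 only
        math.log(p1) + math.log(p2) + log_diff(s1 + s2, s12),  # H3: distinct variants
        math.log(p12) + s12,                                   # H4: shared variant
    ]
    total = log_sum(log_h)
    return [math.exp(h - total) for h in log_h]


# Hypothetical region of five SNPs; the second SNP is strong for both traits.
eqtl_beta = [0.025, 0.30, 0.015, -0.010, 0.050]
gwas_beta = [0.010, 0.25, 0.005, 0.020, -0.015]
se = 0.05

labf_eqtl = [wakefield_labf(b, se, prior_sd=0.15) for b in eqtl_beta]
labf_gwas = [wakefield_labf(b, se, prior_sd=0.20) for b in gwas_beta]
pp = coloc_posteriors(labf_eqtl, labf_gwas)
print("PP.H4 > 0.6:", pp[4] > 0.6)
```

When the strongest signals for the two traits fall on different SNPs, the same calculation shifts posterior mass from H4 to H3.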

Drug and disease prediction and molecular docking analysis

Based on the identified drug targets, the Comparative Toxicogenomics Database (CTD, http://ctdbase.org) was utilized to predict potential drugs and associated diseases. Subsequently, target-drug and drug-disease networks were constructed and visualized using Cytoscape (version 3.10.1)32.

To investigate the interaction between the targets and drugs, drugs with the highest degree values from the target-drug network were selected for molecular docking analysis. Protein crystal structures of the targets (acting as receptors) were sourced from the RCSB Protein Data Bank (PDB, https://www.rcsb.org/), while the 3D molecular structures of the selected drugs (acting as ligands) were obtained from the PubChem database (https://pubchem.ncbi.nlm.nih.gov/). Molecular docking was conducted using AutoDock Vina (https://vina.scripps.edu/), and the binding energies were calculated. A binding energy below -5.0 kcal/mol typically suggests a strong affinity between the molecules for effective binding.
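Selecting the drugs with the highest degree in the target-drug network amounts to counting, for each drug, the number of distinct targets it is linked to. A minimal sketch of that ranking is given below; the edge list and all target and drug names are hypothetical placeholders and do not reproduce the CTD results or the Cytoscape workflow.

```python
from collections import Counter


def drug_degrees(edges):
    """Count the distinct targets connected to each drug."""
    unique_edges = set(edges)  # ignore duplicated target-drug pairs
    return Counter(drug for _target, drug in unique_edges)


# Hypothetical (target, drug) edges, for illustration only.
edges = [
    ("TARGET_A", "drug_x"),
    ("TARGET_B", "drug_x"),
    ("TARGET_C", "drug_x"),
    ("TARGET_A", "drug_y"),
    ("TARGET_B", "drug_y"),
    ("TARGET_C", "drug_z"),
    ("TARGET_A", "drug_x"),  # duplicate, counted once
]

ranked = drug_degrees(edges).most_common()
print(ranked)
```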

Phenotypic scanning

To further investigate the potential side effects of interventions that reduce LC risk by targeting the identified drug targets, an agnostic phenome-wide MR (PheW-MR) analysis was performed. This analysis used the same cis-epQTL, with genetic instruments for disease traits selected from the IEU Open GWAS database.

Validation analysis

The same MR analysis methods were applied to validate the results, using the identified potential drug targets as exposure factors and LC (GWAS ID: ieu-a-966) as the outcome. The methodology and procedures for this analysis align with Sects. "Data pre-processing for MR study" and "MR study" of this study.

Statistical analysis

Statistical analyses were conducted using R (version 4.2.2), with differences analyzed via the Wilcoxon test (P < 0.05). The overall study design is illustrated in Fig. 1.

Fig. 1

Overview of the study design to identify the drug target for lung cancer.

Results

383 epQTLs had a causal relationship with LC

After screening, 5,522 epQTLs (340 eQTLs and 5,182 pQTLs) were retained as exposure factors for further analysis. The univariate MR analysis revealed a causal relationship between 383 epQTLs (352 eQTLs and 31 pQTLs) and LC using the IVW method (p < 0.05). Among these, 199 risk factors were identified, including GBP6 (OR = 1.072, 95% confidence interval (CI) = 1.002–1.147, p = 0.043) and GPD1L (OR = 1.073, 95% CI = 1.011–1.138, p = 0.020), and 184 protective factors were identified, such as FAM3D (OR = 0.942, 95% CI = 0.917–0.968, p < 0.001) (Supplementary Table 1). The scatter plot corroborated these findings, with FAM3D showing a negative slope and BCL2L13 showing a positive slope, indicating that FAM3D is a protective factor, while BCL2L13 is a risk factor (Fig. 2a). Forest plot analysis further supported these results, with the MR effect size for BCL2L13 greater than 0, while the effect size for FAM3D was less than 0 in the IVW method (Fig. 2b). The funnel plot (Fig. 2c) demonstrated that IVs were symmetrically distributed around the IVW line, consistent with Mendel’s second law. Additionally, heterogeneity testing revealed a P-value for exposure factors greater than 0.05, indicating no heterogeneity in the MR study (Table 1, Supplementary Table 2). The MR-Egger method also showed no evidence of horizontal pleiotropy (p > 0.05). In the MR-PRESSO analysis, 130 eQTLs and 28 pQTLs were missing (NA), while the remaining 222 eQTLs and 3 pQTLs showed no evidence of confounding factors (p > 0.05) (Table 2, Supplementary Table 3). The LOO analysis indicated no significant aberrations, supporting the reliability of the MR results (Fig. 2d). Ultimately, 383 epQTLs were confirmed to have a causal relationship with LC in our MR study.

Fig. 2

MR analyses of the causal effect of epQTL on LC. (a) Scatter plots for MR analyses of the causal effect of epQTL on LC; (b) forest map for MR analyses of the causal effect of epQTL on LC; (c) funnel plot for MR analyses of the causal effect of epQTL on LC; (d) leave-one-out for MR analyses of the causal effect of epQTL on LC.

Table 1 Partial results of heterogeneity test.
Table 2 Partial results of horizontal pleiotropy test.

Steiger analysis confirmed the causal direction between 333 epQTLs and LC

To determine the precise causal relationship between 383 epQTLs and LC, a Steiger analysis was performed, with LC as the exposure and epQTL as the outcome. A total of 333 epQTLs (305 eQTLs and 28 pQTLs) passed the Steiger test, confirming that the causality between epQTLs and LC was valid and not influenced by reverse causality (Table 3, Supplementary Table 4).

Table 3 Results of Steiger analysis (partial positive result).

The potential drug targets were identified using colocalization analysis

Colocalization analysis identified 20 epQTLs (20 eQTLs and 0 pQTLs) that passed the test, including B3GNT5 (PP.H4 = 0.671), BRAT1 (PP.H4 = 0.738), and COPS3 (PP.H4 = 0.996) (Table 4). These were considered potential drug targets. As shown in Fig. 3, BRAT1 exhibited a strong association with LC, with the rs13243437 locus linking BRAT1 to LC and supporting a causal relationship.

Table 4 Colocalization results of gene and outcome.
Fig. 3

Colocalization analysis results for the eQTL of the BRAT1 gene and LC. The figure shows that eqtl-a-ENSG00000106009 (BRAT1) is strongly associated with LC, and that the rs13243437 site in this region supports a causal relationship between BRAT1 and LC.

The targets-drugs and drugs-diseases networks might be helpful for treating LC

From the 20 drug targets identified through colocalization, 257 drugs were predicted. A target-drug network was constructed, consisting of 275 nodes (18 targets and 257 drugs) and 692 edges, including PTGFR-D005557, IREB2-C004925, and others (Fig. 4a). Additionally, 17 diseases were predicted based on the targeted drugs, including liver cirrhosis, epilepsy, and schizophrenia. A drug-disease network was created, comprising 139 nodes (17 diseases and 122 drugs) and 643 edges, with examples such as D007213-Liver Cirrhosis and D013749-Schizophrenia (Fig. 4b).

Fig. 4

The target-drug and drug-disease networks. (a) Target-drug network, comprising 275 nodes (18 targets and 257 drugs) and 692 edges, where the red circles represent drug targets and the blue diamonds represent drugs; (b) Drug-disease network, comprising 139 nodes (17 diseases and 122 drugs) and 643 edges, with red circles representing drug targets and blue diamonds representing drugs. The line thickness indicates the confidence between drug targets and diseases, with thicker lines signifying higher confidence.

Valproic acid might be helpful for treating LC

In the target-drug network, valproic acid and bisphenol A exhibited the highest degree values. However, due to potential health risks associated with bisphenol A, including its weak estrogenic effects that may interfere with the human endocrine system and its lack of medicinal properties33, valproic acid was selected for molecular docking analysis. Among the 18 targets, protein structures were available for only 8 (BRAT1, H2BC11, IREB2, MICAL1, MPHOSPH6, PTGFR, RHNO1, and SERPING1), so molecular docking was conducted for these 8 targets. The binding energies between valproic acid and the 8 targets ranged from −6.2 to −4.2 kcal/mol: BRAT1 at −4.9 kcal/mol (Fig. 5a), H2BC11 at −5.3 kcal/mol (Fig. 5b), IREB2 at −5.0 kcal/mol (Fig. 5c), MICAL1 at −4.8 kcal/mol (Fig. 5d), MPHOSPH6 at −4.2 kcal/mol (Fig. 5e), PTGFR at −5.2 kcal/mol (Fig. 5f), RHNO1 at −6.2 kcal/mol (Fig. 5g), and SERPING1 at −4.9 kcal/mol (Fig. 5h). Relative to the −5.0 kcal/mol threshold, H2BC11, PTGFR, and RHNO1 fell below it, IREB2 was at the threshold, and the remaining four targets (BRAT1, MICAL1, MPHOSPH6, and SERPING1) were above it.

Fig. 5

Visualization of molecular docking results. (a-h) The docking site of the valproic acid drug molecule on the (a) BRAT1, (b) H2BC11, (c) IREB2, (d) MICAL1, (e) MPHOSPH6, (f) PTGFR, (g) RHNO1, (h) SERPING1 proteins.

Some potential side effects might occur in LC patients

A search of the GWAS IDs for 17 diseases revealed 6 diseases with relevant studies, including liver cirrhosis, endometriosis, and schizophrenia. These 6 diseases corresponded to 20 epQTLs, requiring 120 phenotypic scans. Seven results were visualized, such as the phenotype scan showing that GBAP1 had a causal relationship with influenza (OR = 1.130, p = 0.006), identifying GBAP1 as a risk factor (Fig. 6). However, the MR study demonstrated a causal relationship between GBAP1 and LC, with GBAP1 acting as a protective factor (OR = 0.908, p = 0.08). This suggests that, in the treatment of LC, potential side effects may arise if the patient also has influenza.

Fig. 6

Results of phenotypic scanning. snp, single nucleotide polymorphisms used in MR; OR, odds ratio.

A total of 7 key drug targets passed validation

To further validate the 20 drug targets identified in the preceding analyses, a validation analysis was conducted. Seven drug targets (SERPING1, TDRD9, GBAP1, FAM241A, ZKSCAN4, ZKSCAN3, Z94721.1) were significantly associated with LC in both MR studies, one using ieu-a-987 and the other using ieu-a-966 as the outcome GWAS. These 7 drug targets were thus classified as key drug targets (Fig. 7).

Fig. 7

Results of the validation cohort analysis. snp, single nucleotide polymorphisms used in MR; OR, odds ratio.

Discussion

LC remains a significant threat to human health, with current treatment methods facing limitations such as restricted efficacy, substantial adverse reactions, and varying degrees of drug resistance, highlighting the urgent need for new therapeutic targets and strategies7. MR has emerged as a powerful tool, with numerous studies utilizing genetic variation as IVs to infer causality between potential drug targets and disease outcomes34,35. The integration of MR with genetic and proteomic data offers a novel approach to identifying promising targets for LC treatment.

This study conducted a large-scale MR analysis incorporating LC along with plasma gene and protein data obtained from the IEU Open GWAS database. Multiple MR analyses provided strong evidence of an association between 20 predicted eQTLs and LC. These 20 eQTLs, including B3GNT5, BRAT1, COPS3, FAM241A, and GBAP1, are proposed as potential drug targets for LC. Additionally, the CTD database was employed for drug and disease prediction, followed by molecular docking to validate the pharmaceutical potential of the identified targets. To assess potential side effects associated with candidate drug targets, phenotype scanning was performed, revealing 7 key drug targets (SERPING1, TDRD9, GBAP1, FAM241A, ZKSCAN4, ZKSCAN3, Z94721.1) that showed significant overlap. For instance, GBAP1, identified as a potential drug target, exhibited a causal relationship with influenza (p < 0.05) and was classified as a risk factor. However, MR analysis indicated that GBAP1 is protective for LC, suggesting that potential side effects may arise if LC patients with GBAP1 as a target also have influenza.

SERPING1 (C1-inhibitor, C1INH), a member of the serine protease inhibitor family G1, encodes a highly glycosylated plasma protein involved in complement activation, contact, coagulation, and fibrinolysis systems36. Prior studies have demonstrated the strong anti-inflammatory functions of SERPING1 both in vivo and in vitro. While SERPING1 is vital for various physiological processes, its deficiency is well-documented in hereditary angioedema (HAE)37. Numerous studies have also linked SERPING1 to cancer. It has been associated with lymph node and bone metastasis in breast cancer38,39,40, and a decrease in SERPING1 mRNA levels correlates with lower survival rates and increased malignancy in prostate cancer40. Furthermore, SERPING1 has been implicated in the diagnosis and prognosis of liver cancer, ovarian cancer, colon cancer, glioma, and other cancers41,42,43,44,45. Two studies specifically on LC indicated that SERPING1 is underexpressed in LC and serves as an independent prognostic predictor in NSCLC46,47, which aligns with our research findings.

GBAP1, a glucosylceramidase (GBA) pseudogene 1, exhibits 96% homology with the GBA sequence and is located 16 kb downstream of the functional gene48,49. Despite limited available data on GBAP1, reported studies have explored its role in Parkinson’s disease, liver cancer, and gastric cancer50,51. Additionally, GBAP1 acts as a protective factor in gastric cancer, while exhibiting pro-oncogenic functions in hepatocellular carcinoma (HCC), positioning it as a potential prognostic biomarker and therapeutic target52,53. In influenza-related studies, only GBA—not GBAP1—has been investigated in relation to the influenza virus54. Interestingly, in LC patients, influenza virus infection has been linked to increased disease progression55. A population-based study revealed that regional influenza-like illness (ILI) activity is associated with higher mortality rates in NSCLC patients53. Furthermore, published research indicates that after recovery from influenza A virus (IAV) infection, the lungs can develop long-lasting antitumor immunity56. However, no research has yet explored the role of GBAP1 in LC and influenza. This study fills this gap by providing reliable evidence for the potential involvement of GBAP1 in this context.

Tudor domain-containing protein 9 (TDRD9), an RNA helicase with a TUDOR domain, is primarily expressed in the germline and participates in the biosynthesis of PIWI-interacting RNAs57,58. Most existing research on TDRD9 has focused on male infertility, with fewer studies examining its role in other fields. Notably, a study on the involvement of TDRD9 in LC found that TDRD9 is significantly upregulated in NSCLC and its derived cell lines due to the low methylation of CpG islands. Additionally, the expression of TDRD9 has been associated with poor prognosis in lung adenocarcinoma59. Our findings support a causal relationship between TDRD9 and LC; because this contrasts with previous research, it warrants further investigation through mechanistic studies.

ZKSCAN3 (zinc-finger with KRAB and SCAN domains 3), a member of the zinc-finger transcription factor family, is widely expressed in human tissues and plays a role in regulating various physiological processes, including cell proliferation, apoptosis, autophagy, and tumor transformation60. Numerous studies suggest that ZKSCAN3 inhibits the expression of autophagy lysosomal mediators in certain cancer cells, thus impeding cancer progression and positioning it as a potential therapeutic target61,62,63. A GWAS and subsequent large-scale follow-up identified ZKSCAN3 as a novel locus influencing lung function, which may also be linked to other complex traits and diseases64. Ouyang et al. found that ZKSCAN3 contributes to the response to severe lung infections, including susceptibility to secondary bacterial infections following immunosuppression65. While previous studies have reported associations between ZKSCAN3 and various cancers and diseases, its role in LC has not been explored. Our study provides the first evidence that ZKSCAN3 could serve as a therapeutic target for LC, further advancing research on this gene.

ZKSCAN4 (Zinc-finger with KRAB and SCAN domains 4), also known as ZNF307, belongs to the same family as ZKSCAN3. It is expressed in various tissues, including the kidneys, mouth, skin, lungs, brain, spleen, and liver66. In HEK-293 cells, ZKSCAN4 inhibits the transcriptional activity of p53 and p21 and interacts with glucocorticoid receptors66,67. In liver cancer, ZKSCAN4 has been shown to act as a tumor suppressor68. In recent years, studies on ZKSCAN4 have been limited to its role in post-traumatic stress disorder, gastrointestinal disorders, rheumatoid arthritis, and osteoarthritis with metabolic syndrome69,70,71. However, its role in LC remains largely unexplored, and our research addresses this gap.

The gene FAM241A has been shown to play a tumor-suppressive role in human LC through the ANXA2P1/miR-20b-5p/FAM241A axis, offering significant diagnostic and prognostic value72. As for Z94721.1, little is known about this gene, with only two studies suggesting its potential as a tumor prognostic marker in esophageal and ovarian cancers73,74.

The strength of this study lies in its use of MR methods, providing a robust framework to assess causal relationships and mitigating the limitations of observational studies. Moreover, the 20 potential drug targets for LC treatment were corroborated through colocalization analysis, enhancing the reliability of the findings and minimizing the risk of false positives. The CTD database was utilized for drug and disease prediction, followed by molecular docking and phenotype scanning of the predicted diseases, which yielded 7 positive results, thus adding depth and credibility to our MR study. This research highlights genes that could serve as drug targets to enhance the efficacy, safety, and success rate of drug development for LC, addressing the limitations of large-scale randomized clinical trials. Seven drug targets related to LC were identified in our population-based study, some of which have already been partially supported by existing research45,46,59,72. These findings offer valuable insights for future drug development. Additionally, our analysis of drug and disease prediction helps identify potential safety concerns, which is crucial if these targets are to be used in clinical settings.

This study has several limitations. First, it predominantly focused on European populations, necessitating further validation in diverse ethnic groups to assess the broader applicability of the findings. Second, the lack of stratification in the GWAS data based on specific LC subtypes limits the ability to conduct detailed subgroup analyses. Third, while blood-derived eQTL/pQTL data reflect systemic physiological and pathological changes, they are less capable of capturing lung tissue-specific gene and protein expression profiles. Thus, exclusive reliance on blood-based epQTLs may not fully capture the pathogenesis of pulmonary diseases or their therapeutic potential. Moreover, the phenotype-associated genomic loci identified in this study are likely involved in the early stages of lung carcinogenesis, yet their role in disease progression remains unclear. Due to the observational nature of the study, it is not possible to directly validate the interventional effects of targeting these loci on established LC. Future research should include multi-ethnic cohorts to assess the generalizability of the findings across diverse populations and incorporate molecular subtype-based stratification analyses. Additionally, integrating lung tissue and single-cell multi-omics data is essential to address the limitations of blood-derived data in reflecting lung-specific biological processes. Finally, preclinical studies, including in vitro cell models and in vivo animal experiments, are critical. Using genetically engineered models and animal survival analyses, these studies should aim to definitively establish the functional roles of these QTLs in LC development, assessing their feasibility as therapeutic or preventive targets.

Conclusions

In conclusion, this study employed comprehensive MR methods to establish a causal relationship between 333 epQTLs and LC, identifying 20 potential drug targets for LC treatment through colocalization analysis. Notably, 7 of these drug targets were successfully validated and can be considered key targets for LC therapy. These robust findings could pave the way for new strategies in the clinical diagnosis and treatment of LC. However, as the study primarily involves bioinformatics analyses, further in vivo and in vitro research is required to validate these drug targets and prioritize their development for LC therapy.