Introduction

Head and neck cancer, the seventh most common malignancy worldwide, accounts for more than 870,000 cases and 440,000 deaths annually1,2. Head and neck squamous cell carcinomas (HNSCC) account for ~90% of all cases, with tobacco use and excessive alcohol consumption considered among the most significant modifiable risk factors3,4,5,6. Additionally, human papillomavirus (HPV) plays a significant role in the pathogenesis of HNSCC, particularly for oropharyngeal cancer (OPC). HPV-associated HNSCC is now recognized as a distinct entity from HPV-negative HNSCC with different epidemiology, risk factors, treatment regimens, and prognosis7,8,9,10,11. The latest edition of the American Joint Committee on Cancer (AJCC) staging of OPC further highlights this distinction by incorporating HPV status in its staging12.

Although the association between smoking and HPV-negative HNSCC is well established, uncertainties persist regarding the causal effect of tobacco smoking on HPV-positive HNSCC. Some studies have indicated a positive correlation, while others have found no link between tobacco smoking and HPV-positive HNSCC13,14. These uncertainties can be attributed, in part, to the limitations of observational studies, alongside small cohorts for this relatively rare cancer, heterogeneity in study design between cohorts, and lack of distinction between HPV-positive and HPV-negative HNSCC within the studies. Similarly, uncertainty persists regarding differences in the association of alcohol consumption with HPV-positive and HPV-negative HNSCC, although interactive effects of alcohol consumption and HPV status in increasing the risk of HPV-positive OPC have been previously reported15. Our study examines the association of tobacco smoking and alcohol consumption with the risk of each distinct cancer separately in one of the largest study populations to date.

Mendelian randomization (MR) is a genetic epidemiological approach that utilizes single nucleotide polymorphisms (SNPs) randomized during meiosis as instrumental variables to infer the effect of an exposure on an outcome. This approach attempts to mitigate the limitations of observational studies, such as confounding, reverse causation, and measurement error16. MR is based on three key assumptions: 1) the genetic variants used as instruments for the exposure must be valid and robustly associated with the exposure, 2) there should be no measured or unmeasured confounding of the association between the genetic instrument and the outcome, and 3) the variants should have no independent effect on the outcome other than through the exposure of interest17. MR is useful for exposures such as smoking and alcohol consumption, where obtaining unconfounded estimates by randomizing individuals to such exposures would be unfeasible and unethical. Furthermore, multivariable MR allows for the simultaneous estimation of the independent and joint effects of two or more exposures on an outcome18, which is particularly relevant given that the combined exposure to tobacco and alcohol has been demonstrated to exert a significant synergistic effect on the incidence of HNSCC19.

Large-scale genome-wide association studies (GWAS) have reported SNPs reliably associated with smoking and drinking behaviors20,21. Using these SNPs as genetic instruments for the exposures and outcome data obtained from a large HNSCC GWAS22, we performed MR to estimate the effect of tobacco smoking and alcohol consumption on the risk of HPV-positive and HPV-negative HNSCC subtypes. A previous study from our group used MR to assess the independent causal effects of smoking and alcohol on HNSCC using summary genetic data23. The current study utilizes individual-level genetic data plus the available HPV status information to conduct an MR study evaluating HNSCC risk stratified by HPV status. We used univariable and multivariable MR methods to demonstrate independent causal effects of smoking as well as drinking behaviors on the risk of both HPV-negative and HPV-positive HNSCC. We also investigated the interactive effects of smoking and drinking behaviors on the two cancer types via factorial MR. Our study highlights similarities and differences between HPV-positive and HPV-negative HNSCC risk factors.

Results

Baseline characteristics of the study population, including smoking and drinking behavior exposures, stratified by HPV status, are presented in Supplementary Table 1. The numbers of independent SNPs included as instrumental variables for each smoking and alcohol use behavior are provided in Table 1. The results for univariable MR are summarized in Table 1 and Fig. 1 for the primary smoking and alcohol consumption exposures evaluated (SI, CSI, DPW). The genetic instrument for SI, comprising 57 SNPs, was associated with the risk of both HPV-positive HNSCC [IVW OR (95% CI) = 2.37 (1.33, 4.24), P = 0.0003] and HPV-negative HNSCC [IVW OR (95% CI) = 1.81 (1.19, 2.76), P = 0.0005].

Table 1 Univariable Mendelian randomization of smoking and alcohol consumption exposures on HNSCC stratified by HPV status
Fig. 1: Forest plots of univariable Mendelian randomization effects of smoking and alcohol use exposures on HNSCC risk stratified by human papillomavirus (HPV) status.

Univariable estimates were obtained using summary-level data from the GWAS of (a) smoking initiation (n = 1,232,091), (b) comprehensive smoking index (n = 462,690), and (c) drinks per week (n = 941,280) on HPV-positive HNSCC risk (n = 1105 cases and 3469 controls) and HPV-negative HNSCC risk (n = 2326 cases and 3469 controls). Smoking initiation estimates are reported per unit increase in log odds, while comprehensive smoking index and drinks per week estimates are reported per SD increase. Error bars represent 95% confidence intervals. All statistical tests were two-sided. CSI comprehensive smoking index, MR Mendelian randomization.

The genetic instrument for CSI, comprising 90 SNPs with independent and robust associations with the lifetime smoking exposure indicator, was associated with the risk of both HPV-negative HNSCC [IVW OR (95% CI) = 2.59 (1.37, 4.92), P = 0.0004] and HPV-positive HNSCC [IVW OR (95% CI) = 2.6 (1.2, 5.65), P = 0.02]. The odds ratios correspond to a standard deviation change in CSI, which is equivalent to an individual smoking 20 cigarettes a day for 15 years and quitting 17 years ago, or an individual smoking 60 cigarettes a day for 13 years and quitting 22 years ago.

Using 25 independent SNPs associated with DPW, increased alcohol consumption was associated with the risk of both HPV-negative HNSCC [IVW OR (95% CI) = 6.79 (2.68, 17.16), P = 5.21E-05] and HPV-positive HNSCC [OR (95% CI) = 3.58 (1.27, 10.14), P = 0.02].
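As a worked illustration of how an odds ratio and its confidence interval relate to the underlying log-scale estimate, the sketch below back-calculates the log-OR, its standard error, and a two-sided P value from the DPW estimate for HPV-negative HNSCC. It assumes a symmetric Wald-type interval on the log scale and a normal test statistic; the function name is hypothetical and this is not the authors' analysis code. Under these assumptions the P value comes out on the order of 5 × 10−5, in line with the value reported above.

```python
import math

def log_scale_summary(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover log-OR, SE, z and two-sided P from an OR and its 95% CI.

    Assumes the interval is exp(beta +/- z_crit * se), i.e. a Wald interval.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    # Two-sided P value under a standard normal: 2 * (1 - Phi(|z|))
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p

# DPW on HPV-negative HNSCC, values taken from the text above
beta, se, z, p = log_scale_summary(6.79, 2.68, 17.16)
print(f"log-OR = {beta:.3f}, SE = {se:.3f}, z = {z:.2f}, P = {p:.2e}")
```

The same conversion applies to any of the reported estimates, provided the interval was constructed on the log scale.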

Multivariable MR results are summarized in Table 2 and Fig. 2. After controlling for DPW, lifetime smoking exposure as measured by CSI was associated with an increased risk of both HPV-negative HNSCC [OR (95% CI) = 3.03 (1.75, 5.24), P = 7.00E-05] and HPV-positive HNSCC [OR (95% CI) = 2.73 (1.39, 5.36), P = 0.003]. After controlling for CSI, DPW was associated with the risk of both HPV-negative HNSCC [OR (95% CI) = 7.72 (3.63, 16.4), P = 1.00E-07] and HPV-positive HNSCC [OR (95% CI) = 2.66 (1.06, 6.68), P = 0.038]. The estimates of associations from the ridge regression MVMR analyses using the optimal lambda penalty parameter and MVMR Egger regression were consistent with the estimates from the IVW MVMR method (Table 2). Because the conditional F-statistics indicated weak instruments (<10), we conducted Q-statistic minimization, which yielded a Q-statistic of 188.20 (P = 0.36), implying a lack of heterogeneity after correction for weak instrument bias.

Table 2 Multivariable Mendelian randomization for smoking and alcohol consumption with risk of HNSCC by HPV status
Fig. 2: Forest plot of multivariable Mendelian randomization (MR) effects of lifetime smoking exposure and drinks per week on HNSCC risk stratified by human papillomavirus (HPV) status, using different MR approaches.

Effect estimates were obtained using summary-level data for drinks per week (n = 226,223) and the comprehensive smoking index (n = 226,223) on HPV-positive HNSCC risk (n = 1105 cases and 3469 controls) and HPV-negative HNSCC risk (n = 2326 cases and 3469 controls). Comprehensive smoking index and drinks per week estimates are reported per SD change. Error bars represent 95% confidence intervals. All statistical tests were two-sided. "Drinks" refers to alcoholic drink equivalents. CSI comprehensive smoking index, IVW inverse variance-weighted, HPV− HPV-negative, HPV+ HPV-positive, CI confidence interval.

Results of the factorial MR analysis are provided in the supplementary materials (Supplementary Table S2). We did not observe an interaction effect between any smoking phenotype and DPW. However, factorial MR may lack the power to detect interactions, so these results should be interpreted with caution. Additional smoking exposure phenotypes (AI, CPD, SC) were evaluated in exploratory analyses (Supplementary Table S3). Briefly, we did not identify different effects of these exposures on the two cancers, except for CPD, which was positively associated with the risk of HPV-negative HNSCC [for a single cigarette per day increase in smoking intensity, IVW OR (95% CI) = 1.59 (1.17, 2.17), P = 0.0003].

Lastly, we evaluated the association of risk tolerance and high-risk sexual behaviors with each cancer subtype, as genetic loci for these exposures have been shown to overlap with those for smoking and alcohol exposures24. Neither risk tolerance nor number of sexual partners was associated with the risk of HPV-positive or HPV-negative HNSCC (Supplementary Table S4).

Discussion

Utilizing univariable and multivariable MR, our study evaluated the causal effects of multiple smoking and alcohol use behaviors on the risk of HPV-positive and HPV-negative HNSCC. We observed that smoking and alcohol consumption each independently increased the risk of both cancer types. These findings align with previous observational studies on the role of smoking and alcohol use in HPV-negative HNSCC, while providing evidence for the effects of these behaviors on HPV-positive HNSCC risk.

Large pooled observational studies have consistently supported tobacco smoking as an independent risk factor for HNSCC25,26,27. In a meta-analysis of 15 case-control studies involving 10,244 HNSCC patients, Hashibe et al. reported a pooled OR of 2.13 for HNSCC among cigarette smokers compared to never-smokers26. More recently, Gormley et al. investigated the effects of smoking and alcohol consumption on the risk of oral cancer and OPC using a multivariable MR approach23. After controlling for alcohol consumption, they reported supporting evidence for a direct causal effect of lifetime smoking behavior on head and neck cancer risk (OR 2.6, 95% CI 1.7–3.9). When stratified by cancer subsite, the causal effect of cigarette smoking on OPC risk was even stronger, with risk estimates of 3.7 (95% CI 2.3–6.0) compared to 2.5 (95% CI 1.5–4.1) for oral cavity cancer. Utilizing MR on a subset of that study's cohort, we report these associations separately for HPV-positive and HPV-negative HNSCCs.

To address the correlation between smoking and alcohol consumption, as well as to simultaneously explore their independent effects, we performed multivariable MR analyses on HPV-positive and HPV-negative HNSCC groups separately. Multivariable MR extends the basic MR framework to accommodate the complexity of multiple correlated exposures, enabling the evaluation of the independent causal effects of smoking and alcohol use on HNSCC risk. In our separate assessments of HPV-positive and HPV-negative HNSCCs, after correcting for alcohol consumption, we observed an independent causal effect of lifetime smoking on the risk of HPV-associated HNSCC, providing evidence of a significant contribution of smoking to the risk of HPV-associated HNSCC. While the association between smoking and HPV-negative HNSCC is well established, there has been uncertainty regarding the influence of smoking on the risk of developing HPV-associated HNSCC. Previous studies have reported conflicting evidence, with some demonstrating a positive association and others reporting no interaction between tobacco smoking and HPV status14. In North America, the incidence of HPV-associated HNSCC has risen over the past few decades despite declining smoking rates, in direct opposition to the decreasing incidence of all other HNSCCs28. A pooled study by Anantharaman et al.13 reported associations of smoking with an increased risk of HNSCC in models stratified by HPV16 seropositivity. Smoking is thought to act synergistically with HPV infection to increase the risk of developing cancer29. This is possibly due to smoking suppressing mediators of immune function, thus facilitating the persistence of HPV infection, which is a crucial step in the development of HPV-related cancers30. Notably, in our study, while smoking initiation increased the risk of HPV-positive HNSCC (Supplementary Table S3), we found no association between CPD and HPV-positive HNSCC. In contrast, CSI, a comprehensive index of smoking initiation, smoking intensity, and duration of exposure, increased the risk of both cancers.

Our MR analyses also revealed independent associations between alcohol consumption and increased risks of both HPV-negative and HPV-positive HNSCC. The strong co-existence of smoking and alcohol use has made it difficult to determine the independent effects of each. In one study, the joint effect of tobacco and alcohol was found to be more than multiplicative, but no marginal effect of alcohol use among never tobacco users was observed26. In contrast, Gormley et al. reported an independent causal effect of alcohol consumption on oral cancer and OPC when controlling for smoking using an MR approach, although HPV status was not accounted for23. In a meta-analysis evaluating traditional OPC risk factors, the summary odds ratio for the risk of OPC was 3.76 for heavy alcohol drinking and HPV negativity, whereas it was 39.32 for HPV positivity and no alcohol drinking15. Interestingly, the odds ratio for OPC among those who were heavy alcohol drinkers and HPV-positive was 27.10, suggesting the presence of an interactive effect between alcohol use and HPV status in increasing the risk of cancer development. The factorial MR analysis did not detect any interactive effects between smoking and alcohol use; however, the absence of interaction should be interpreted with caution. Multiplicative joint effects of smoking and alcohol use on head and neck cancer have been previously described in large observational cohort studies26. Factorial MR has been shown to be limited in statistical power compared to conventional epidemiological approaches, due to variance and bias represented in genetic instruments31. Past investigations employing this approach in the context of cardiovascular disease and diabetes have yielded inconclusive results32,33.

Our study has several strengths. Firstly, large, pooled analyses were performed with individual-level data that incorporated individual-level HPV status. MR is a powerful approach to evaluating causal relationships between exposures and outcomes by utilizing genetic variants as instrumental variables, thereby overcoming limitations of conventional epidemiological approaches, such as confounding and reverse causality34. We also used summary statistics from large GWAS of smoking, alcohol use, and head and neck cancer, utilizing numerous SNPs to ensure robust associations of our genetic instruments. As for limitations, several of the genetic loci used in our study have been previously associated with other exposures, such as sexual behaviors, which are purported risk factors for HPV infection24,35. MR approaches to delineate independent causal effects of sexual activity, such as the number of sexual partners, on HNSCC risk have so far been limited due to correlated pleiotropy and non-specification of these sexual behavior instruments36. Furthermore, the lack of sex-specific instrument exposure information prevented the assessment of smoking and alcohol use stratified by sex, which is particularly relevant given the differences in exposures seen across males and females. Factorial MR may be inefficient at detecting statistical interactions due to the variance explained by genetic instruments and the potential for weak instrument bias, compared to the robustness of a clinical trial or observational studies31. Finally, it is important to note that while MR approaches can suggest potential causal relationships, additional evidence is required to confirm causal mechanisms. HPV-positive OPCs are considered to have a distinct etiopathogenesis compared to their HPV-negative counterparts, often with less pronounced associations with smoking and alcohol use. In our study, the apparent lack of disparity in the impact of risk factors could suggest a more nuanced and complex interaction between HPV status and these carcinogens than previously understood. Mechanistic studies that explore the biological interactions between HPV oncogenes and carcinogen-induced DNA damage in epithelial cells could provide further clarity.

In conclusion, we demonstrate that smoking and alcohol consumption have independent causal effects on the risk of both HPV-positive HNSCC and HPV-negative HNSCC. Using a multivariable MR approach, we show that lifetime smoking is similarly associated with both cancer types. Furthermore, we observed statistically significant associations of increased alcohol consumption with the risk of both HPV-positive and HPV-negative HNSCC. These results shed new light on potentially modifiable risk factors for HPV-positive HNSCC.

Methods

The study protocol was approved by the Voyager Consortium, with research consent obtained by the institutional review boards or ethics committees of each participating institution within the consortium. All participants, including cases and controls, provided written informed consent. The complete list of collaborating studies and their respective institutions can be found at https://voyager.iarc.who.int/co-investigators/.

The study design was an MR analysis of smoking and alcohol exposures on the risk of HNSCC stratified by HPV status.

The study population consisted of individuals included in the VOYAGER (Human Papillomavirus, Oral and Oropharyngeal Cancer Genomic Research) consortium37. Within VOYAGER, OncoArray data were available from a total of 3431 cases and 3469 controls from Europe and North America22. In brief, all VOYAGER studies are hospital- or population-based case-control studies, except for the UK’s Head and Neck 5000 (HN 5000) case series. Individual studies obtained informed consent from all participants and ethical approval from their respective Institutional Review Boards. All studies utilized standardized instruments to collect information on sociodemographic and clinical characteristics, including information on smoking and alcohol-related behaviors.

HNSCC cases comprised the following International Classification of Diseases, 10th Revision (ICD-10) codes: oral cavity (C02.0-C02.9, C03.0-C03.9, C04.0-C04.9, C05.0-C06.), oropharynx (C01.9, C02.4, C09.0-C10.9), hypopharynx (C13.0-C13.9) and overlapping (C14 and combination of other sites). Further stratification based on HPV status was performed to evaluate differences between HPV-positive and HPV-negative cancers. HPV-positive cancers were defined as OPC patients with positive HPV16 antibody status as the primary classifier, given that up to 90% of HPV-positive OPCs are attributed to HPV type 16 (HPV16)38. OPC cases were classified as HPV-positive or HPV-negative based on a previously validated and described HPV16 seropattern algorithm39, acknowledging that this method may lead to an underrepresentation of seropositivity for other high-risk HPV subtypes such as HPV 18, 31, and 33. For cases where the HPV16 antibody status was indeterminable, we utilized the expression of p16 as a surrogate marker. p16 is a cellular protein whose overexpression is an indirect measure of HPV-associated oncogenic activity, rather than a direct viral marker. OPC patients with unknown HPV status were excluded from analyses (n = 102). HPV-negative OPCs were pooled together with oral cavity cancer (OCC) cases as HPV-negative cancers. With an estimated prevalence of 5% or less in OCC, HPV is considered to have a limited role in the development of carcinomas of the oral cavity40,41,42. Consequently, all OCCs in our study population were assumed to be HPV-negative. Since OCC shares similar risk factors of excessive smoking and alcohol consumption with HPV-negative OPC, we hypothesized that these tumor types have a similar etiology. After excluding people with primary tumor sites other than the oral cavity or oropharynx (n = 299) and unknown HPV status (among the OPC subgroup; n = 102), the final cohort consisted of 1105 patients classified as HPV-positive HNSCC and 2326 patients as HPV-negative HNSCC (Supplementary Table S1).

Genotyping, genetic data acquisition, quality control, and imputation

Individual-level genetic data were obtained from the VOYAGER consortium, with genotyping performed using the Illumina OncoArray43 as described previously22. Genotyping data were accessed through the database of genotypes and phenotypes (dbGaP) project number phs001202.v1.p1 (ref. 44). Standard quality control for the genotyping array included strand correction following standard pipelines45, sex checking, missing rates, duplicates or relatedness, outlying heterozygosity rates, and population stratification. A total of 486,987 SNPs were included after applying standard quality control procedures. Analyses were performed using PLINK v1.90b4.4 (ref. 46) and EIGENSTRAT v6.1.4 (refs. 47,48). Imputation was performed via the TOPMed Imputation Server49 using Eagle v2.4 (ref. 50) and Minimac4 (ref. 51) with the TOPMed r2 reference panel. Post-imputation quality control removed variants with r2 values less than 0.3 and minor allele frequency less than 0.01.
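The post-imputation filter described above can be sketched as follows. The variant records and field names are hypothetical and do not reflect the imputation server's actual output format; only the two thresholds (r2 of 0.3 and minor allele frequency of 0.01) come from the text.

```python
# Hypothetical per-variant records; field names are illustrative only.
variants = [
    {"id": "var1", "r2": 0.95, "maf": 0.20},
    {"id": "var2", "r2": 0.25, "maf": 0.30},   # fails imputation quality
    {"id": "var3", "r2": 0.80, "maf": 0.005},  # fails allele frequency
    {"id": "var4", "r2": 0.30, "maf": 0.01},   # exactly on both thresholds
]

R2_MIN = 0.3    # variants with r2 below this are removed
MAF_MIN = 0.01  # variants with minor allele frequency below this are removed

def passes_qc(variant, r2_min=R2_MIN, maf_min=MAF_MIN):
    """Return True unless the variant falls below either threshold."""
    return variant["r2"] >= r2_min and variant["maf"] >= maf_min

kept = [v["id"] for v in variants if passes_qc(v)]
print(kept)
```

Because the text removes variants with values "less than" each threshold, variants sitting exactly on a threshold are retained in this sketch.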

For the exposures, we used the summary statistics and definitions from the GSCAN meta-Genome-Wide Association Study (meta-GWAS) of smoking and alcohol use behaviors conducted using data from 1.2 million individuals20. Specifically, we defined: Smoking Initiation (SI) as a dichotomous variable of never smoker versus ever smoker, the latter defined as having smoked more than 100 cigarettes during lifetime; Alcohol Use Intensity/Drinks per Week (DPW) as a continuous variable of the average number of standardized alcoholic drinks per week; and Comprehensive Smoking Index (CSI) as an independent and comprehensive indicator of smoking52,53. We also investigated additional smoking behaviors from the GSCAN GWAS, including: Age of Smoking Initiation (ASI) as a continuous variable of the age at which the participant started smoking cigarettes regularly, with regularly defined as >5 cigarettes/week; Smoking Intensity/Cigarettes per Day (CPD) as a continuous variable of the average number of cigarettes smoked per day; and Smoking Cessation (SC) as a dichotomous variable of former smoker versus current smoker. These GSCAN phenotypes/behaviors have been shown by the Tobacco and Genetics (TAG) consortium to be heritable and to have sufficient variation in population samples35. Further, these phenotypes have been shown to be reliable and valid measures of tobacco and alcohol use in terms of morbidity and mortality24.

For multivariable MR, we used DPW and CSI34. CSI was used in lieu of the four GSCAN smoking behaviors, which are correlated and interdependent and would thus be unsuitable for a multivariable model. A GWAS for the CSI variable conducted in the UK Biobank36 was used to obtain the instruments for this variable. We further investigated potential interaction effects between smoking and drinking on the risk of both HPV-positive and HPV-negative HNSCC using factorial MR.

Outcome

We conducted GWAS of risk of HNSCC stratified by HPV status, with the comparison groups consisting of HPV-positive HNSCC versus controls, and HPV-negative HNSCC versus controls54.

GWAS of HNSCC risk stratified by HPV status were conducted using additive models. The log-odds of the outcome were regressed on the genetic variable with age, sex, and the first 7 genetic principal components (PC) as covariates. Population stratification was evaluated to determine ancestry using principal component analysis (PCA) with PC plots provided in Supplementary Fig. 1. The ethnicity of the population is described in Supplementary Table 1 based on self-reported information. The genetic association tests were performed using PLINK v1.90b4.4 (ref. 46).

Mendelian randomization

MR is a statistical method that combines quantified estimates from associations between genetic instruments and the risk factor (smoking and alcohol use) with parallel associations between these genetic variants and the outcome, to determine an estimate of the risk factor’s impact on the outcome (HNSCC). All steps of the univariable MR were performed using the TwoSampleMR package v0.5.6 in the R statistical language55. The genetic instruments associated with smoking and alcohol use were selected based on the P values of association. Using the 1000 Genomes European superpopulation as the reference panel, all SNPs with P values < 5 × 10−8 were selected as potential index SNPs and then pruned using a clumping window of 10,000 base pairs (bp), with an r2 cutoff of 0.001 to ensure independence. Secondary SNPs in LD were removed at a threshold level set at P values < 1. The proxies for SNPs missing from the outcome GWAS were generated using the LD proxy tool (European superpopulation reference) and an r2 cutoff of 0.8; SNPs with the highest r2 were selected. The “harmonise_data” function from the TwoSampleMR package was used to harmonize the SNPs between exposures and outcomes. The default action to infer the positive strand alleles using allele frequencies for palindromes was used. MR analyses with the Inverse-Variance-Weighted (IVW) method, MR Egger56, MR weighted median57, MR weighted mode, and MR-PRESSO58 were performed for each of the behavior exposures with each of the three outcomes (HPV-positive HNSCC patients versus controls, HPV-negative versus controls, and HPV-positive versus HPV-negative). The application of multiple MR methods allows the assessment of causal effects across various statistical assumptions, thereby adjusting for potential pleiotropy and invalid instruments. For the first four methods, the “mr()” function from the TwoSampleMR package with default parameters (z test distribution, alpha of 0.05, q threshold of 0.05, phi parameter of 1, Huber loss function, Cov parameter of 0, penk parameter of 20, over-dispersion, no shrinkage, and 1000 bootstraps) was used. For the MR-PRESSO method, R package MR-PRESSO v1.0 (ref. 58) was used. The No Measurement Error (NOME) assumption was assessed using the I2 statistic. These results are provided in the Supplementary Materials (Supplementary Tables S5 and S6).
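To make the IVW step concrete, the following is a minimal sketch of a fixed-effect IVW estimate computed from harmonized summary statistics, written in plain Python rather than with the TwoSampleMR package. The function names and the toy effect sizes are hypothetical, and the package's own implementation may differ in detail (for example, in how standard errors are scaled under heterogeneity).

```python
import math

def wald_ratios(bx, by, se_y):
    """Per-SNP ratio estimates (by / bx) with first-order standard errors."""
    return [(y / x, s / abs(x)) for x, y, s in zip(bx, by, se_y)]

def ivw(bx, by, se_y):
    """Fixed-effect IVW: weighted regression of by on bx through the origin."""
    weights = [1.0 / s ** 2 for s in se_y]
    num = sum(w * x * y for w, x, y in zip(weights, bx, by))
    den = sum(w * x * x for w, x in zip(weights, bx))
    return num / den, math.sqrt(1.0 / den)

# Toy, made-up summary statistics: SNP-exposure effects (bx),
# SNP-outcome log-odds effects (by) and their standard errors (se_y).
bx = [0.10, 0.20, 0.15, 0.05]
by = [0.05, 0.10, 0.075, 0.025]   # constructed as 0.5 * bx
se_y = [0.02, 0.03, 0.025, 0.02]

beta, se = ivw(bx, by, se_y)
odds_ratio = math.exp(beta)       # OR per unit increase in the exposure
print(wald_ratios(bx, by, se_y))
print(beta, se, odds_ratio)
```

The IVW estimate is equivalent to an inverse-variance-weighted average of the per-SNP ratio estimates, which is why it is sensitive to any single invalid instrument and is usually reported alongside the more robust estimators listed above.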

Pleiotropy arises when a genetic variant is linked to the outcome of interest through multiple pathways, which may not necessarily involve the exposure under investigation. The presence of pleiotropy can alter both the magnitude and direction of the association between the exposure and outcome. To evaluate whether the assumptions of MR hold true, multiple MR methodologies are employed to assess the consistency of findings across different approaches. Pleiotropy was assessed using MR-PRESSO, and directional pleiotropy was assessed using the intercept for MR Egger regression (Supplementary Table 7).
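The role of the MR Egger intercept can be illustrated with a short sketch: the SNP-outcome effects are regressed on the SNP-exposure effects with an intercept, and a non-zero intercept suggests directional pleiotropy. The code below is an assumption-laden illustration with a hypothetical function name and made-up data; it omits standard errors and is not the implementation used in the analysis.

```python
def mr_egger(bx, by, se_y):
    """Weighted regression of by on bx WITH an intercept (weights 1/se_y^2)."""
    # Orient each SNP so that its exposure effect is positive,
    # flipping the sign of the outcome effect to match.
    pts = [(abs(x), y if x >= 0 else -y, 1.0 / s ** 2)
           for x, y, s in zip(bx, by, se_y)]
    sw = sum(w for _, _, w in pts)
    swx = sum(w * x for x, _, w in pts)
    swy = sum(w * y for _, y, w in pts)
    swxx = sum(w * x * x for x, _, w in pts)
    swxy = sum(w * x * y for x, y, w in pts)
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return intercept, slope

# Toy data with a built-in pleiotropic offset of 0.02 and causal slope 0.5.
# The third SNP is reported on the opposite allele to exercise the re-orientation.
bx = [0.10, 0.20, -0.15, 0.05]
by = [0.07, 0.12, -0.095, 0.045]
se_y = [0.02, 0.03, 0.025, 0.02]

intercept, slope = mr_egger(bx, by, se_y)
print(intercept, slope)
```

In this constructed example the regression recovers the offset as the intercept and the causal effect as the slope; with real data, the intercept would be tested against zero using its standard error.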

For multivariable MR (MVMR), Inverse-Variance-Weighted (IVW) regression, MVMR Egger regression59,60, and the Q-statistic minimization approach were performed using the MVMR R package v0.4. MVMR ridge regression, a method that uses ridge regression to shrink the regression estimates, was also used61. As previously mentioned, we used the GSCAN drinking behavior, DPW, with a comprehensive measure of lifetime smoking exposure, CSI. The summary statistics for this index have previously been derived from the UK Biobank53. Specifically, the covariances for pairwise associations between SNP-exposure effects were assumed to be zero as the summary statistics for each exposure were derived from independent, non-overlapping samples. Weak instruments were tested using the conditional F statistic with a threshold of 10. Horizontal pleiotropy was tested using a modified form of the Q-statistic with respect to differences in MVMR estimates across the set of instruments62. Causal effect estimation was performed using the IVW method wherever the assumptions of MVMR were met (strong instruments and no significant pleiotropy). Whenever assumptions were violated, we obtained more robust estimates through Q-statistic minimization, which is particularly effective when instruments are weak or exhibit pleiotropy. For the MVMR ridge regression, a sequence of penalties (lambdas) was used and the results for the best lambdas were compared with the MVMR results. MVMR Egger regression was performed using the MendelianRandomization R package v0.70.
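The multivariable IVW estimator can be sketched as a weighted regression of the SNP-outcome effects on the SNP-exposure effects for both exposures at once, without an intercept. The code below is a simplified illustration with hypothetical names and toy data; the optional `ridge` argument adds a generic ridge penalty to the normal equations and is an assumption for illustration, not the exact formulation of the cited ridge MVMR method.

```python
def mvmr_ivw(bx1, bx2, by, se_y, ridge=0.0):
    """Two-exposure IVW: solve the weighted 2x2 normal equations (no intercept)."""
    w = [1.0 / s ** 2 for s in se_y]
    a11 = sum(wi * x1 * x1 for wi, x1 in zip(w, bx1)) + ridge
    a22 = sum(wi * x2 * x2 for wi, x2 in zip(w, bx2)) + ridge
    a12 = sum(wi * x1 * x2 for wi, x1, x2 in zip(w, bx1, bx2))
    c1 = sum(wi * x1 * y for wi, x1, y in zip(w, bx1, by))
    c2 = sum(wi * x2 * y for wi, x2, y in zip(w, bx2, by))
    det = a11 * a22 - a12 ** 2
    return (a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det

# Toy SNP effects on two exposures (e.g. a smoking index and drinks per week)
bx1 = [0.10, 0.02, 0.08, 0.01, 0.05]
bx2 = [0.01, 0.09, 0.03, 0.07, 0.05]
# Outcome effects constructed with direct effects 0.9 and 1.2
by = [0.9 * a + 1.2 * b for a, b in zip(bx1, bx2)]
se_y = [0.02, 0.03, 0.025, 0.02, 0.03]

b1, b2 = mvmr_ivw(bx1, bx2, by, se_y)
r1, r2 = mvmr_ivw(bx1, bx2, by, se_y, ridge=50.0)
print(b1, b2)
print(r1, r2)
```

With the penalty set to zero the sketch recovers the direct effects used to build the toy data; a positive penalty shrinks the estimates toward zero, which is the trade-off the lambda sequence described above is meant to explore.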

For factorial MR, we first constructed the polygenic risk scores (PRS) for the exposures as instrumental variables for each study participant31. The PRS aggregates the effects of multiple genetic variants to estimate an individual’s genetic predisposition to the specific exposure in question. The SNPs used to construct the PRS, along with their effect sizes, were the same as those used in the univariable MR. Then, we performed a one-sample MR of each phenotype on both HPV-positive and HPV-negative HNSCC. Subsequently, we performed the instrumental variables regression of the PRS of each of the four smoking phenotypes, with DPW as the interaction term (smoking phenotype multiplied by DPW), on both HPV-positive and HPV-negative HNSCC based on a two-stage least-squares model (AER R package v1.2-10)63. Since we were not able to build the PRS for CSI, we did not investigate the interactive effect between CSI and drinking in their effect on the two cancers.
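As an illustration of the score construction, the sketch below computes a PRS as a weighted sum of allele dosages and forms a product term of the kind that could serve as an instrument for an interaction. The SNP identifiers, weights, and dosages are invented for the example, and the two-stage least-squares fit itself is not shown.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0 to 2) over the instrument SNPs."""
    return sum(weights[snp] * dosages[snp] for snp in weights)

# Hypothetical per-SNP effect sizes taken from an exposure GWAS
weights_smoking = {"snpA": 0.10, "snpB": -0.05}
weights_dpw = {"snpC": 0.20}

# Hypothetical effect-allele dosages for one participant
participant = {"snpA": 2, "snpB": 1, "snpC": 1}

prs_smoking = polygenic_score(participant, weights_smoking)
prs_dpw = polygenic_score(participant, weights_dpw)

# Product of the two scores, usable as an instrument for the
# smoking-by-drinking interaction term in a later regression step.
prs_interaction = prs_smoking * prs_dpw
print(prs_smoking, prs_dpw, prs_interaction)
```

In practice the scores would be computed for every participant and typically standardized before entering the first-stage regressions.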

We evaluated the association of risk tolerance and high-risk sexual behaviors with each cancer type as several of the genetic loci used in our study have been previously associated with these two exposures24. Further, sexual behaviors are a purported risk factor for HPV infection24,35, and risk tolerance can predispose an individual to initiate high-risk behaviors like smoking, drinking, and unsafe sexual activity. The association of risk tolerance and high-risk sexual behaviors with the two cancers was assessed using the univariable MR approach with the same methodology as for the smoking and drinking behaviors. Summary statistics for these behaviors were obtained from a large meta-GWAS published recently24.

Statistics and reproducibility

AT and TH performed all MR analyses independently, with replication of the same results and conclusions. GWAS data used in this study had been previously replicated in the respective studies20,22,24. Additional information on research design is available in the Nature Research Reporting Summary.