Abstract
Understanding the genetic basis of COVID-19 vaccine seroconversion is crucial for studying the role of genetics in vaccine effectiveness. In this study, we used UK Biobank data to identify the genetic determinants of COVID-19 vaccine-induced seropositivity and breakthrough infections. We conducted four genome-wide association studies among vaccinated participants, covering seroconversion after one dose, seroconversion after two doses, breakthrough susceptibility and breakthrough severity. Our findings confirmed a link between the HLA region and seroconversion after the first and second doses. Additionally, we identified 10 genomic regions associated with breakthrough infection (SLC6A20, ST6GAL1, MUC16, FUT6, FUT2, MXI1, MUC4, HMGN2P18-KRTCAP2, NFKBIZ and APOC1) and one associated with breakthrough severity (APOE). We found no significant evidence of genetic colocalisation between these traits. Our study highlights the role of individual genetic make-up in the variable antibody responses to COVID-19 vaccines and provides insights into the potential mechanisms behind breakthrough infections that occur even after vaccination.
Introduction
The emergence of the SARS-CoV-2 outbreak presented a major global health challenge and prompted a COVID-19 vaccination effort of unprecedented scale. Although COVID-19 vaccines proved remarkably effective in preventing severe outcomes and hospitalisation1,2, they were not equally effective for all individuals, owing to multiple interacting factors. Among these, the impact of host genetics on the variability of vaccine-induced seroconversion and of breakthrough susceptibility and severity remains unclear.
Large-scale genome-wide association studies (GWAS) have identified over 50 common loci associated with COVID-19 susceptibility and severity3,4,5,6,7,8,9,10,11,12,13, significantly improving our understanding of the biological mechanisms underlying this complex disease. However, only a few small studies, involving trial participants, have identified genetic variants associated with the immune response after vaccination. Understanding the role of genetics in seroconversion, subsequent breakthrough infection and related complications is key to further unravelling the biology of vaccine effectiveness. Additionally, identifying genetic determinants of vaccine response can inform research into personalised vaccination strategies, such as prioritising booster doses for individuals less likely to respond to primary vaccination.
We used UK Biobank data, with its unique linkage of genetic data to serological tests, public health test results and health records, to perform four genome-wide association studies: of vaccine-induced seropositivity after one and after two doses, of breakthrough infection, and of severe breakthrough COVID-19. Furthermore, we performed a colocalisation analysis to study the genetic overlap between these traits.
Results
Study cohorts
Our main analyses focussed on participants of white British ancestry (field 22006, genetic ethnic grouping in UK Biobank). In total, 201,893 participants were included in the SARS-CoV-2 serological antibody study. Among these vaccinated participants, 53,203 were included in the one-dose seroconversion analysis (15,046 responders and 38,161 non-responders) and 42,509 in the two-dose seroconversion analysis (30,455 responders and 12,054 non-responders) (see Fig. 1A and Fig. 2A). The distribution of the number of days between the last vaccination dose and the antibody test date is shown in Supplementary Fig. 1.
A Seroconversion cohorts, stratified by the number of doses (one or two). Evidence of no prior COVID-19 infection was defined as a negative or missing COVID-19 test result from the COVID-19 infection seroprevalence study. Responders were individuals with a positive serological test from the COVID-19 self-test antibody seroprevalence study. Non-responders were individuals who tested negative in the same study. B Breakthrough susceptibility and breakthrough severity cohorts. Breakthrough infection and severity were identified through linkage to primary care records, hospital inpatient admissions, death registrations and national infectious diseases surveillance data.
Among the 398,943 vaccinated UK Biobank participants, 315,620 were included in the breakthrough susceptibility analysis: 74,662 with a SARS-CoV-2 breakthrough infection and 240,661 not infected during the study period. Finally, of the breakthrough infection cases, 3860 were severe and 70,802 were mild, and these formed the breakthrough severity analysis (see Fig. 1B and Fig. 2B).
Baseline characteristics of each cohort are given in Supplementary Table 1. Mean age across cohorts was around 66–71 years, females were slightly over-represented (around 55–58%), and the Index of Multiple Deprivation was around 14–16.
Genome-wide association (GWAS) analyses
Of the 784,256 variants in the genotype calls, over 600,000 passed quality control in all GWAS and were used to build the whole-genome regression model in REGENIE step 1. Of the 93,095,623 imputed variants, ~9,000,000 passed quality control and were tested for association with each trait. See Supplementary Fig. 2 for further details.
The one-dose seroconversion GWAS identified 13 lead independent variants, distributed across two genomic loci. The top lead variants of these loci were rs9275109 (CHR = 6; OR = 0.86; P = 1.0 × 10−26), between HLA-DQB1 and MTCO3P1, and rs79510369 (CHR = 2; OR = 1.48; P = 5.3 × 10−10), in PLA2R1 (see Table 1 and Fig. 3A). Other variants within these loci were located in or near NOTCH4, BTNL2, HLA-DRA, HLA-DRB9, HLA-DRB5, HLA-DRB6, HLA-DRB1, HLA-DQA1, XXbac-BPG254F23.7, HLA-DOB, TAP2, COL11A2P1 and HLA-DPB2 (see Supplementary Table 2).
Only the top lead variant of each genetic locus is labelled. P values from the Wald test in (A–D) are shown unadjusted for multiple testing. Source data are provided for this figure. A One-dose seroconversion results. B Two-dose seroconversion results. C Breakthrough susceptibility results. D Breakthrough severity results.
Two-dose seroconversion response was linked to seven lead independent variants distributed across two genetic loci, with top lead variants rs68033958 (CHR = 6; OR = 0.82; P = 1.4 × 10−21) in HLA-DQB1 and rs3094055 (CHR = 6; OR = 0.89; P = 2.1 × 10−10) upstream of UBQLN1P1 (Table 1, Fig. 3B). Other variants were located in or near NCR3, UQCRHP1, PRRC2A, BTNL2, HLA-DRA, HLA-DRB1, HLA-DQA1, XXbac-BPG254F23.7 and HLA-DQB3 (see Supplementary Table 2).
The breakthrough infection GWAS identified 18 lead independent variants associated with susceptibility to SARS-CoV-2 infection after vaccination, distributed across ten genomic loci (see Table 1, Fig. 3C). The top lead variants were rs73062389 (CHR = 3; OR = 1.22; P = 4.0 × 10−56) in SLC6A20, rs16861415 (CHR = 3; OR = 0.84; P = 6.8 × 10−55) in ST6GAL1, rs11673136 (CHR = 19; OR = 1.08; P = 4.6 × 10−34) in MUC16, rs112313064 (CHR = 19; OR = 1.06; P = 1.1 × 10−22) in FUT6, rs681343 (CHR = 19; OR = 0.95; P = 1.4 × 10−17) in FUT2, rs1977829 (CHR = 10; OR = 0.94; P = 1.4 × 10−12) in MXI1, rs2550250 (CHR = 3; OR = 1.05; P = 1.8 × 10−12) in MUC4, rs6676150 (CHR = 1; OR = 1.04; P = 2.7 × 10−12) between HMGN2P18 and KRTCAP2, rs17347644 (CHR = 3; OR = 0.96; P = 1.3 × 10−10) in NFKBIZ, and rs5117 (CHR = 19; OR = 0.96; P = 4.7 × 10−9) in APOC1. Other variants were located in or near LIMD1, LZTFL1, XCR1, FLT1P1, RPS20P14, RP1142D20.1 and FUT3 (see Supplementary Table 2).
Only one lead variant was associated with breakthrough severity: rs429358 (CHR = 19; OR = 1.21; P = 1.1 × 10−8), located in an exonic region of APOE (see Table 1, Fig. 3D, Supplementary Table 2).
Validation analyses
Participants who fulfilled the criteria for the study cohorts but whose genetic ethnic grouping was not Caucasian (field 22006 in UK Biobank) formed the validation cohorts. In total, these included 8189 participants (2185 responders, 6020 non-responders) for one-dose seroconversion, 6533 (4595 responders, 1961 non-responders) for two-dose seroconversion, 57,851 (12,798 infected, 45,437 non-infected) for breakthrough susceptibility and 12,727 (708 with severe infection, 12,090 with mild infection) for breakthrough severity. Population characteristics of these cohorts can be found in Supplementary Table 3.
We tested all lead independent variants for association in the minority ethnic (non-European) ancestry population. All results are reported in Supplementary Table 4, and results for the top lead variants are shown in Fig. 4. Of the top lead variants associated with one-dose seroconversion, rs9275109 was fully validated (ORv = 0.84; Pv = 1.5 × 10−5), whereas rs79510369 was partially validated (ORv = 1.10; Pv = 4.5 × 10−1). Similarly, of the top lead variants associated with two-dose seroconversion, rs68033958 was fully validated (ORv = 0.86; Pv = 3.3 × 10−3), whereas rs3094055 was partially validated (ORv = 0.98; Pv = 6.7 × 10−1).
Data are presented as OR ± 95% confidence interval. SNPs in blue are fully validated: the OR from the main analysis and the OR from the validation analysis have the same direction, and the validation P value is ≤ 0.05. SNPs in orange are partially validated: both ORs have the same direction, but the validation P value is > 0.05. SNPs in yellow are not validated: the two ORs have opposite directions. Statistical tests were performed with REGENIE. Source data are provided for this figure. A One-dose seroconversion results (sample size 8189). B Two-dose seroconversion results (sample size 6533). C Breakthrough susceptibility results (sample size 57,851). D Breakthrough severity results (sample size 12,727).
Additionally, seven of the ten top lead variants associated with breakthrough susceptibility were fully validated: rs73062389 (ORv = 1.26; Pv = 4.2 × 10−12), rs16861415 (ORv = 0.89; Pv = 1.0 × 10−4), rs11673136 (ORv = 1.07; Pv = 1.9 × 10−6), rs112313064 (ORv = 1.04; Pv = 5.4 × 10−3), rs681343 (ORv = 0.93; Pv = 1.2 × 10−6), rs2550250 (ORv = 1.06; Pv = 1.8 × 10−4) and rs17347644 (ORv = 0.96; Pv = 2.1 × 10−2). The remaining three top lead variants were partially validated: rs1977829 (ORv = 0.97; Pv = 1.1 × 10−1), rs6676150 (ORv = 1.02; Pv = 1.1 × 10−1) and rs5117 (ORv = 0.98; Pv = 3.8 × 10−1).
The only variant associated with breakthrough severity in our main analysis, rs429358, was also validated in this subpopulation (ORv = 1.18; Pv = 4.7 × 10−2).
Colocalisation analyses
We examined the association of every lead independent variant with each of the four traits (see Supplementary Table 5) and then calculated, for each genetic locus, the posterior probability of a causal variant shared between traits. None of the genomic loci showed evidence of colocalisation between traits (see Supplementary Table 6).
Discussion
To our knowledge, this is the largest GWAS of COVID-19 vaccine seroconversion and of vaccine effectiveness against infection and severe disease. We found variants in the HLA region associated with vaccine-induced seropositivity after both the first and the second dose. Furthermore, the two top lead variants in this region (rs9275109 for one-dose and rs68033958 for two-dose seroconversion) were validated in the mixed ethnic cohort. The other independent variants in these loci were located in or near genes such as UBQLN1P1, NOTCH4, BTNL2, XXbac-BPG254F23.7, TAP2, COL11A2P1, NCR3, UQCRHP1 and PRRC2A.
Human leucocyte antigens (HLA) have previously been recognised as the most influential genetic factors in seroconversion after various vaccines14,15,16. Mentzer et al.17 recently investigated genetic variation associated with antibody levels 28 days after one dose of COVID-19 vaccine in 1076 participants enrolled in ChAdOx1 nCoV-19 vaccine efficacy trials. Their study identified associations of the HLA locus with IgG antibody levels and with risk of breakthrough infection. Although our recent study18 showed that the effect sizes of genetic associations discovered in highly selected trial participants may not be fully generalisable to the wider community population, our findings support a key role for the HLA region in the antibody response to COVID-19 vaccines. We also found a signal within PLA2R1 (rs79510369), but it was only partially validated in the mixed ancestry cohort.
We further discovered ten genomic loci (SLC6A20, ST6GAL1, MUC16, FUT6, FUT2, MXI1, MUC4, HMGN2P18-KRTCAP2, NFKBIZ, and APOC1) linked to breakthrough infection and one (APOE) associated with breakthrough COVID-19 severity. SLC6A20, ST6GAL1, MUC16, FUT6, FUT2, MUC4, NFKBIZ and APOE were validated in the mixed ethnic group, whereas the other loci (MXI1, HMGN2P18-KRTCAP2 and APOC1) were partially validated.
More than 50 variants have been associated with COVID-19 susceptibility and severity among unvaccinated individuals in previous studies3,4,5,6,7,8,9,10,11,12,13. SLC6A20 and FUT2 have already been associated with COVID-19 susceptibility and severity in previous GWAS (see Supplementary Table 7), and our study shows that both are also associated with the risk of infection after vaccination. SLC6A20 encodes the sodium-imino acid (proline) transporter 1, also known as SIT1. SIT1 interacts with angiotensin-converting enzyme 2 (ACE2), the host-cell receptor that SARS-CoV-2 uses to enter cells, so this interaction might influence how the virus infects cells7,19,20. FUT2, in turn, determines secretor status, that is, whether ABO histo-blood group antigens are expressed in bodily secretions, which has been shown to influence susceptibility to some infectious diseases21. Taken together, these associations, which hold regardless of vaccination status, suggest that viral entry or replication, rather than the immune response, may be the main mechanism underlying COVID-19 susceptibility, and they could inform the focus of future therapeutic targets in the post-pandemic era.
MUC4 and MUC16 (genes associated with mucosal immunity) have previously been linked to COVID-19 severity. In a study of 125 hospitalised COVID-19 patients22, these genes were upregulated in patients who recovered, suggesting an active defence against SARS-CoV-2 infection. Similarly, NFKBIZ has not previously been associated with COVID-19 susceptibility but has been reported to be associated with the risk of a severe COVID-19 outcome23.
Our study also found genetic loci associated with breakthrough COVID-19 outcomes that have not been previously reported (ST6GAL1, FUT6, MXI1, HMGN2P18-KRTCAP2 and APOC1). Although further investigation is required to elucidate the specific mechanisms by which these genes act, they appear to be associated with glucose metabolism pathways. ST6GAL1 encodes a sialyltransferase that attaches sialic acid to glycoproteins, and sialic acid has been studied as a potential target in the dissemination of the virus24,25.
Our study also suggested an association between breakthrough severity and the APOE locus. APOE is well known for its association with Alzheimer's disease (AD)26,27, and previous evidence suggests that people with AD have elevated COVID-19 morbidity and mortality28. We therefore hypothesise that APOE may influence breakthrough severity through its link to Alzheimer's disease.
Our study has limitations. First, we only considered a binary seroconversion response, which may have reduced statistical power compared with quantitative antibody levels. This reflects the use of a validated rapid lateral flow device, which allowed us to assemble the largest cohort to date of individuals self-tested for vaccine-induced antibody response, with over 200,000 participants. Second, seropositivity rates in our study were lower than those reported in clinical trials; this can be attributed to the relatively high age of our participants and to the reliance on a single serological test within a time window, which may not capture all individuals who seroconvert. Third, we used linked routinely collected data to ascertain COVID-19 infection status. This may lead to outcome misclassification, because asymptomatic or mild infections are common, particularly among vaccinated individuals, who may not seek testing and would thus be misclassified as controls. Fourth, our study population is relatively old (mean age 66–71 years), so care must be taken when generalising the results to a wider population. Fifth, we did not differentiate between types of COVID-19 vaccine, which may involve distinct breakthrough infection mechanisms. Sixth, external replication of the genetic associations in other populations is currently not possible because similar linked data are not available elsewhere. To mitigate this, we conducted an internal validation by splitting our cohort into participants of European and non-European ancestry. Although we found no apparent evidence of ethnicity-specific genetic effects, further studies in larger and more diverse populations are needed to confirm the findings. Lastly, sample size plays a key role in detecting signals in genome-wide association studies, and some of our analyses, especially of the breakthrough severity phenotype, might be underpowered, making it more difficult to detect significant variants.
Despite these limitations, our study has several strengths that enhance the reliability of our findings. First, it used the largest sample to date for genome-wide association studies of these traits. Second, most of the variants associated with the traits in the European population were also associated in the non-European population, supporting the robustness of our findings. Third, our findings on the genetic mechanisms of vaccine seroconversion and of breakthrough susceptibility and severity may provide valuable insights for research into personalised vaccination strategies.
Methods
Study design and participants
UK Biobank
Study participants were from the UK Biobank study, a population-based cohort that recruited over 500,000 participants from England (89%), Wales (7%), and Scotland (4%). All people in the National Health Service registry who were aged 40–69 years and lived <25 miles from a study centre were invited to participate between 2006 and 2010. 503,325 participants were recruited from 9.2 million mailed invitations. Its study design and participant characteristics have been previously described in detail elsewhere29.
The UK Biobank includes information on demographics, socioeconomics, lifestyle factors, physical metrics, and medical history. It also includes genotyping data30, and follow-up data through linkages to electronic health record databases.
COVID-19 self-test antibody seroprevalence study
Living UK Biobank participants were invited to take part in a SARS-CoV-2 antibody seroprevalence study between February 2021 and July 2021, while the UK's vaccination programme was being rolled out. All participants who met the inclusion criteria, regardless of sex, age or health status, were eligible and received an email invitation with brief information about the study. Participants unable or unwilling to take part were encouraged to report their decision. Seven days after the original invitation, a reminder email was sent to participants who had not yet responded. Participants willing to take part were asked to confirm their contact details and to consent to receiving a lateral flow self-test kit at their home address. Participants' addresses were securely transferred to a third-party mailing house and shipping company, and three days before kit dispatch participants received an email informing them that their kit was being dispatched.
Once participants had performed the test, they were asked to complete an online UK Biobank questionnaire recording their result (IgM or IgG positive, negative, invalid), the test date, and their COVID-19 first and second vaccination status and dates. The date on which participants submitted their results was recorded automatically. A reminder email was sent to participants who had not returned their test result one week after the kit was dispatched.
Recruitment took place in two phases. The first phase invited ~34,713, ~22,390 and ~21,405 participants in three sequential waves. The second phase invited the remaining ~371,985 participants who had not been eligible for inclusion in phase one. More details about the study design can be found in the online document: https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=998.
COVID-19 infection seroprevalence study
The lateral flow test device used in the SARS-CoV-2 antibody seroprevalence study could not distinguish between antibodies induced by infection and those induced by vaccination. Hence, individuals who had taken part in the self-test antibody study, reported a positive test result and reported having been vaccinated before taking the antibody test were re-invited to provide a capillary blood sample for testing for IgG antibodies to the nucleocapsid (N) protein. These antibodies indicate past COVID-19 infection, because the vaccines used in the UK target the spike protein and do not induce anti-N antibodies. Recruitment for the COVID-19 infection study was similar to that for the antibody study, and more details are provided in the online document: https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=997.
Data linkages
UK Biobank follow-up of the participants is conducted through individual-level linkages to multiple electronic health databases. The databases used in this study include primary care records (prescriptions and diagnoses), hospital inpatient admissions (diagnoses), death registrations, and national infectious diseases surveillance data (COVID-19 test results)31.
Genotyping and imputation
Genotyping and quality control of the UK Biobank genetic dataset have been described previously30. In summary, UK Biobank genotype calling was performed by Affymetrix and includes 784,256 autosomal variants. Imputation combined two reference panels, the Haplotype Reference Consortium (HRC) and the UK10K haplotype resource, and yielded 93,095,623 autosomal SNPs.
Definition of the study cohorts
In this study, we analysed four different cohorts to study the genetic variants associated with four different traits: (1) Seroconversion induced by one dose of COVID-19 vaccine, (2) seroconversion induced by two doses of COVID-19 vaccine, (3) breakthrough infection susceptibility, and (4) breakthrough infection severity.
To avoid confounding, we excluded from the main analysis individuals without European genetic ancestry (field 22006 in UK Biobank), individuals with sex chromosome aneuploidy, and individuals whose recorded sex differed from their genetic sex.
Seroconversion responders and non-responders
For the seroconversion cohorts (one dose, two doses), we first restricted the analysis to vaccinated UK Biobank participants who enrolled in the SARS-CoV-2 antibody seroprevalence study.
Responders were defined as participants with a positive serological test in the COVID-19 self-test antibody seroprevalence study 8 to 56 days after their latest vaccine dose. Individuals who tested positive but subsequently showed evidence of prior COVID-19 infection (a positive result in the COVID-19 infection seroprevalence study) were excluded from the cohort. Non-responders were participants with a negative serological test 8–56 days after their latest vaccine dose (see Fig. 1A).
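The responder and non-responder rule can be expressed as a small classification function. The sketch below is a hypothetical illustration, not the authors' R code: the function and argument names are our own, the 8–56-day window is assumed to be inclusive at both ends, and a missing anti-N result counts as no evidence of prior infection, following the definition given for Fig. 1A.

```python
from datetime import date
from typing import Optional

# Window (days after the latest dose) in which an antibody test is used.
# Treating both ends as inclusive is an assumption.
MIN_DAYS, MAX_DAYS = 8, 56


def seroconversion_status(latest_dose: date, test_date: date,
                          lft_positive: bool,
                          anti_n_positive: Optional[bool]) -> Optional[str]:
    """Classify one participant for a seroconversion cohort (hypothetical helper).

    lft_positive: result of the lateral flow self-test.
    anti_n_positive: result of the infection (anti-N) study; None if not tested.
    Returns "responder", "non-responder", or None if the participant is excluded.
    """
    days = (test_date - latest_dose).days
    if not MIN_DAYS <= days <= MAX_DAYS:
        return None  # test taken outside the 8-56-day window
    if lft_positive:
        if anti_n_positive is True:
            return None  # positive, but evidence of prior infection: excluded
        return "responder"
    return "non-responder"


# Usage: a positive test 19 days after the latest dose, no anti-N result.
print(seroconversion_status(date(2021, 3, 1), date(2021, 3, 20), True, None))  # responder
```

The same function serves both seroconversion cohorts; only the date passed as `latest_dose` (first or second dose) changes.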
Breakthrough susceptibility and severity
For the breakthrough cohorts (breakthrough susceptibility and breakthrough severity), we included UK Biobank participants who had received at least one dose of a COVID-19 vaccine.
Breakthrough infections were defined by a positive PCR test, a hospital admission with a COVID-19 diagnosis (ICD-10 codes U07.1, U07.2), or a death certificate listing COVID-19 as the cause of death (same ICD-10 codes). Participants without breakthrough infection were those with no positive PCR test, no hospital admission with a COVID-19 diagnosis and no death certificate listing COVID-19 as the cause of death. Among breakthrough infection cases, severe cases were those requiring hospitalisation or resulting in death, and mild cases were those identified only by a positive PCR test (see Fig. 1B). Further details of the ICD-10 codes can be found in Supplementary Note 1.
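The case definitions above can be sketched as a single function that assigns each vaccinated participant to one of three groups. The record structure and field names are hypothetical, and because the text does not state how event dates are aligned with vaccination, we assume that only events dated after the first vaccine dose count as breakthrough events.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class LinkedRecords:
    """Hypothetical per-participant summary of the linked data sources."""
    first_dose: date
    positive_pcr: List[date] = field(default_factory=list)
    covid_admissions: List[date] = field(default_factory=list)  # ICD-10 U07.1/U07.2
    covid_death: Optional[date] = None  # death certificate listing COVID-19


def breakthrough_status(rec: LinkedRecords) -> str:
    """Return "severe", "mild" or "uninfected".

    Susceptibility GWAS: cases = severe + mild, controls = uninfected.
    Severity GWAS: cases = severe, controls = mild.
    Assumption: only events after the first vaccine dose are counted.
    """
    def after_vaccination(d: Optional[date]) -> bool:
        return d is not None and d > rec.first_dose

    hospitalised = any(after_vaccination(d) for d in rec.covid_admissions)
    died = after_vaccination(rec.covid_death)
    if hospitalised or died:
        return "severe"
    if any(after_vaccination(d) for d in rec.positive_pcr):
        return "mild"  # identified only through a positive PCR test
    return "uninfected"


# Usage: a positive PCR eight months after the first dose, no admission or death.
print(breakthrough_status(LinkedRecords(date(2021, 1, 10), positive_pcr=[date(2021, 9, 1)])))  # mild
```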
Genome-wide association study
We conducted four genome-wide association studies: one-dose seroconversion, two-dose seroconversion, breakthrough susceptibility and breakthrough severity. Associations between variants and traits were estimated using REGENIE (version 1.0.5)32, a machine-learning method. Briefly, REGENIE fits a whole-genome regression model in two main steps. In step 1, a whole-genome regression model is fitted using a subset of the available genetic markers. In step 2, a larger set of markers is tested for association with the trait of interest, conditional on the predictions from the step-1 regression model.
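To make the two-step idea concrete, the following is a heavily simplified sketch of whole-genome regression in the spirit of REGENIE, not REGENIE itself. It assumes a quantitative phenotype and uses a single closed-form ridge regression with a fixed, hypothetical penalty to build leave-one-chromosome-out (LOCO) predictions in step 1, then a per-variant linear regression of the phenotype minus that prediction in step 2. REGENIE's stacked block ridge regression, cross-validated penalties and the logistic tests with Firth correction used for the binary traits in this study are all omitted; the function names are our own.

```python
import numpy as np
from math import erfc, sqrt


def residualise(C, M):
    """Remove the least-squares fit on covariates C from a vector or matrix M."""
    coef, *_ = np.linalg.lstsq(C, M, rcond=None)
    return M - C @ coef


def ridge_fit_predict(X, y, lam):
    """Closed-form ridge regression, beta = (X'X + lam*I)^-1 X'y; returns X @ beta."""
    beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return X @ beta


def two_step_scan(G, chrom, y, covars, lam):
    """Simplified two-step whole-genome regression (illustrative, not REGENIE).

    G: n x m genotype dosages; chrom: length-m chromosome labels;
    y: quantitative phenotype; covars: n x k covariates; lam: ridge penalty.
    Returns one two-sided P value per variant.
    """
    n, m = G.shape
    C = np.column_stack([np.ones(n), covars])
    Gs = residualise(C, (G - G.mean(axis=0)) / G.std(axis=0))
    yr = residualise(C, y.astype(float))
    df = n - C.shape[1] - 1
    pvals = np.empty(m)
    for c in np.unique(chrom):
        on_c = chrom == c
        # Step 1: polygenic prediction from all other chromosomes (LOCO).
        target = yr - ridge_fit_predict(Gs[:, ~on_c], yr, lam)
        # Step 2: test each variant on chromosome c conditional on that prediction.
        for j in np.flatnonzero(on_c):
            g = Gs[:, j]
            gg = g @ g
            beta = (g @ target) / gg
            resid = target - beta * g
            se = sqrt((resid @ resid) / df / gg)
            pvals[j] = erfc(abs(beta / se) / sqrt(2))  # two-sided normal P value
    return pvals


# Usage on simulated data: variant 0 (chromosome 1) is causal and
# chromosome 2 carries a weak polygenic background.
rng = np.random.default_rng(0)
n = 1000
G = rng.binomial(2, 0.3, size=(n, 40)).astype(float)
chrom = np.repeat([1, 2], 20)
covars = rng.normal(size=(n, 2))
y = 0.5 * G[:, 0] + 0.05 * G[:, 20:].sum(axis=1) + 0.3 * covars[:, 0] + rng.normal(size=n)
p = two_step_scan(G, chrom, y, covars, lam=float(G.shape[1]))
print(p[0])
```

The LOCO design matters: when a variant is tested, the step-1 prediction never includes markers from its own chromosome, so the prediction cannot absorb the signal being tested.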
In our study, we used UK Biobank genotype calls in step 1 and UK Biobank imputed data in step 2. Regression models were adjusted for baseline age (at the date of antibody testing), sex, genotyping batch and the first ten genetic principal components. We also applied Firth correction to account for case–control imbalance. Variants from both datasets (genotype calls and imputed data) underwent quality control, performed with PLINK2 (version 1.0.6)33, before being used in the analysis. We excluded variants with missing genotype data, variants deviating from Hardy–Weinberg equilibrium (P < 1 × 10−15) and variants with a minor allele frequency < 1%. For imputed variants, we also removed duplicated SNPs, keeping only the first instance.
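The variant filters described above can be sketched as follows. This is an illustrative re-implementation, not PLINK2: the 5% per-variant missingness cut-off is a hypothetical value (the text does not state one), and Hardy–Weinberg deviation is assessed with a 1-df chi-square test, swapped in for the exact test that PLINK2 uses. The data class and function names are our own.

```python
from dataclasses import dataclass
from math import erfc, sqrt


@dataclass
class VariantCounts:
    vid: str      # variant identifier
    hom_ref: int  # genotype counts among called samples
    het: int
    hom_alt: int
    missing: int  # samples without a genotype call


def hwe_chisq_p(hom_ref: int, het: int, hom_alt: int) -> float:
    """P value of a 1-df chi-square test of Hardy-Weinberg equilibrium."""
    n = hom_ref + het + hom_alt
    p = (2 * hom_ref + het) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic variant: nothing to test
    chi2 = sum((o - e) ** 2 / e
               for o, e in zip((hom_ref, het, hom_alt), expected))
    return erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def passes_qc(v: VariantCounts, max_missing: float = 0.05,
              min_maf: float = 0.01, min_hwe_p: float = 1e-15) -> bool:
    called = v.hom_ref + v.het + v.hom_alt
    if called == 0 or v.missing / (called + v.missing) > max_missing:
        return False  # too much missing genotype data (hypothetical cut-off)
    alt_freq = (2 * v.hom_alt + v.het) / (2 * called)
    if min(alt_freq, 1 - alt_freq) < min_maf:
        return False  # minor allele frequency below 1%
    return hwe_chisq_p(v.hom_ref, v.het, v.hom_alt) >= min_hwe_p


def qc_filter(variants):
    """Return IDs passing QC; a duplicated ID keeps only its first instance."""
    seen, kept = set(), []
    for v in variants:
        if v.vid in seen:
            continue
        seen.add(v.vid)
        if passes_qc(v):
            kept.append(v.vid)
    return kept


variants = [
    VariantCounts("rs1", 600, 350, 50, 0),     # common, in HWE: kept
    VariantCounts("rs2", 995, 5, 0, 0),        # MAF 0.25%: removed
    VariantCounts("rs3", 500, 0, 500, 0),      # no heterozygotes: fails HWE
    VariantCounts("rs4", 400, 400, 100, 100),  # 10% missing: removed
    VariantCounts("rs1", 0, 0, 1000, 0),       # duplicate ID: ignored
]
print(qc_filter(variants))  # ['rs1']
```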
After the GWAS, variants with P ≤ 5 × 10−8 were considered significantly associated with the trait. Effect sizes of genetic associations were expressed as odds ratios (OR). We used FUMA (version 1.6.0)34 to identify lead independent significant SNPs (r² ≤ 0.1) and the top lead SNP of each genomic locus (window 250 kb), and positional mapping (window 10 kb) to map SNPs to genes.
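The selection of lead SNPs and loci can be approximated by greedy LD clumping. The sketch below simplifies FUMA's procedure under stated assumptions: pairwise r² values come from a hypothetical lookup function, a single r² threshold replaces FUMA's two-stage selection (independent significant SNPs, then lead SNPs), and lead SNPs are merged into one locus when they lie within 250 kb of the previous lead on the same chromosome, rather than when their LD blocks do. The SNP identifiers, positions and LD values in the usage example are made up.

```python
from dataclasses import dataclass
from typing import Callable, List

GENOME_WIDE = 5e-8


@dataclass
class Assoc:
    rsid: str
    chrom: int
    pos: int   # base-pair position
    p: float   # association P value


def lead_snps(snps: List[Assoc], r2: Callable[[str, str], float],
              r2_max: float = 0.1, p_max: float = GENOME_WIDE) -> List[Assoc]:
    """Greedy clumping: visit significant SNPs from the smallest P value and keep a
    SNP as a new lead only if its r2 with every lead kept so far is <= r2_max."""
    significant = [x for x in snps if x.p <= p_max]
    leads: List[Assoc] = []
    for s in sorted(significant, key=lambda x: x.p):
        if all(r2(s.rsid, lead.rsid) <= r2_max for lead in leads):
            leads.append(s)
    return leads


def top_lead_per_locus(leads: List[Assoc], window: int = 250_000) -> List[Assoc]:
    """Merge leads on the same chromosome within `window` bp of the previous lead
    into one locus; return the lead with the smallest P value in each locus."""
    loci: List[List[Assoc]] = []
    for s in sorted(leads, key=lambda x: (x.chrom, x.pos)):
        prev = loci[-1][-1] if loci else None
        if prev is not None and prev.chrom == s.chrom and s.pos - prev.pos <= window:
            loci[-1].append(s)
        else:
            loci.append([s])
    return [min(locus, key=lambda x: x.p) for locus in loci]


# Made-up LD values; unlisted pairs are treated as unlinked.
ld = {frozenset(("snpA", "snpB")): 0.8, frozenset(("snpA", "snpC")): 0.05}


def r2(a: str, b: str) -> float:
    return ld.get(frozenset((a, b)), 0.0)


snps = [
    Assoc("snpA", 6, 32_600_000, 1e-26),
    Assoc("snpB", 6, 32_610_000, 1e-20),   # in LD with snpA: not a lead
    Assoc("snpC", 6, 32_700_000, 1e-10),   # independent lead in the same locus
    Assoc("snpD", 2, 160_000_000, 5e-10),
    Assoc("snpE", 2, 161_000_000, 1e-6),   # not genome-wide significant
]
leads = lead_snps(snps, r2)
print([s.rsid for s in leads])                      # ['snpA', 'snpC', 'snpD']
print([s.rsid for s in top_lead_per_locus(leads)])  # ['snpD', 'snpA']
```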
Data curation for the analytical cohorts and outcomes was done in R (version 4.3.0). GWAS were performed on the UK Biobank Research Analysis Platform (RAP). Plots were generated with R.
Validation
To assess whether the associations found in the European population were reproduced in other ancestral groups, we tested the lead independent variants for association in a non-European population. For this analysis, we used the complete genotype call data in step 1 of REGENIE and, in step 2, tested only the lead independent significant SNPs obtained in the main analysis.
Variants whose main-analysis OR and validation OR (ORv) pointed in the same direction (both > 1 or both < 1) and whose validation P value (Pv) was ≤ 0.05 were considered fully validated. Variants whose ORs pointed in the same direction but with Pv > 0.05 were considered partially validated. Variants whose ORs pointed in opposite directions were considered not validated.
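The three-way validation rule translates directly into code. The function name is hypothetical, and treating an OR of exactly 1 as having no direction (hence not validated) is our assumption; the usage lines reuse OR and P values reported in the Results.

```python
def validation_status(or_main: float, or_val: float, p_val: float,
                      alpha: float = 0.05) -> str:
    """Classify a lead variant as fully, partially or not validated."""
    same_direction = (or_main > 1 and or_val > 1) or (or_main < 1 and or_val < 1)
    if not same_direction:
        return "not validated"
    return "fully validated" if p_val <= alpha else "partially validated"


# Values reported in the Results for the two one-dose seroconversion top leads.
print(validation_status(0.86, 0.84, 1.5e-5))  # rs9275109: fully validated
print(validation_status(1.48, 1.10, 4.5e-1))  # rs79510369: partially validated
```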
Colocalisation
Having obtained the variants associated with one-dose vaccine seroconversion, two-dose vaccine seroconversion, breakthrough infection and breakthrough severity, we studied the genetic overlap between these traits. First, we examined the association of every lead independent variant with each trait. We then used the R package coloc35 to perform colocalisation analysis within ±250 kb of each genetic locus. Colocalisation analysis assesses whether two traits share the same causal variant at a locus. We used the default prior probabilities for colocalisation, p1 = 10−4, p2 = 10−4 and p12 = 10−5, and considered a posterior probability greater than 50% for the hypothesis of a shared causal variant (H4) as substantial evidence of colocalisation.
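For intuition about what coloc computes, the sketch below re-implements the approximate Bayes factor calculation behind `coloc.abf` from summary statistics (β and its standard error for each SNP in a region, for two traits). It follows coloc's formulas for case–control traits, namely Wakefield approximate Bayes factors with a prior standard deviation of 0.2 on the log-OR scale, combined into posteriors for hypotheses H0–H4 with the priors p1, p2 and p12. It is an illustrative re-implementation, not the package itself, and the function names and simulated-looking inputs are our own.

```python
import math
from typing import List, Sequence


def log_abf(beta: float, se: float, prior_sd: float = 0.2) -> float:
    """Log Wakefield approximate Bayes factor for one SNP."""
    v, w = se * se, prior_sd * prior_sd
    r = w / (w + v)
    z = beta / se
    return 0.5 * (math.log(1 - r) + r * z * z)


def _logsum(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(xs)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def _logdiff(x: float, y: float) -> float:
    """log(exp(x) - exp(y)) for x >= y; -inf when the two are equal."""
    d = y - x
    if d >= 0:
        return float("-inf")
    return x + math.log(-math.expm1(d))


def coloc_posteriors(beta1, se1, beta2, se2,
                     p1=1e-4, p2=1e-4, p12=1e-5) -> List[float]:
    """Posterior probabilities of H0..H4 for two traits measured on the same SNPs."""
    l1 = [log_abf(b, s) for b, s in zip(beta1, se1)]
    l2 = [log_abf(b, s) for b, s in zip(beta2, se2)]
    l12 = [a + b for a, b in zip(l1, l2)]
    h3 = math.log(p1) + math.log(p2) + _logdiff(_logsum(l1) + _logsum(l2), _logsum(l12))
    lh = [
        0.0,                         # H0: no association with either trait
        math.log(p1) + _logsum(l1),  # H1: association with trait 1 only
        math.log(p2) + _logsum(l2),  # H2: association with trait 2 only
        h3,                          # H3: two distinct causal variants
        math.log(p12) + _logsum(l12),  # H4: one shared causal variant
    ]
    total = _logsum(lh)
    return [math.exp(x - total) for x in lh]


se = [0.05, 0.05, 0.05]
trait1 = [0.50, 0.025, 0.01]      # strong signal at the first SNP
shared = [0.45, 0.015, 0.005]     # the same SNP drives trait 2
distinct = [0.015, 0.45, 0.005]   # a different SNP drives trait 2
pp_shared = coloc_posteriors(trait1, se, shared, se)
pp_distinct = coloc_posteriors(trait1, se, distinct, se)
print(pp_shared[4], pp_distinct[3])  # H4 dominates in the first case, H3 in the second
```

Under the 50% rule described above, the first region would count as colocalised and the second would not.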
Software/implementation
We used the UK Biobank RAP, specifically REGENIE32 (version 1.0.5) and PLINK233 (version 1.0.6), to perform the GWAS, and FUMA34 (version 1.6.0) to detect independent SNPs. Data manipulation was done in R (version 4.3.0), mainly with the packages coloc36 (version 5.2.3), dplyr37 (version 1.1.3) and ggplot2 (version 3.5.1).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
UK Biobank individual-level data can be accessed by applying at http://ukbiobank.ac.uk/register-apply/. Ethics approval for the UK Biobank was granted by the North West Multi-Centre Research Ethics Committee in 2006 and has been updated regularly since (https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/about-us/ethics). All participants provided informed written consent to take part in the study and to be followed up through linkage to health-related records. This study received ethical approval from the UKBB Ethics Advisory Committee (EAC) under application 98358. GWAS results are publicly available at the GWAS Catalog under study accession codes GCST90448693, GCST90448694, GCST90448695 and GCST90448696. Source data are provided with this paper.
Code availability
All analytic code is publicly available at https://github.com/oxford-pharmacoepi/GeneticDeterminantsCovid19Vaxs (ref. 38).
References
Cai, C. et al. A comprehensive analysis of the efficacy and safety of COVID-19 vaccines. Mol. Ther. 29, 2794–2805 (2021).
Lopez Bernal, J. et al. Effectiveness of the Pfizer-BioNTech and Oxford-AstraZeneca vaccines on covid-19 related symptoms, hospital admissions, and mortality in older adults in England: test negative case-control study. BMJ 373, n1088 (2021).
Cappadona, C., Rimoldi, V., Paraboschi, E. M. & Asselta, R. Genetic susceptibility to severe COVID-19. Infect. Genet. Evol. 110, 105426 (2023).
Pairo-Castineira, E. et al. GWAS and meta-analysis identifies 49 genetic variants underlying critical COVID-19. Nature 617, 764–768 (2023).
Ferreira, L. C., Gomes, C. E. M., Rodrigues-Neto, J. F. & Jeronimo, S. M. B. Genome-wide association studies of COVID-19: connecting the dots. Infect. Genet. Evol. 106, 105379 (2022).
Eshetie, S., Jullian, P., Benyamin, B. & Lee, S. H. Host genetic determinants of COVID-19 susceptibility and severity: a systematic review and meta-analysis. Rev. Med. Virol. 33, e2466 (2023).
Ellinghaus, D. et al. Genomewide association study of severe Covid-19 with respiratory failure. N. Engl. J. Med. 383, 1522–1534 (2020).
Kousathanas, A. et al. Whole-genome sequencing reveals host factors underlying critical COVID-19. Nature 607, 97–103 (2022).
Thibord, F., Chan, M. V., Chen, M. H. & Johnson, A. D. A year of COVID-19 GWAS results from the GRASP portal reveals potential genetic risk factors. Hum. Genet. Genom. Adv. 3, 100095 (2022).
Shelton, J. F. et al. Trans-ancestry analysis reveals genetic and nongenetic associations with COVID-19 susceptibility and severity. Nat. Genet. 53, 801–808 (2021).
Niemi, M. E. K. et al. Mapping the human genetic architecture of COVID-19. Nature 600, 472–477 (2021).
Pathak, G. A. et al. A first update on mapping the human genetic architecture of COVID-19. Nature 608, E1–E10 (2022).
Horowitz, J. E. et al. Genome-wide analysis provides genetic evidence that ACE2 influences COVID-19 risk and yields risk scores associated with severe disease. Nat. Genet. 54, 382–392 (2022).
Pulendran, B. Immunology taught by vaccines. Science 366, 1074–1075 (2019).
Pulendran, B. & Davis, M. M. The science and medicine of human immunology. Science 369, eaay4014 (2020).
Dendrou, C. A., Petersen, J., Rossjohn, J. & Fugger, L. HLA variation and disease. Nat. Rev. Immunol. 18, 325–339 (2018).
Mentzer, A. J. et al. Human leukocyte antigen alleles associate with COVID-19 vaccine immunogenicity and risk of breakthrough infection. Nat. Med. 29, 147–157 (2023).
Xie, J. et al. Relationship between HLA genetic variations, COVID-19 vaccine antibody response, and risk of breakthrough outcomes. Nat. Commun. 15, 4031 (2024).
Vuille-Dit-Bille, R. N. et al. Human intestine luminal ACE2 and amino acid transporter expression increased by ACE-inhibitors. Amino Acids 47, 693–705 (2015).
Kuba, K., Imai, Y., Ohto-Nakanishi, T. & Penninger, J. M. Trilogy of ACE2: A peptidase in the renin–angiotensin system, a SARS receptor, and a partner for amino acid transporters. Pharmacol. Ther. 128, 119–128 (2010).
Rydell, G. E., Kindberg, E., Larson, G. & Svensson, L. Susceptibility to winter vomiting disease: a sweet matter. Rev. Med. Virol. 21, 370–382 (2011).
Maurya, R. et al. Human-host transcriptomic analysis reveals unique early innate immune responses in different sub-phenotypes of COVID-19. Clin. Transl. Med. 12, e856 (2022).
Camblor, D. G. et al. Genetic variants in the NF-κB signaling pathway (NFKB1, NFKBIA, NFKBIZ) and risk of critical outcome among COVID-19 patients. Hum. Immunol. 83, 613–617 (2022).
Raïch-Regué, D. et al. Role of Siglecs in viral infections: a double-edged sword interaction. Mol. Asp. Med. 90, 101113 (2023).
Perez-Zsolt, D. et al. SARS-CoV-2 interaction with Siglec-1 mediates trans-infection by dendritic cells. Cell. Mol. Immunol. 18, 2676–2678 (2021).
Corder, E. H. et al. Gene dose of apolipoprotein E type 4 allele and the risk of Alzheimer’s disease in late onset families. Science 261, 921–923 (1993).
National Institute on Aging (NIH). Alzheimer’s Disease Genetics Fact Sheet. https://www.nia.nih.gov/health/alzheimers-causes-and-risk-factors/alzheimers-disease-genetics-fact-sheet
Xia, X., Wang, Y. & Zheng, J. COVID-19 and Alzheimer’s disease: how one crisis worsens the other. Transl. Neurodegener. 10, 15 (2021).
Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Armstrong, J. et al. Dynamic linkage of COVID-19 test results between Public Health England’s Second Generation Surveillance System and UK Biobank. Microb. Genom. 6, mgen000397 (2020).
Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 53, 1097–1103 (2021).
Chang, C. C. et al. Second-generation PLINK: Rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Watanabe, K., Taskesen, E., Van Bochoven, A. & Posthuma, D. Functional mapping and annotation of genetic associations with FUMA. Nat. Commun. 8, 1826 (2017).
Wallace, C. Statistical testing of shared genetic control for potentially related traits. Genet. Epidemiol. 37, 802–813 (2013).
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
Wickham, H., François, R., Henry, L., Müller, K. & Vaughan, D. dplyr: A Grammar of Data Manipulation. R package version 1.1.4, https://github.com/tidyverse/dplyr, https://dplyr.tidyverse.org (2023).
Alcalde-Herraiz, M. oxford-pharmacoepi/GeneticDeterminantsCovid19Vaxs: first_release. GitHub https://doi.org/10.5281/zenodo.13309171 (2024).
Acknowledgements
D.P.A. receives funding from the UK National Institute for Health and Care Research (NIHR) in the form of a senior research fellowship. D.P.A.’s group received partial support from the Oxford NIHR Biomedical Research Centre. J.Q.X. is funded through a Jardine-Oxford Graduate Scholarship and a titular Oxford Clarendon Fund Scholarship. The authors express their sincere gratitude to all UK Biobank participants for generously providing an invaluable resource to advance scientific research.
Author information
Contributions
Conceptualisation (M.A.H., J.Q.X., D.P.A.); data curation (M.A.H.); statistical analysis (M.A.H.); supervision (J.Q.X., D.P.A., M.C., A.P.U., R.P.); interpretation of data (M.A.H., J.Q.X., D.P.A., R.P.); drafting of the manuscript (M.A.H., J.Q.X.); critical revision of the manuscript (J.Q.X., D.P.A., M.C., A.P.U., R.P.). All authors reviewed and approved the final version. The views expressed in this article are the personal views of the author(s) and may not be understood or quoted as being made on behalf of or reflecting the position of the regulatory agency/agencies or organisations with which the author(s) is/are employed/affiliated.
Ethics declarations
Competing interests
D.P.A.’s department has received grants from Amgen, Chiesi-Taylor, Lilly, Janssen, Novartis and UCB Biopharma. His research group has received consultancy fees from AstraZeneca and UCB Biopharma. Amgen, Astellas, Janssen, Synapse Management Partners and UCB Biopharma have funded or supported training programmes organised by D.P.A.’s department. R.P. has participated in advisory boards for Pfizer, Gilead, MSD, GSK, Atea, Lilly, Roche, AstraZeneca, ViiV Healthcare and Theratechnologies, has participated in lectures and seminars funded by Gilead, Pfizer, GSK and AstraZeneca, and has received research funds awarded to his institution from Gilead, Pfizer and MSD. All other authors declare no conflicts of interest.
Peer review
Peer review information
Nature Communications thanks Rick Kennedy and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Source data
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Alcalde-Herraiz, M., Català, M., Prats-Uribe, A. et al. Genome-wide association studies of COVID-19 vaccine seroconversion and breakthrough outcomes in UK Biobank. Nat Commun 15, 8739 (2024). https://doi.org/10.1038/s41467-024-52890-6
DOI: https://doi.org/10.1038/s41467-024-52890-6