Main

Primary hypothyroidism is a common and insidious metabolic disease. It is characterized by subtle and nonspecific symptoms, which can lead to delayed diagnosis, resulting in an underdiagnosed case burden estimated at up to 0.5%1,2. Thyroid hormone deficiency is associated with increased risk of cardiometabolic outcomes, including coronary artery disease (CAD), heart failure (HF) and metabolic syndrome1,3,4. The risk of hypothyroidism is influenced by various factors, such as subclinical hypothyroidism (SCH), autoimmunity, iodine supplementation, sex, age, ancestry and goiter1,5,6. Genetics play an important role, with twin studies estimating that 55% of the disease risk may be attributed to genetic factors7, and genome-wide association studies (GWAS) have linked hundreds of genetic loci to thyroid disease and related biomarkers8,9,10,11,12,13,14.

Screening for thyroid dysfunction is standard in clinical practice, with up to 25% of some populations undergoing annual thyroid function tests5. The diagnosis of overt hypothyroidism is straightforward. However, individuals with SCH, characterized by elevated thyroid-stimulating hormone (TSH, >4 mU l−1) and free thyroxine (fT4) within the reference range, pose a diagnostic challenge. Current guidelines recommend treating SCH with thyroid hormone replacement when TSH exceeds 10 mU l−1, if the patient is young, has a positive screen for thyroid-peroxidase antibodies (anti-TPO), has cardiovascular disease or exhibits symptoms of hypothyroidism4. Notably, relying solely on symptoms for treatment decisions may lead to overdiagnosis and overtreatment since classic hypothyroidism manifestations (for example, lethargy, dry skin or impaired memory) are commonly observed in euthyroid individuals4. Similarly, basing treatment decisions solely on biochemical findings may result in overtreatment since more than one-third of patients with abnormal thyroid function tests experience spontaneous remission without intervention15. Beyond biochemical testing, no risk assessment tool can distinguish between patients with high and low risk of disease progression. Given the high heritability and polygenic nature of hypothyroidism, we hypothesized that a well-powered polygenic risk score (PRS), incorporating millions of sequence variants, could aid in identifying high-risk individuals.

This GWAS meta-analysis, which included 113,393 hypothyroidism cases, 1,065,268 controls and over 190,000 individuals with measured thyroid hormone levels, offers insights into the genetic underpinnings of thyroid hormone deficiency. We characterized the hypothyroidism immunophenotype by investigating genetic associations with peripheral blood cell counts and circulating levels of inflammatory markers. We developed a PRS to improve the prediction of hypothyroidism and compared the predictive ability in incident disease relative to traditional risk factors. We then evaluated the ability of the PRS to predict progression from subclinical to overt hypothyroidism. Finally, we explored the association between the hypothyroidism PRS and common malignancies, cardiometabolic and neuropsychiatric traits.

Results

Genome-wide association results

We included 113,393 hypothyroidism cases and 1,065,268 controls from European cohorts in the GWAS meta-analysis of hypothyroidism (Copenhagen Hospital Biobank-Chronic Inflammatory Diseases/Danish Blood Donor Study (CHB-CID/DBDS), UK Biobank (UKB), FinnGen and 23andMe). The genomic inflation factor (λGC) was 1.46, and the linkage-disequilibrium score regression (LDSC)-intercept was 1.09 (s.e. = 0.03; Supplementary Table 1), indicating that most of the observed inflation was due to polygenicity. At genome-wide significance (P < 5 × 10−8), we identified 319 loci, of which 150 were previously unreported (Fig. 1a and Supplementary Tables 2 and 3) and 84 were not previously associated with other thyroid traits (Supplementary Table 4). Using a more stringent threshold of P < 1 × 10−9, we found 247 loci, of which 86 were unreported. The heritability was 14.5% (95% confidence interval (CI) = 14.0–15.0). Most lead single-nucleotide polymorphisms (SNPs) had modest effect sizes (median odds ratio (OR) = 1.03, interquartile range = 0.96–1.05). We discovered several low-frequency (minor allele frequency (MAF) < 5%) coding variants, which included a known stop-gain in TSHR (p.Trp546Ter; OR = 7.67, rs121908866) and two new and protective missense variants—rs149007883 in NFKBIZ (p.Gly102Ala; OR = 0.83, P = 4.94 × 10−8) and rs61731111 in S1PR4 (p.Arg243Cys; OR = 0.91, P = 8.66 × 10−9).
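As a rough illustration of the genomic inflation factor quoted above, the following sketch computes λGC from a list of two-sided P values as the median observed 1-d.f. χ² statistic divided by its expected median under the null (about 0.455). The function name and the toy P values are illustrative assumptions and are not taken from the study pipeline.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
NULL_MEDIAN_CHI2 = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Return lambda_GC for two-sided P values in (0, 1] (hypothetical helper)."""
    norm = NormalDist()
    # Convert each two-sided P value back to a 1-d.f. chi-square statistic.
    chi2 = [norm.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / NULL_MEDIAN_CHI2


# Toy input: evenly spaced P values mimic a GWAS with no inflation.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lambda_null = genomic_inflation(null_p)

# Squaring the P values makes them smaller, which inflates the statistic.
lambda_inflated = genomic_inflation([p ** 2 for p in null_p])
```

A λGC above 1 cannot by itself separate polygenic signal from confounding, which is why the text pairs it with the LDSC intercept.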

Fig. 1: Hypothyroidism lead variants and their associations with thyroid hormones.

a, Relationships between minor allele frequencies and ORs for the 350 lead variants that were identified in the hypothyroidism genome-wide meta-analysis (113,393 cases and 1,065,268 controls) or through an endophenotype-driven analysis using thyroid-stimulating hormone genome-wide associations as priors. Coding variants are shown as squares, new associations in turquoise and known associations in gray. b, Relationships between hypothyroidism risk and changes in thyroid-stimulating hormone for 349 of 350 lead variants. c, Relationships between hypothyroidism risk and changes in free thyroxine for 348 of 350 lead variants. In b and c, the centerline represents the linear regression, and the shaded error band shows the 95% CI around the regression line. Statistical associations were assessed using two-sided Pearson correlation tests. No multiple testing correction was applied for these correlation analyses.

Endophenotype-driven analysis

We meta-analyzed GWASs of thyroid hormones from CHB-CID/DBDS, UKB and previously published summary data10,16. In a meta-analysis of up to 191,449 individuals with fT4 measurements, we identified 61 fT4 genome-wide significant loci, of which 15 were previously unreported (Supplementary Table 5). In a meta-analysis of up to 482,873 individuals with TSH measurements, 297 TSH genome-wide significant loci were identified, 126 of which have not been previously reported (Supplementary Table 6). Using LD score regression, we found that the genetic correlations with hypothyroidism were 55% (P = 3.55 × 10−122) for TSH and −23% (P = 3.95 × 10−3) for fT4. Based on the strong link between TSH and hypothyroidism, we used TSH GWAS associations as priors to enhance genomic discovery for hypothyroidism. Of the 297 TSH variants, 186 were associated with hypothyroidism at a false discovery rate (FDR) < 0.01. Of these, 96 were previously associated with hypothyroidism at genome-wide significance, 61 overlapped in positions with genome-wide hypothyroidism loci reported in this study and the remaining 29 represent new associations for hypothyroidism. In total, we identified 350 nonoverlapping loci via hypothyroidism meta-analysis or through the TSH-driven approach (Supplementary Table 7), 179 of which have not been reported previously.
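The FDR filter applied to the TSH variants can be sketched with the Benjamini-Hochberg step-up procedure. The text does not name the FDR method, so Benjamini-Hochberg is an assumption here, and the function name and toy P values are made up for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return one boolean per test: True where it passes BH at level `fdr`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose ordered P value satisfies p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    # Every test ranked at or below max_rank is declared significant.
    passed = [False] * m
    for idx in order[:max_rank]:
        passed[idx] = True
    return passed


# Toy hypothyroidism P values for six hypothetical TSH lead variants.
toy_p = [1e-9, 0.0004, 0.003, 0.012, 0.2, 0.7]
flags = benjamini_hochberg(toy_p, fdr=0.01)
```

Restricting the test set to variants with prior evidence keeps the number of tests small, so the FDR threshold is far less stringent than genome-wide significance.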

Replication

We replicated unreported variants in the Estonian Biobank (EstBB) and deCODE genetics, which included 34,835 cases and 492,149 controls. Of the 179 new loci reported here, 176 (98%) were available for replication. In total, 35 of 176 (20%) variants replicated beyond the threshold for multiple testing (P < 2.79 × 10−4 (0.05/179)). A total of 110 of 176 (63%) were nominally significant (P < 0.05), and all but one had concordant direction of effect. Finally, 54/176 (31%) had P ≥ 0.05 but showed concordant direction of effect. There was a high concordance between effect estimates in the discovery and replication cohorts for the 179 risk variants (r2 = 0.85, P = 6.54 × 10−51). Given the large sample size differences between discovery and replication, we did not expect to be able to replicate all new loci at the threshold for multiple testing. Power calculations indicated that our replication analysis had limited power to detect variants with an OR of 1.03, which corresponds to the effect range of replication variants (Supplementary Fig. 1 and Supplementary Table 8). We also cross-referenced variants that replicated at nominal significance (P < 0.05) with genome-wide associations to TSH and fT4. Of the 75 variants that replicated only at nominal significance (P < 0.05), 32 were previously genome-wide significant in either TSH or fT4 studies. Of the 54 variants that did not replicate (P ≥ 0.05) but had concordant direction of effect, 23 were genome-wide associated with either TSH or fT4 (Supplementary Table 9).
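The replication tiers described above can be expressed as a small classifier. This is a simplified sketch: the tier labels and function name are illustrative, and it checks direction of effect first, which is a choice made here rather than something stated in the text.

```python
# Multiple-testing threshold quoted in the text: 0.05 / 179, about 2.79e-4.
BONFERRONI = 0.05 / 179


def replication_status(beta_discovery, beta_replication, p_replication):
    """Classify one variant by direction of effect and replication P value."""
    concordant = (beta_discovery > 0) == (beta_replication > 0)
    if not concordant:
        return "discordant"
    if p_replication < BONFERRONI:
        return "bonferroni"
    if p_replication < 0.05:
        return "nominal"
    return "concordant_only"


# Toy effect sizes (log odds ratios) and replication P values, made up.
examples = [
    replication_status(0.03, 0.04, 1e-5),
    replication_status(0.03, 0.02, 0.01),
    replication_status(-0.03, -0.01, 0.30),
    replication_status(0.03, -0.01, 0.40),
]
```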

Correlation between hypothyroidism and thyroid hormones

Since the diagnosis of hypothyroidism is biochemical, we investigated the influence of hypothyroidism variants on thyroid hormone levels. We observed a modest correlation between the effect of hypothyroidism variants and TSH effect estimates (Pearson’s r = 0.58, P = 3.65 × 10−33), where 91% (315/348) of variants had concordant direction of effect. However, some notable differences existed. For example, the missense variant rs78534766 in ADCY7 and the FLT3 variant rs76428106 were associated with large effects on hypothyroidism (OR = 1.4 and 1.37, respectively) but had comparably small effects on TSH levels (s.d. = 0.04 and 0.08, respectively). Similarly, the variants rs2016105 in ELK3 (β = 0.17 s.d.) and rs6885099 in PDE8B (β = 0.16 s.d.) had large effects on TSH but were associated with only a modest increase in disease risk (OR ~ 1.1; Fig. 1b and Supplementary Table 10). For fT4 levels, we found a weak correlation between disease risk and fT4 levels (Pearson’s r = −0.16, P = 0.004; Fig. 1c).
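The two summaries used in this comparison, Pearson's r and the share of variants with concordant direction of effect, can be sketched as follows. The odds ratios and TSH effects below are invented toy values, and comparing log(OR) with the TSH β is an assumption about how the effects were put on a common footing.

```python
from math import log, sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def sign_concordance(x, y):
    """Fraction of pairs whose two effects point in the same direction."""
    same = sum(1 for a, b in zip(x, y) if (a > 0) == (b > 0))
    return same / len(x)


# Toy per-variant effects: hypothyroidism ORs and TSH betas in s.d. units.
odds_ratios = [1.40, 1.10, 0.95, 1.05, 0.90]
tsh_betas = [0.04, 0.17, -0.02, 0.03, 0.01]
log_or = [log(o) for o in odds_ratios]

r = pearson_r(log_or, tsh_betas)
concordance = sign_concordance(log_or, tsh_betas)
```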

Inflammatory traits associated with hypothyroidism variants

To investigate the role of hypothyroidism variants in autoimmunity, we tested associations between lead variants, peripheral blood cell counts (for example, red blood cells, platelets, lymphocytes, eosinophils and neutrophils) and 90 inflammatory proteins17. We found that 153 of the 350 (44%) lead variants were associated with blood cell traits, and 55 of the 343 lead variants that were available in protein quantitative trait locus (pQTL) data were associated with at least one inflammatory marker at P < 1.41 × 10−6 (0.05/(350 × 101 traits); Supplementary Tables 11 and 12). The inflammatory markers with the highest number of associations with hypothyroidism lead variants were IL12B (14/55) and FLT3LG (9/55), in line with previous findings8. We found 40 variants associated with both blood cell traits and inflammatory proteins, with evidence of trans-pQTL hotspots at several loci. The lead variants with the highest number of associations were the known missense variant rs3184504 (OR = 1.18, p.Trp262Arg) in SH2B3 and the intron variant rs11066320 (OR = 1.14) in PTPN11, which were both associated with higher blood cell counts and at least 30 different markers of inflammation (including several chemokines, interleukins and cytokines)17. Next, we interrogated variants associated with lower hypothyroidism risk in genes with known roles in immune system function. We highlight two missense variants, rs149007883 in NFKBIZ (p.Gly102Ala; OR = 0.83) and rs34536443 (p.Pro1104Ala; OR = 0.87) in TYK2, and two intron variants, rs13181561 (OR = 0.96) in STING1 and rs113473633 (OR = 0.90) in NFKB1. These variants were associated with lower levels of a panel of inflammatory mediators (Supplementary Fig. 2), including IFN-γ, CXCL10 and CXCL9, which make up key pathogenic pathways involved in autoimmune diseases related to hypothyroidism18,19.

Gene mapping

We used five methods (polygenic priority score (PoPS), variant-to-gene (V2G), coding variants, transcriptome-wide association study (TWAS) with colocalization and Mendelian disease enrichment) using different strategies (coding impact, regulatory potential and gene–trait linkage) to prioritize genes. We found 88 coding variants in 59 genes that were either lead variants (11/88) or in high LD (r2 > 0.8) with one (Supplementary Table 13). Using PoPS, we mapped 209 of 350 (60%) hypothyroidism loci to 418 genes with a PoPS score above the 90th percentile of the distribution, while V2G mapped 344 of 350 (98%) lead variants to a single gene (Supplementary Tables 14 and 15). Using TWAS with colocalization, we identified 272 genes within 135 of 350 (39%) hypothyroidism risk loci that showed evidence of colocalization between gene expression and disease risk (Supplementary Table 16). We found that 168 of 350 (48%) hypothyroidism loci overlapped in positions with 278 Mendelian disease genes implicated in autoimmunity or thyroid disease. Finally, 205 hypothyroidism loci had at least two lines of gene-mapping evidence prioritizing 259 genes (Supplementary Table 17). Gene enrichment analysis highlighted several genes involved in pathways related to a broad range of functions in the immune system (for example, differentiation, activation and regulation of myeloid and lymphoid blood cells, regulation of cell-cell adhesion, regulation of inflammatory responses and cytokine signaling; Supplementary Table 18), but only a handful of genes were enriched in thyroid hormone metabolism (for example, GATA3, TPO, DIO1 and TG) or thyroid gland development (for example, FGF10, TG, NKX2-1 and THRA).
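The rule of requiring at least two lines of gene-mapping evidence amounts to counting, per gene, how many methods nominated it. A minimal sketch follows; the gene labels and the method-to-gene assignments are placeholders and do not reflect the study's actual results.

```python
from collections import defaultdict


def prioritize(evidence, min_lines=2):
    """Keep genes supported by at least `min_lines` distinct methods.

    `evidence` maps a method name to the set of genes that method nominated.
    """
    lines = defaultdict(set)
    for method, genes in evidence.items():
        for gene in genes:
            lines[gene].add(method)
    return {g: sorted(m) for g, m in lines.items() if len(m) >= min_lines}


# Placeholder nominations (GENE_A ... GENE_D are hypothetical labels).
toy_evidence = {
    "PoPS": {"GENE_A", "GENE_B", "GENE_D"},
    "V2G": {"GENE_A", "GENE_C"},
    "coding": {"GENE_A", "GENE_C"},
    "TWAS": {"GENE_B"},
    "Mendelian": set(),
}
prioritized = prioritize(toy_evidence)
```

Requiring agreement between methods that rely on different information (coding impact, regulatory potential, gene-trait linkage) trades sensitivity for fewer spurious nominations.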

Converging effects of common and rare variants

Identifying rare coding variants in genes linked to hypothyroidism can confirm putative causal genes and increase the understanding of disease mechanisms. Using a published rare variant burden analysis including 18,362 cases and 310,690 controls20, we investigated the associations of genes with at least two lines of mapping evidence (259 genes) and hypothyroidism, using both predicted loss-of-function variants (pLoF) and protein-altering variants (PAVs; that is, deleterious missense variants and pLoF) at an FDR-adjusted P < 0.05 (Supplementary Table 19). pLoF variants in TSHR, an established monogenic cause of hypothyroidism, were associated with increased disease risk (MAF < 1%—OR = 3.02, 95% CI = 2.25–4.06, P = 2.5 × 10−13). In comparison, pLoF variants in NFATC1 (MAF < 0.001%—OR = 4.36, 95% CI = 2.11–8.99, P = 6.7 × 10−5) were associated with higher effect compared to TSHR pLoF variants, suggesting a potential monogenic role in hypothyroidism. Protective rare coding variants are particularly interesting, as they proxy effects of life-long therapeutic inhibition and may guide therapeutic developments21. Coding variants in four genes prioritized from our gene-mapping strategy associated with reduced risk of hypothyroidism—ZAP70 (PAVs, MAF < 0.001%—OR = 0.33, 95% CI = 0.19–0.57, P = 6.4 × 10−5), ARHGAP9 (PAVs, MAF < 1%—OR = 0.76, 95% CI = 0.66–0.87, P = 1.3 × 10−4), TYK2 (PAVs, MAF < 1%—OR = 0.78, 95% CI = 0.69–0.88, P = 6.9 × 10−5) and IFIH1 (pLoF, MAF < 1%—OR = 0.81, 95% CI = 0.75–0.86, P = 2.2 × 10−6).

PRS and hypothyroidism prediction

We derived a PRS of 1,107,248 variants from a meta-analysis of CHB-CID/DBDS, deCODE genetics, EstBB, FinnGen and 23andMe, including more than 116,000 hypothyroidism cases. The PRS was evaluated in the UKB and the Danish General Suburban Population Study (GESUS; Methods). In the UKB, which includes 32,304 cases and 399,000 controls, we found a strong association between the PRS and hypothyroidism (OR = 2.01 per s.d. increase in PRS, 95% CI = 1.99–2.03, P = 2.3 × 10−2790). Risk increased markedly at the extremes of the PRS distribution (Fig. 2a), with ORs of 4.14 (95% CI = 3.94–4.34), 7.49 (95% CI = 6.89–8.15) and 14.10 (95% CI = 11.44–17.38) for individuals in the top 10%, 1% and 0.1% of the distribution, respectively, compared to the 5th decile. We found a similar effect estimate in the GESUS cohort (OR = 2.0 per s.d. increase in PRS, 95% CI = 1.85–2.17, P = 1.61 × 10−66). We next evaluated the predictive ability of the PRS relative to established risk factors6. Relative to a model with age, sex and principal components (PCs), the PRS yielded the largest change in area under the curve (ΔAUC) of 7.2% (95% CI = 6.7–7.6), which exceeded the impact of all other risk factors (Fig. 2b). Integrating all non-genetic risk factors into a model resulted in a ΔAUC of 0.5% (95% CI = 0.4–0.7), and a model including all risk factors (including the PRS) resulted in a ΔAUC of 7.8% (95% CI = 7.3–8.2; AUC = 0.70).

Anti-TPO is a strong predictor of autoimmune hypothyroidism22. In the GESUS cohort, we identified 5,452 individuals with TSH, fT4 and anti-TPO measurements that were free of hypothyroidism at baseline. Of these, 431 were anti-TPO positive (>100 U ml−1). A model including age, sex and PCs yielded an AUC of 0.634 (95% CI = 0.589–0.679). A model including thyroid hormones and anti-TPO increased AUC further to 0.849 (95% CI = 0.810–0.889). By adding the PRS to the latter model, risk prediction improved significantly, increasing the AUC to 0.859 (95% CI = 0.821–0.897, P for difference = 0.03; Supplementary Table 20). For individuals who were anti-TPO negative, the PRS was able to capture residual disease risk. Anti-TPO-negative individuals in the top 10% of the PRS distribution had a nearly twofold increased risk (hazard ratio (HR) = 1.97, 95% CI = 1.06–3.68, P = 0.033) of developing hypothyroidism compared to those in the bottom 90% of the distribution.
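To make the quantities in this section concrete, the sketch below shows how a PRS is typically formed as a weighted sum of effect-allele dosages, standardized so that effects can be reported per s.d., and summarized by a rank-based AUC. It does not reproduce how the per-variant weights were derived, which this section does not describe; the weights, genotypes and labels are invented.

```python
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0 to 2) for one individual."""
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Rescale scores to mean 0 and s.d. 1, so effects read 'per s.d.'."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


def auc(scores, labels):
    """Probability that a random case outranks a random control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Toy data: three variants, five individuals (all values made up).
weights = [0.05, -0.02, 0.10]          # per-allele log(OR) weights
genotypes = [[0, 1, 0], [1, 0, 1], [2, 0, 2], [1, 2, 0], [2, 1, 1]]
labels = [0, 0, 1, 0, 1]               # 1 = hypothyroidism case

raw = [polygenic_score(g, weights) for g in genotypes]
z_scores = standardize(raw)
toy_auc = auc(z_scores, labels)
```

The ΔAUC values in the text correspond to the difference between the AUC of a model with the added predictor and the AUC of the benchmark model.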

Fig. 2: PRS association with and prediction of hypothyroidism.

a, Associations between 10 deciles of the PRS and risk of hypothyroidism are presented as OR point estimates ± 95% CI error bars, estimated using logistic regression models adjusted for age, sex and PCs. No adjustments were made for multiple comparisons. b, Prediction of incident hypothyroidism cases. The benchmark model consisted of age, sex and four PCs. Prevalent risk factors for hypothyroidism were added iteratively to the benchmark model. The center of each error bar represents the AUC, and error bars indicate the 95% CIs, displayed in absolute terms on the right. No adjustments were made for multiple comparisons. MS, multiple sclerosis; PMR, polymyalgia rheumatica; PsA, psoriatic arthritis; RA, rheumatoid arthritis; DS, Down syndrome; SSc, systemic sclerosis; SLE, systemic lupus erythematosus; Celiac, celiac disease; Sjögren, Sjögren’s disease; T1D, type 1 diabetes.

Disease progression in SCH

The clinical course of individuals with SCH is difficult to predict5. We identified 8,114 individuals from UKB primary care data with biochemically defined SCH and investigated whether the PRS could identify individuals who are more or less likely to progress to overt disease. Compared to individuals with intermediate polygenic risk (10th to 90th percentiles), individuals with high polygenic risk (>90th percentile) had an HR of 1.43 (95% CI = 1.37–1.61) for progressing to overt hypothyroidism, while low-risk individuals (<10th percentile) had an HR of 0.76 (95% CI = 0.65–0.88). On the absolute scale, this risk translated to a 10.2 percentage point higher 10-year conversion rate for high-risk individuals (39.3%, 95% CI = 35.9–42.7%) and a 6.6 percentage point lower 10-year conversion rate for low-risk individuals (22.5%, 95% CI = 19.6–25.4%) compared to those in the intermediate-risk group (29.1%, 95% CI = 28.0–30.3%; Fig. 3).
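The grouping and the absolute differences reported here can be reproduced in a few lines. This is a sketch under stated assumptions: the decile cut points come from Python's `statistics.quantiles` with its default method, which may differ slightly from the percentile definition used in the study, and the score list is a toy stand-in for real PRS values.

```python
from statistics import quantiles


def risk_group(score, low_cut, high_cut):
    """Assign a PRS to the low (<10th), high (>90th) or intermediate group."""
    if score < low_cut:
        return "low"
    if score > high_cut:
        return "high"
    return "intermediate"


# Toy PRS values; real scores would be continuous and roughly normal.
scores = list(range(1, 101))
deciles = quantiles(scores, n=10)      # nine cut points: 10th ... 90th percentile
low_cut, high_cut = deciles[0], deciles[-1]
groups = [risk_group(s, low_cut, high_cut) for s in scores]

# Absolute 10-year differences versus the intermediate group, using the
# cumulative incidences quoted in the text (in percent).
high_minus_intermediate = round(39.3 - 29.1, 1)
intermediate_minus_low = round(29.1 - 22.5, 1)
```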

Fig. 3: Progression from SCH to overt disease.

Ten-year cumulative incidence of disease progression from SCH to overt hypothyroidism in 8,114 primary care patients from the UKB. Lines represent the cumulative incidence, and shaded bands indicate the 95% CI. The green line represents individuals with low polygenic risk (<10th percentile), yellow represents intermediate polygenic risk (10th–90th percentile), and red represents high polygenic risk (>90th percentile). Cumulative incidence was estimated using the Aalen-Johansen estimator, which accounts for the competing risk of death. HRs with 95% CIs were estimated using two-sided Cox proportional hazards models, adjusted for age, sex and four PCs. No adjustments were made for multiple comparisons.
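For readers unfamiliar with the Aalen-Johansen estimator named in the legend, the following is a minimal sketch for a single cause of interest with one or more competing events. The event coding (0 = censored, 1 = progression to overt hypothyroidism, 2 = death) and the toy follow-up data are assumptions for illustration only.

```python
def cumulative_incidence(times, events, cause=1):
    """Aalen-Johansen cumulative incidence for `cause`.

    `events` uses 0 for censoring and any other integer for an event type.
    Returns a list of (time, cumulative incidence) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    event_free = 1.0   # probability of no event of any type before time t
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        j = i
        d_cause = d_any = 0
        # Count events of each kind among subjects tied at time t.
        while j < len(data) and data[j][0] == t:
            if data[j][1] != 0:
                d_any += 1
                if data[j][1] == cause:
                    d_cause += 1
            j += 1
        if d_any:
            # Increment: P(event-free until t) * cause-specific hazard at t.
            cif += event_free * d_cause / at_risk
            event_free *= 1.0 - d_any / at_risk
            curve.append((t, cif))
        at_risk -= j - i   # drop events and censored subjects from the risk set
        i = j
    return curve


# Toy follow-up in years: progression, death, censored, progression, censored.
toy_curve = cumulative_incidence([1, 2, 3, 4, 5], [1, 2, 0, 1, 0])
```

Treating deaths as censored and using one minus the Kaplan-Meier estimate would overstate the progression risk, which is the reason for using an estimator that accounts for competing risks.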

Disease risk stratified by lifestyle factors and genetic risk

We investigated the relationship between hypothyroidism risk and lifestyle categories using a four-point scoring system based on body mass index (BMI), exercise, smoking and dietary habits. We found that healthy lifestyle characteristics were associated with a reduced risk of hypothyroidism. As expected, individuals without obesity had lower risk (HR = 0.71, 95% CI = 0.68–0.74) compared to obese individuals23. Contrary to previous findings, we found that nonsmokers had a lower risk (HR = 0.79, 95% CI = 0.75–0.84) than did current smokers24. Overall, adherence to a healthy lifestyle corresponded to an HR of 0.83 (95% CI = 0.79–0.87), while an unhealthy lifestyle corresponded to an HR of 1.26 (95% CI = 1.16–1.35; Fig. 4a and Supplementary Table 21). Finally, we explored the interplay between the PRS and lifestyle factors to identify individuals at extreme disease risk (Fig. 4b,c). The 10-year risk was greater for women, with the highest risk observed along the PRS axis. Risk increased with accumulating risk factors and higher polygenic risk, where the highest 10-year risk (50%) was observed for women above the age of 60 years, with a PRS in the >90th percentile of the distribution, who were obese, smokers and did not exercise regularly.

Fig. 4: Stratifying hypothyroidism risk using lifestyle characteristics and polygenic risk in the UKB.

a, Risk for incident hypothyroidism according to different lifestyle characteristics and categories. Data are presented as HR point estimates ±95% CIs, derived from two-sided Cox proportional hazards models, adjusted for age, sex and PCs. The center of each error bar represents the mean HR estimate. b,c, Ten-year risk of hypothyroidism, stratified by sex (b, females; c, males), age group, obesity status (BMI > 30), exercise regularity (yes/no), smoking status (yes/no) and divisions within the PRS across ten deciles (Q1–10).

Associations with cancer and cardiometabolic phenotypes

We investigated the relationship between the hypothyroidism PRS and 50 phenotypes, including common malignancies and cardiometabolic traits in UKB. We found that a higher PRS was associated with a lower risk of skin (OR = 0.92 per s.d. increase in PRS, P = 7.7 × 10−37), prostate (OR = 0.94 per s.d. increase in PRS, P = 2.2 × 10−6) and breast cancer (OR = 0.95 per s.d. increase in PRS, P = 6.0 × 10−5). We also found that a higher PRS was associated with an increased risk of several cardiometabolic diseases, including CAD (OR = 1.06 per s.d. increase in PRS, P = 1.1 × 10−24), chronic kidney disease (OR = 1.06 per s.d. increase in PRS, P = 7.7 × 10−16) and type 2 diabetes (OR = 1.05 per s.d. increase in PRS, P = 1.5 × 10−19; Fig. 5 and Supplementary Table 22).

Fig. 5: Phenome-wide associations between the hypothyroidism PRS and cancer and cardiometabolic phenotypes in the UKB.

The figure shows associations between the hypothyroidism PRS and 50 binary disease outcomes. ORs reflect the change in disease risk per 1 s.d. increase in the PRS, estimated using logistic regression models adjusted for age, sex and four PCs. P-values were calculated using two-sided Wald tests. Each colored triangle indicates a significant association after Bonferroni correction (P < 0.001, that is, 0.05/50). Upward-pointing triangles indicate increased risk and downward-pointing triangles indicate decreased risk.

Discussion

In this study, we present a comprehensive genetic evaluation of thyroid hormone deficiency through GWAS meta-analyses of hypothyroidism and thyroid hormones. Our findings confirm and extend the understanding of the polygenic and complex nature of hypothyroidism, linking 350 genetic loci to this disease. By linking genetic loci to immune-related cells and circulating inflammation markers, we characterized potential mediators of disease. Using gene-prioritization methods, we identified putative genes with known roles in autoimmunity, which aligns with the main etiology in iodine-sufficient areas of the world25. We showed that using a hypothyroidism PRS could potentially improve the diagnostic accuracy in thyroid hormone deficiency, a condition fraught with diagnostic challenges.

We highlight associations with inflammatory markers, which may provide insight into inflammatory pathogenic mechanisms26. We emphasize four risk-mitigating variants (missense: rs149007883 in NFKBIZ and rs34536443 in TYK2; intronic: rs13181561 in STING1 and rs113473633 in NFKB1) in genes encoding critical regulators of immune system function27,28,29,30. These variants were associated with lower levels of inflammatory mediators, including IFN-γ, CXCL10 and CXCL9, that make up a crucial pathway in the activation and recruitment of immune cells18,19. This is proposed to be a central pathogenic pathway in many autoimmune diseases, including vitiligo, psoriasis and psoriatic arthritis, which are closely linked to hypothyroidism. This aligns with the increased expression of IFN-γ in the serum and thyroid tissue of patients with autoimmune hypothyroidism, which is proposed to mediate thyroid hormone deficiency through lymphocyte infiltration and the exposure of thyrocytes to proinflammatory cytokines4,31,32,33. Further investigation into key signaling pathways, such as those highlighted, may be critical for understanding the mechanisms underlying disease pathogenesis34.

Human genetic evidence has been acknowledged as an important predictor of success in drug development programs34. We observed converging evidence linking both common and rare PAVs in TYK2 and ZAP70 with reduced risk of hypothyroidism. Although TYK2 inhibition has been used therapeutically in multiple autoimmune diseases, its potential in mitigating hypothyroidism risk remains largely unexplored35,36. This could suggest a strategic direction for drug repurposing. Similarly, inhibiting the protein product of ZAP70, which is also essential for T-cell signaling, has been shown to have anti-inflammatory properties in vitro and to be effective in treating psoriasis in mice37. Given its similar pathway and risk reduction profile, ZAP70 has also emerged as a candidate for further research in the context of managing autoimmune diseases.

Because hypothyroidism is highly polygenic, we developed a PRS from more than 116,000 hypothyroidism cases to address diagnostic challenges in thyroid hormone deficiency. The estimated 0.5% of individuals with undiagnosed hypothyroidism may reflect people who do not seek medical attention for gradually developing nonspecific symptoms. The use of a PRS to identify individuals at greater risk could reduce the burden of undiagnosed thyroid failure. Specifically, in the top 1% and 0.1% of the PRS distribution, individuals exhibited a more than 7-fold and 14-fold risk, respectively, when compared to the middle decile. These risks are substantially greater than those observed for other complex traits38 and for known monogenic causes of hypothyroidism. Using two different validation cohorts, we showed that the PRS not only outperformed an array of clinical hypothyroidism risk factors but also improved risk prediction beyond that of thyroid hormones and anti-TPO.

The prevalence of thyroid hormone testing in clinical practice inevitably leads to a substantial number of patients being diagnosed and treated for SCH5. Early treatment is beneficial for preventing the progression to overt hypothyroidism and mitigating the risk of associated cardiovascular morbidity39. However, progression from SCH to overt disease is unpredictable, and clinical assessment relies on vague and nonspecific symptoms. We demonstrated that the PRS could identify individuals at high and low risk of progression from SCH to overt disease. If genotyping becomes a standard of care, PRS may guide clinicians in selecting patients who are more or less likely to progress from one disease state to another. Consequently, the clinical approach could shift to a genotype-guided biochemical assessment, rather than relying solely on nonspecific symptoms to guide testing. We also showed that, by combining the PRS with easily accessible lifestyle factors, we could identify individuals with a 10-year risk of 50%. These accumulated risk factors are comparable to conventional risk factors investigated in the Whickham Survey, where women with elevated TSH (>6 mU l−1) and positive anti-TPO had an annual progression rate of 4.3%40. Collectively, our findings underscore the potential of using genetic risk stratification to guide personalized risk assessment and prevention strategies for hypothyroidism.

Numerous observational studies have linked hypothyroidism to increased cardiovascular morbidity1,3,4. Using a phenome-wide association study approach, we found that the hypothyroidism PRS was associated with a range of cardiometabolic diseases, including atherosclerotic disease, chronic kidney disease and type 2 diabetes. This implies the need for a more focused approach to monitoring cardiovascular risk factors and diseases in individuals with hypothyroidism. Furthermore, we found significant associations between genetically predicted higher hypothyroidism risk and lower risk of breast, prostate and skin cancer, supporting the findings reported by several observational studies41,42. The association between the PRS and breast cancer aligns with that of a recent GWAS of thyroid function11. Interestingly, we found no association between hypothyroidism risk and thyroid cancer, despite previous GWASs showing an association between higher TSH levels and lower risk of thyroid cancer11,16. Whether the observed associations with specific cancers reflect shared pathways, where augmented immunosurveillance leads to both disease and, conversely, mitigates the risk of specific cancers, will require additional investigation.

This study has several limitations. First, the analysis was limited to individuals of European ancestry, which restricts the generalizability of our findings to other ancestries. Second, we relied on data from cohorts, where the phenotype definition was based on self-reported diagnoses, such as those from 23andMe, or on summary statistics with predefined phenotypes, which limited our ability to further refine the phenotype definitions. This may have introduced some degree of heterogeneity.

In conclusion, we found 350 genomic risk loci for hypothyroidism, underscoring the highly polygenic nature of this disease. Leveraging this insight, we developed a PRS that could identify individuals at high risk of developing disease in the general population and also predict the clinical course of subclinical disease. Our findings represent a step forward in the genetic understanding and clinical management of hypothyroidism, broadening the perspective for use in personalized medicine.

Methods

Ethics statement

This research complied with all ethical regulations and was conducted in accordance with the principles of the Declaration of Helsinki. All contributing studies received approval from the appropriate regional or institutional research ethics committees. For most cohorts, participants provided written informed consent before inclusion. For CHB participants, written informed consent was not obtained, but in accordance with Danish legislation, participants were informed about the use of residual biological material for research purposes and provided with the option to opt out. Details of ethics approvals and consent procedures for individual cohorts are provided in the Supplementary Note.

Cohorts, association testing and meta-analysis

We used four cohorts for discovery (CHB-CID/DBDS, UKB, FinnGen Freeze 10 and 23andMe). EstBB and deCODE genetics were used for validation. Cases were defined using International Classification of Diseases (ICD)-10 codes E03.8/E03.9/E06.3, ICD-9 codes 244.8/244.9 or claimed prescription of thyroid hormone substitution therapy using Anatomical Therapeutic Chemical Classification code H03A. In UKB, in addition to electronic health registries, we included individuals self-reporting hypothyroidism or use of thyroid hormone substitution as cases. Individuals with hyperthyroidism (E05(0-9)) were excluded and we otherwise used thyroid disease-free controls, excluding individuals with the following ICD-10 codes: E0(1-2), E03(0-5) and E0(4-7). In 23andMe, cases were defined based on self-reported diagnoses of hypothyroidism, elevated thyroid-stimulating hormone or taking levothyroxine. Controls were individuals who reported no other thyroid-related disorders43. Details on genotyping, imputation and quality control are provided in the Supplementary Note and Supplementary Table 23. Using CHB-CID/DBDS, UKB primary care data (the subset allowed for non-COVID research) and previously published data, we meta-analyzed GWASs for TSH and fT4. We used the first nonmissing sample value that was within the reference range. The results of individual thyroid function tests were inverse normalized. Individuals who were either on thyroid drugs or had undergone thyroid surgery before the thyroid function tests were excluded. In the UKB primary care data, thyroid hormone measurements were captured using Read2 and Read3 codes, while drug and operation codes were recorded using dm+d and OPCS-4 codes, respectively. In the CHB-DBDS, thyroid hormones were captured using NPU codes, drugs using ATC codes and surgical procedures using procedure codes (Supplementary Table 24). 
Each dataset underwent initial quality control (QC), imputation and post-imputation QC. Logistic regression models were used for the hypothyroidism GWAS and linear regression models for the thyroid hormones. All models were adjusted for age, sex and PCs. In postregression QC, we removed variants with an imputation quality score <0.6, minor allele count (MAC) < 6 or absolute β or s.e. >10. We meta-analyzed datasets using METAL44, using the fixed-effect inverse variance weighted method. To evaluate genomic inflation, we calculated the genomic inflation factor (λGC) and the LDSC-intercept using LD scores calculated in the HapMap3 CEU population (Supplementary Table 1). We observed signs of inflation in the FinnGen Hypo-GWAS (λGC = 1.40, LDSC-intercept = 1.21), CHB-CID/DBDS TSH-GWAS (λGC = 1.35, LDSC-intercept = 1.19) and UKB TSH-GWAS (λGC = 1.24, LDSC-intercept = 1.1) and accounted for potential bias by correcting the GWAS s.e. by the square root of the LDSC-intercept45. No additional genomic control was applied. Liftover between genetic builds was conducted using the R package MungeSumstats46.
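The postregression filter, the s.e. correction and the fixed-effect inverse variance weighted estimate can be illustrated with a minimal Python sketch. It is not the METAL implementation. The function names are hypothetical, and the sketch assumes that "correcting" the s.e. means multiplying it by the square root of the LDSC-intercept and that intercepts at or below 1 are left unchanged.

```python
import math


def passes_postregression_qc(info, mac, beta, se):
    """Keep a variant unless info < 0.6, MAC < 6, or |beta| or s.e. exceeds 10."""
    return info >= 0.6 and mac >= 6 and abs(beta) <= 10 and se <= 10


def correct_se(se, ldsc_intercept):
    """Inflate the s.e. by sqrt(LDSC intercept).

    Assumption: the correction is multiplicative and is only applied
    when the intercept is above 1.
    """
    return se * math.sqrt(max(ldsc_intercept, 1.0))


def ivw_fixed_effect(estimates):
    """Fixed-effect inverse variance weighted meta-analysis.

    estimates: list of (beta, se) tuples, one per cohort.
    Returns the pooled (beta, se, z).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se, beta / se
```

In this sketch, a cohort with an inflated LDSC-intercept would have its s.e. passed through `correct_se` before the per-cohort estimates are handed to `ivw_fixed_effect`.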

Risk locus definition

To identify independent SNPs within each risk locus, we used LD clumping from PLINK (v1.9)47. We applied a 1-Mb window (--clump-kb 1000) and low LD threshold (--clump-r2 0.1) to identify independently significant SNPs. Lead SNPs were independent SNPs with the lowest P value, and a locus was defined as a ±1-Mb region around each lead SNP. We queried the GWAS-catalog48 (on 19 April 2024) for known phenotypic associations with either the lead SNP or variants located ±1 Mb of the lead SNP. We considered a risk locus new if no genome-wide significant association (P < 5 × 10−8) with hypothyroidism or the use of thyroid hormone replacement therapy had been reported previously.
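The distance component of this locus definition can be sketched in Python as follows. This is a simplified illustration only: it ignores the LD threshold that PLINK applies during clumping, and the function name and input layout are hypothetical.

```python
def define_loci(hits, window_bp=1_000_000):
    """Greedy distance-based locus definition.

    hits: list of (snp_id, chrom, pos, p_value) for genome-wide significant SNPs.
    Repeatedly takes the remaining SNP with the lowest P value as a lead SNP,
    defines the locus as +/- window_bp around it, and drops every SNP on the
    same chromosome that falls inside that window.
    Simplification: no LD (r2) information is used.
    """
    remaining = sorted(hits, key=lambda h: h[3])
    loci = []
    while remaining:
        lead_id, chrom, pos, _ = remaining[0]
        loci.append({
            "lead": lead_id,
            "chrom": chrom,
            "start": max(0, pos - window_bp),
            "end": pos + window_bp,
        })
        remaining = [
            h for h in remaining
            if not (h[1] == chrom and abs(h[2] - pos) <= window_bp)
        ]
    return loci
```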

Heritability

The variance in hypothyroidism risk and in levels of fT4 and TSH explained by common SNPs was estimated using the LD-adjusted kinships (LDAK) SumHer BLD-LDAK model49. We used the precomputed tagging files internal to SumHer and, for hypothyroidism, assessed the heritability on a liability scale (correcting for sample and population prevalence).
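For orientation, the commonly used conversion from observed-scale to liability-scale heritability can be written as a short Python sketch. SumHer performs this step internally; the function below is an illustration of the standard formula under the liability-threshold model, not a reproduction of the SumHer code, and its name and arguments are hypothetical.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs, sample_prev, pop_prev):
    """Convert observed-scale h2 to the liability scale.

    h2_liab = h2_obs * K^2 (1 - K)^2 / (z^2 * P (1 - P)),
    where K is the population prevalence, P the case proportion in the
    sample, and z the standard normal density at the liability threshold.
    """
    normal = NormalDist()
    threshold = normal.inv_cdf(1.0 - pop_prev)
    z = normal.pdf(threshold)
    numerator = (pop_prev * (1.0 - pop_prev)) ** 2
    denominator = z ** 2 * sample_prev * (1.0 - sample_prev)
    return h2_obs * numerator / denominator
```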

Association with blood cell traits and inflammatory proteins in the UKB

Investigating the genetic imprint on the immunophenotype of hypothyroidism may help identify key functions and interactions involved in hypothyroidism risk. First, we assessed 10 blood cell counts, including basophil, eosinophil, neutrophil, monocyte, platelet, reticulocyte, high light-scattered reticulocyte, lymphocyte, red and white blood cell counts, along with C-reactive protein levels. We obtained these measurements from Europeans in the UKB and subsequently rank-inversely normalized each trait. Next, we tested the association between lead variants and blood cell traits using linear regression adjusted for age, sex and four PCs. Additionally, we used proteomics data from the UKB Pharma Proteomics Project50. We assessed the effect of lead variants on 90 inflammatory proteins, which were identified as immune-mediated drivers17. We set the threshold for multiple testing at P = 1.41 × 10−6 (0.05/(101 traits × 350 variants)).
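The rank-based inverse normal transformation and the multiple-testing threshold can be illustrated with a minimal Python sketch. The Blom offset of 3/8 and the averaging of tied ranks are assumptions, since the text does not state which variant of the transformation was used; the function names are hypothetical.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=3.0 / 8.0):
    """Rank-based inverse normal transformation.

    Assumptions: Blom offset (3/8) and average ranks for ties.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - offset) / (n - 2.0 * offset + 1.0)) for r in ranks]


def bonferroni_threshold(alpha, n_traits, n_variants):
    """Significance threshold after correcting for traits x variants tests."""
    return alpha / (n_traits * n_variants)
```

With 101 traits (10 blood cell counts, C-reactive protein and 90 proteins) and 350 variants, `bonferroni_threshold(0.05, 101, 350)` gives 0.05/35,350, which rounds to the 1.41 × 10−6 reported above.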

Gene mapping

To identify and prioritize candidate genes at each locus (±1 Mb of the lead variant), we used the following five methods:

  1. Coding variants: we investigated whether lead variants or proxy variants (r² > 0.8) were annotated as coding variants using the variant effect predictor51.

  2. V2G: we used the V2G algorithm provided by Open Targets Genetics (https://genetics.opentargets.org/), which scores and assigns each variant to a gene based on aggregated evidence from splice, expression, and protein quantitative trait loci (sQTL, eQTL and pQTL, respectively), chromatin interactions, in silico prediction and distance to transcript sites.

  3. PoPS: we used a similarity-based gene-prioritization tool integrating GWAS summary data, gene expression data, biological pathways data and protein-protein interaction data from over 50,000 features52. The analysis involved the following three steps: (1) computing gene-level association data and gene-gene correlations using MAGMA53 with LD estimated from 1000 Genomes European data, (2) running enrichment analysis for gene features listed at https://github.com/FinucaneLab/gene_features using MAGMA and (3) calculating PoPS score for each gene by fitting a joint model for enrichment of all resulting features. Genes with a PoPS score in the top 10% of the distribution were prioritized as putative causal genes.

  4. Mendelian disease enrichment: we used MendelVar (https://mendelvar.mrcieu.ac.uk/) to detect intersections between hypothyroidism loci and Mendelian disease genes, providing valuable clues for gene prioritization. We selected variants located within ±1 Mb of each lead variant and used the 1000 Genomes Europeans as a reference panel. Genes were annotated if they were identified with the following keywords: ‘thyroid’, ‘immune’ or ‘immuno’ in OMIM disease descriptions, or Human Phenotype Ontology/Disease Ontology descriptions.

  5. TWAS with colocalization: using FUSION54, we performed TWASs using hypothyroidism summary data to investigate the relationship between the risk loci and effects on gene expression in GTEx v8 datasets on hypothalamus, pituitary gland, thyroid, spleen, whole blood and pancreas. We used the internal colocalization function, which uses COLOC, to detect shared causal variants between hypothyroidism risk and gene expression55. We only considered eQTLs associated with hypothyroidism at P < 2.97 × 10−6 (0.05/16,841 genes tested). Finally, we report the posterior probability of colocalized associations (PP4), which shows evidence of a shared causal variant found in both GWAS and functional associations. If a hypothyroidism risk locus (sentinel variant ±1 Mb) harbored a gene with PP4 > 0.75, we considered this a mediator of hypothyroidism and evidence of gene mapping.

Genes with two or more lines of evidence for gene mapping were investigated in a Gene Ontology enrichment analysis using the R package clusterProfiler56.
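The selection of genes supported by two or more mapping methods can be sketched in Python as below. The function name, the method labels and the gene placeholders in the usage note are hypothetical and serve only to illustrate the counting rule.

```python
from collections import defaultdict


def genes_with_support(evidence, min_lines=2):
    """Return genes supported by at least `min_lines` distinct methods.

    evidence: dict mapping a method label (e.g. "coding", "V2G", "PoPS",
    "MendelVar", "TWAS") to an iterable of gene symbols it prioritized.
    Returns a dict mapping each retained gene to its sorted supporting methods.
    """
    support = defaultdict(set)
    for method, genes in evidence.items():
        for gene in genes:
            support[gene].add(method)
    return {
        gene: sorted(methods)
        for gene, methods in support.items()
        if len(methods) >= min_lines
    }
```

For example, if a placeholder gene `GENE_A` is prioritized by both the coding-variant and V2G methods while `GENE_B` is prioritized by V2G alone, only `GENE_A` would be carried forward to the enrichment analysis.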

Rare protein-truncating variants in prioritized hypothyroidism genes

The convergence in disease risk between common and rare truncating variants can pinpoint causal genes. This can offer insight into disease pathophysiology and potentially guide drug discovery or repurposing. Using published summary statistics from a whole-exome sequencing burden analysis of hypothyroidism20, we compared the convergence between common and rare variant effects in genes that had at least two lines of mapping evidence at FDR < 0.05.

PRS derivation and validation

We generated the hypothyroidism PRS from a meta-analysis of CHB-CID/DBDS, FinnGen, EstBB, deCODE genetics and 23andMe, comprising 116,470 cases and 1,164,733 controls. PRS weights were calculated using PRS-CS57 with an LD reference panel from the UKB. We validated the PRS in UKB, where individuals with ICD-10 E05(0-9) were removed to mitigate enrichment for participants with hyperthyroidism amongst hypothyroidism cases. In UKB, associations with hypothyroidism were first reported on a continuous scale. Next, we tested the association by deciles, and at the extremes of the PRS (99th and 99.9th percentiles) using logistic regression models adjusted for age, sex and four PCs. We assessed the predictive performance of the PRS relative to known clinical risk factors (for example, BMI and selected autoimmune diseases; Supplementary Table 25) by calculating the AUC using the R package pROC58. For each risk factor, the change in the AUC (ΔAUC) was compared to that of a model consisting of age, sex and four PCs. We evaluated prevalent risk factors (that is, events before the baseline date) and tested the predictive performance for incident hypothyroidism cases (events after the baseline date). Significant differences between the prediction models were tested using DeLong’s test for correlated ROC curves. The PRS was also evaluated in the Danish General Suburban Population Study (GESUS). This was a population-based cohort study in which 21,205 adults were recruited between 2010 and 2013. At baseline, participants underwent physical examination, completed a questionnaire and had blood samples drawn. Individuals with ICD-10 E05(0-9) were excluded from the analysis. First, we evaluated the PRS association with hypothyroidism on a linear basis and then evaluated the predictive performance of the PRS relative to thyroid hormones and anti-TPO positivity. Anti-TPO was measured on Kryptor assays. 
Values greater than 60 U ml−1 are considered positive in Denmark59, and we chose a more conservative cutoff of 100 U ml−1 for anti-TPO positivity to avoid misclassification of individuals with autoimmunity60. We compared the AUCs of models including sex, age and four PCs with stepwise addition of thyroid hormones, anti-TPO positivity and the PRS for hypothyroidism.
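Two quantities used in this section, the score itself and the AUC, can be illustrated with a minimal Python sketch. It assumes the PRS is a weighted sum of allele dosages and computes the AUC through its rank (Mann-Whitney) interpretation; it does not reproduce PRS-CS, pROC or DeLong's test, and the function names are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0 to 2) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case (label 1) scores
    higher than a randomly chosen control (label 0); ties count as 0.5.
    Runs in O(n_cases * n_controls), which is fine for illustration only.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))
```

Under these assumptions, ΔAUC for a risk factor is the difference between `auc` computed on predictions from the model that includes the factor and `auc` computed on predictions from the baseline model of age, sex and four PCs.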

Disease progression in SCH patients

In UKB primary care data (n ~ 245,000), we defined individuals with SCH as having TSH levels greater than 4 mU l−1 and fT4 levels between 8 and 14.5 pmol l−1 using Read2 and Read3 codes. We only considered biochemical measurements available after the date of enrollment in the UKB to avoid immortal time bias. We excluded individuals who, before the date of SCH, had a history of thyroid cancer, hyperthyroidism or hypothyroidism (identified using ICD-10, Read2 and Read3 codes) or were taking thyroid hormone substitution as indicated by dm+d codes (Supplementary Table 26). The PRS was categorized into the following three groups: (1) ≤10th percentile, representing low-risk individuals; (2) >10th and <90th percentiles, representing the general population; and (3) ≥90th percentile, representing high-risk individuals. We used Cox regression models to compute HRs for risk of progression to overt hypothyroidism. Individual follow-up ended at the first of the following: an event recorded in electronic health records after the date of SCH (defined by ICD-10 E03.8/E03.9/E06.3 or Read2/Read3 codes indicative of autoimmune myxedema/Hashimoto’s thyroiditis), death or end of follow-up. Models were adjusted for age, sex and four PCs. Absolute risks were calculated using the Aalen-Johansen estimator, which takes the competing risk of death into account.
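The biochemical SCH definition and the three PRS groups can be expressed as a short Python sketch. Treating the fT4 bounds as inclusive is an assumption, and the function names and group labels are hypothetical.

```python
def is_subclinical_hypothyroidism(tsh_mu_per_l, ft4_pmol_per_l):
    """SCH: TSH above 4 mU/l with fT4 between 8 and 14.5 pmol/l.

    Assumption: the fT4 bounds are inclusive.
    """
    return tsh_mu_per_l > 4.0 and 8.0 <= ft4_pmol_per_l <= 14.5


def prs_group(percentile):
    """Map a PRS percentile (0 to 100) to one of the three risk groups."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if percentile <= 10:
        return "low"
    if percentile >= 90:
        return "high"
    return "general"
```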

Relationship between polygenic risk and lifestyle

Lifestyle factors such as obesity are part of the phenotypic spectrum of hypothyroidism but can also increase the risk of hypothyroidism23,24. Since inherited risk can be perceived as deterministic, we investigated whether adherence to a healthy lifestyle could offset genetic risk. Using UKB questionnaire data, we created a lifestyle scoring system with points awarded for healthy characteristics61:

  1. No obesity (BMI < 30 kg m−2)

  2. Regular exercise (≥15 metabolic equivalent task hours per week)

  3. Nonsmokers

  4. Healthy diet, meeting at least three criteria:

    a. ≥3 fruit servings per day.

    b. ≥12 teaspoons of vegetables per day.

    c. ≥2 weekly servings of oily fish.

    d. ≤1 weekly serving of processed meat.

    e. ≤2 weekly servings of red meat.

We classified lifestyle as healthy (3–4 points), intermediate (2 points) or unhealthy (0–1 points). We analyzed the associations between lifestyle factors and hypothyroidism using Cox regression, adjusting for age at inclusion, sex and four PCs using the R package survival62. We also constructed a risk chart, displaying 10-year absolute risk of hypothyroidism for different combinations of age, sex, lifestyle characteristics and PRS deciles.
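The scoring system can be summarized in a minimal Python sketch, with one point per healthy characteristic. The function and argument names are hypothetical, and treating "nonsmoker" as a single boolean is an assumption, since the text does not specify how former smokers were handled.

```python
def diet_is_healthy(fruit_per_day, veg_tsp_per_day, oily_fish_per_week,
                    processed_meat_per_week, red_meat_per_week):
    """Healthy diet: at least three of the five dietary criteria are met."""
    criteria = [
        fruit_per_day >= 3,
        veg_tsp_per_day >= 12,
        oily_fish_per_week >= 2,
        processed_meat_per_week <= 1,
        red_meat_per_week <= 2,
    ]
    return sum(criteria) >= 3


def lifestyle_score(bmi, met_hours_per_week, smoker, healthy_diet):
    """One point each for no obesity, regular exercise, nonsmoking, healthy diet."""
    return (
        int(bmi < 30)
        + int(met_hours_per_week >= 15)
        + int(not smoker)  # assumption: smoking status is a single boolean
        + int(healthy_diet)
    )


def lifestyle_category(score):
    """Healthy: 3-4 points; intermediate: 2 points; unhealthy: 0-1 points."""
    if score >= 3:
        return "healthy"
    if score == 2:
        return "intermediate"
    return "unhealthy"
```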

PRS correlation with selected malignancies, cardiometabolic and neuropsychiatric traits

We conducted a phenome-wide association study between the hypothyroidism PRS and 50 common diseases (including 12 malignant, 20 cardiovascular, 13 metabolic and 4 neuropsychiatric traits). We defined phenotypes in the UKB using a combination of ICD-9 and ICD-10 codes, the cause of death registry and Office of Population Censuses and Surveys procedure codes (OPCS-4; Supplementary Table 27). We tested the association between the PRS and individual phenotypes using logistic regression adjusted for age, sex and four PCs. We restricted analyses of cervical and breast cancer to females and prostate cancer to males. We set the threshold for multiple testing at P < 0.001 (0.05/50 traits).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.