Main

Approximately 40% of adults in the USA are classified as living with obesity—a chronic condition that affects the risk of several health conditions including type 2 diabetes (T2D) and cardiovascular disease. Until recently, therapeutic strategies for people with obesity were of limited efficacy2, with the main strategies for weight loss focused on lifestyle interventions, including diet and exercise3.

GLP1 and gastric inhibitory polypeptide (GIP) are two primary gastrointestinal hormones (incretins) secreted from the gastrointestinal tract after food ingestion. Incretins stimulate pancreatic β-cell proliferation and insulin secretion, delay gastric emptying and act centrally to suppress appetite. Recently, potent incretin mimetics, including semaglutide (a GLP1 receptor agonist) and tirzepatide (a dual GLP1 and GIP receptor agonist), have become widely available for patients with obesity, or with overweight and a cardiovascular risk factor. The clinical efficacy of semaglutide and tirzepatide has led to these drugs being among the most commonly prescribed in the USA, with an estimated one in eight people having used a GLP1 receptor agonist medication4.

For people using GLP1 receptor agonists, there is large variation in weight loss1. In a study of semaglutide efficacy, the average reduction in weight from baseline was 10.2%, but 4.9% of patients achieved over 25% reduction from baseline, and 32.2% achieved less than a 5% reduction from baseline or even weight gain5. Identifying factors that predict a person’s response to GLP1 medications may help guide treatment strategy, including choice of drug, dose and speed of dose escalation.

In other contexts, genetic variation is known to have an important role in treatment response, both for intended and adverse events6,7,8. We hypothesized that some of the variation in GLP1 medication efficacy could be attributed to genetics, and surveyed 23andMe research participants regarding their use of GLP1 medications. Using these data, we conducted a large genome-wide association study (GWAS), and identified robust evidence that variants in the GLP1R locus are associated with both differential weight loss and side effects. Further analysis revealed an additional nausea and vomiting association in the GIPR locus specifically within the tirzepatide-treated population. We incorporated our genetic findings into broader models that combine demographic, clinical and genetic features to predict efficacy and side effects, and demonstrated the ability of this model to stratify patients by weight loss in a held out electronic health record (EHR) dataset. Our results highlight the opportunity for pharmacogenetics and precision medicine approaches applied to GLP1 medication.

In August 2024, we surveyed 23andMe research participants about their GLP1 medication usage, focusing on those taking weight loss medication. As of August 2025, we had collected responses from more than 27,885 survey respondents who reported using at least one of Ozempic, Wegovy, compounded semaglutide, Mounjaro, Zepbound or compounded tirzepatide (Extended Data Table 1). Study participants were mostly female (82.4%) with a median age of 52 years. Most respondents were of European ancestry (78.3%), although the study also included substantial representation from Latino (12.9%) and African American (4.2%) ancestries.

Study participants reported a median body mass index (BMI) of 35.1 kg m−2 before initiating GLP1 treatment, with 96.8% of participants having a BMI of at least 25 at baseline. The median time that study participants reported taking GLP1 medications was 8.3 months. Following GLP1 treatment, participants reported having lost a median of 4.1 kg m−2 units of BMI (or 11.3 kg of weight), equivalent to a median of 11.7% of pre-treatment weight. Participants reported greater BMI loss using tirzepatide compared with semaglutide (median 4.75 versus 3.71 kg m−2; P = 9.7 × 10−29, median test), despite similar treatment times (8.1 versus 8.4 months), consistent with previous reports9.

Non-genetic predictors of BMI loss

We examined non-genetic factors that were predictive of GLP1 medication efficacy, as measured by the percentage change in BMI from baseline (ΔBMI%). As noted in other studies, medication seemed more effective in women10 (ΔBMI% = −12.2% in women versus −10.0% in men; P = 5.0 × 10−31, Mood’s median test; Supplementary Table 1), and in people of European ancestry11 (ΔBMI% = −12.1% in Europeans versus −10.6% in non-Europeans; P = 4.7 × 10−16, Mood’s median test; Supplementary Table 2) before correcting for other factors.
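The group comparisons above rely on Mood's median test. As a rough illustration of how such a test works, the sketch below pools two samples, counts how many values in each group lie above the pooled median, and applies a Pearson chi-square test to the resulting 2 × 2 table. The function name, the handling of ties and the absence of a continuity correction are assumptions made for this example, and the input values are invented; this is not the analysis code used in the study.

```python
import math
from statistics import median


def moods_median_test(group_a, group_b):
    """Two-sample Mood's median test without continuity correction.

    Returns (pooled_median, chi2, p_value). Values equal to the pooled
    median are counted as "not above" (an assumption; conventions differ).
    """
    pooled_median = median(list(group_a) + list(group_b))
    sizes = (len(group_a), len(group_b))
    above = (sum(x > pooled_median for x in group_a),
             sum(x > pooled_median for x in group_b))
    not_above = (sizes[0] - above[0], sizes[1] - above[1])
    total = sum(sizes)

    chi2 = 0.0
    for row in (above, not_above):
        row_total = sum(row)
        for observed, size in zip(row, sizes):
            expected = row_total * size / total
            # Raises ZeroDivisionError if a row of the table is empty.
            chi2 += (observed - expected) ** 2 / expected

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return pooled_median, chi2, p_value


# Toy percentage BMI changes, invented for illustration only.
group_1 = [-14.0, -13.0, -12.0, -11.0, -10.0]
group_2 = [-9.0, -8.0, -7.0, -6.0, -5.0]
print(moods_median_test(group_1, group_2))
```

Because the test uses only whether each value falls above the pooled median, it is insensitive to the heavy tails seen in the ΔBMI% distribution.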

To investigate the relative contribution of non-genetic factors that predict GLP1 medication efficacy, we fit a linear model to assess the combined effect of age, sex, pre-treatment BMI, drug type, drug dose and time on drug on total BMI loss (Supplementary Table 3). In this combined analysis, we observe that the degree of BMI loss depends strongly on sex, drug type, time on treatment and drug dosage, consistent with previous reports11, and has a relatively weak dependence on pre-treatment BMI (with higher starting BMI associated with greater BMI loss). Efficacy also showed a modest reduction with age: each additional 10 years of age corresponded to a 0.5% reduction in BMI change (around 0.45 kg reduced weight loss efficacy), which is directionally consistent with recent clinical trial meta-analyses12,13. Collectively, this model explained about 21.4% of variance in percentage BMI loss (adjusted R2).

Previous literature has highlighted the role of ancestry11 and T2D status14 in determining GLP1 medication efficacy for weight loss. We therefore investigated the predictive value of these variables in our cohort, after adjusting for the non-genetic factors described above (Supplementary Information). Consistent with GLP1 receptor agonists being developed initially to treat T2D, 23.1% of survey participants reported using GLP1 medication to improve blood sugar. T2D status was found to be a highly significant predictor of weight loss efficacy (P = 2.0 × 10−73; Supplementary Table 4), with an average predicted reduction of 2.87 percentage points in BMI loss for people with a T2D diagnosis. Similarly, our data indicate differential efficacy by ancestry (Supplementary Table 4), with GLP1 medications being most effective in people of European ancestry, less effective in people of Latino ancestry and least effective in people of African American ancestry.

Comparison of self-report data with EHRs

Research participants were able to share EHR data with 23andMe through Apple HealthKit (Methods). Using this mechanism, we obtained EHR data from 909 participants with a recorded GLP1 medication prescription (Supplementary Table 5). Of these, 195 had also completed the GLP1 survey, allowing us an opportunity to investigate the relationship between EHR and self-report data in the context of GLP1 medication and weight loss.

Qualitatively, both self-report and EHR data showed similar distributions for weight loss efficacy (Extended Data Fig. 1a). However, EHR data showed smaller changes in BMI compared with baseline, with a median ΔBMI% of −5.79%, which is significantly less than that reported in the self-report surveys (−11.8%, P = 2.4 × 10−55). The EHR cohort was younger and had a larger proportion of men than the survey cohort (median age 44 years versus 52 years, and 37% male versus 18%), and both of these factors may contribute to some of the apparent difference in weight loss efficacy between self-report and EHR data. The discrepancy was reduced when restricting to the 195 participants with both survey and EHR data, but remained significant (self-report = −14.14% versus −8.43% in EHR, P = 1.1 × 10−9 paired t-test; Extended Data Fig. 1b), despite the fact that the EHR and self-reported ΔBMI% measurements were reasonably well correlated (Pearson r = 0.57).

Of the participants with both EHR and survey data, 125 self-reported using semaglutide, of whom 97.6% had a corroborating EHR record. Likewise, 70 participants self-reported using tirzepatide, of whom 85.7% had a corroborating EHR record. These data indicate that survey participants generally accurately self-reported which drug type they were using, although EHR data reported a higher fraction of users relying on compounded versions of the medications (18.9% in EHR versus 13.5% in self-report for semaglutide, and 6.1% versus 4.8% for tirzepatide). For dosage information, we found that 58.4% of semaglutide users and 64.3% of tirzepatide users self-reported a dosage equal to that recorded in the EHR. In general, the self-report data indicate that medication users often report higher doses than those recorded in their EHRs, with 88.8% of semaglutide and 84.3% of tirzepatide users self-reporting a dosage greater or equal to the EHR records. Furthermore, we found that treatment intervals reported in EHRs were reasonably consistent with self-report data (7.8 months versus 8.3 months for EHR and self-report respectively; Extended Data Fig. 1c), and were closely matched in participants with both survey and EHR data (median 9.45 months in EHR compared with 9.58 months in survey; Extended Data Fig. 1d).

GWAS of GLP1 medication efficacy

We performed a GWAS of the percentage BMI loss phenotype (n = 15,237), and identified a strong genome-wide significant association on chromosome (chr.) 6 (Fig. 1a,b and Supplementary Table 6). The index single-nucleotide polymorphism (SNP) of the main association was rs10305420 (P = 2.9 × 10−10, reference allele C, effect allele T), which conferred an additional 0.641% BMI loss per T allele, approximately equivalent to an additional 0.76 kg weight loss per allele (95% CI, [−1.27, −0.34] kg). We observed no evidence of dominance effects (P = 0.80), indicating that the allelic dosage of the T allele contributes to efficacy in an additive manner. In addition, we did not observe compelling evidence for additional independent associations within the locus (Extended Data Fig. 2a). We also confirmed that the association of rs10305420 with GLP1 medication efficacy cannot be accounted for by T2D or smoking status (Supplementary Information and Supplementary Tables 7 and 8).

Fig. 1: Genetic associations with GLP1 medication efficacy.

a, Manhattan plot of percentage BMI change (ΔBMI%) GWAS. SNPs achieving genome-wide significance (P < 5 × 10−8) are highlighted in red. b, Regional plot around the GLP1R locus. Colours indicate the strength of linkage disequilibrium (r2) relative to the index SNP (rs10305420). Non-coding variants are indicated with plus symbols; coding variants are indicated with multiplication symbols. c, Estimated effect sizes of the index variant in ΔBMI% for each ancestral population. Circles, point estimates; horizontal bars, 95% confidence intervals (CI).

The associated SNP, rs10305420, is a missense variant in the signal peptide of the GLP1R gene, changing the seventh amino acid from proline to leucine (p.Pro7Leu). To determine whether this variant is likely to be causal, we considered all 42 variants within the 99% credible set of the association, and annotated by predicted functional consequence (Supplementary Table 9). Within the credible set, rs10305420 is the only coding variant, and has the maximum posterior probability (35%). In addition, we tested for co-localization with expression quantitative trait loci in the locus, and identified none that colocalized with the GWAS signal (Supplementary Information). As such, we conclude that rs10305420 is probably the causal variant.
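For readers unfamiliar with credible sets, the sketch below shows one common way to build one: rank variants by posterior probability and keep adding them until the cumulative probability reaches the chosen coverage (99% here). The variant labels and posterior values are hypothetical, and the fine-mapping method that produced the posteriors in the study is not reproduced.

```python
def credible_set(posteriors, coverage=0.99):
    """Return (variants, cumulative_probability) for the smallest set of
    top-ranked variants whose posterior probabilities sum to >= coverage.

    posteriors: dict mapping variant id -> posterior probability.
    """
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, probability in ranked:
        chosen.append(variant)
        cumulative += probability
        if cumulative >= coverage:
            break
    return chosen, cumulative


# Hypothetical posteriors for four variants at one locus (illustration only).
toy_posteriors = {"v1": 0.6, "v2": 0.3, "v3": 0.095, "v4": 0.005}
variants, total = credible_set(toy_posteriors)
print(variants, round(total, 3))
```

A wide credible set with a modest top posterior, as reported here (42 variants, maximum 35%), means the statistics alone do not single out one variant, which is why functional annotation carries weight in the causal argument.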

In the gnomAD database15, the rs10305420 T allele is most common in the European (40%) and Middle Eastern (38%) ancestry groups, followed by the Admixed American (28%), South Asian (20%) and East Asian (16%) ancestry groups, and least common in the African (7%) ancestry group. We retested rs10305420 for association in each of five additional populations: Latino, African American, East Asian, South Asian and Middle Eastern. Although the SNP did not achieve significance (P < 0.001) in any non-European population, the estimated effects were all directionally consistent (Fig. 1c) and a fixed effect meta-analysis increased the significance of the observed association (P = 1.1 × 10−12).
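A fixed effect meta-analysis of this kind is typically computed by inverse-variance weighting of the per-population estimates. The sketch below illustrates that calculation under the assumption of independent cohorts; the function name and input numbers are invented for the example, and the study's own implementation may differ in detail.

```python
import math


def fixed_effect_meta(estimates):
    """Inverse-variance weighted fixed effect meta-analysis.

    estimates: list of (beta, standard_error) pairs, one per cohort.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    For odds ratios, pass log odds ratios and their standard errors.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    weight_sum = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return pooled_beta, pooled_se, z, p_value


# Invented per-cohort effects on percentage BMI change (illustration only).
toy_cohorts = [(-0.60, 0.10), (-0.40, 0.30), (-0.70, 0.40)]
print(fixed_effect_meta(toy_cohorts))
```

Directionally consistent estimates that are individually non-significant still add weight to the pooled estimate, which is how the meta-analysis P value can be smaller than that of the European-only analysis.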

We compared the SNP effect among people treated with semaglutide versus tirzepatide (Extended Data Fig. 3a,b). When testing each drug type separately, we estimate the effect of the SNP on ΔBMI% to be larger in tirzepatide (effect = −0.95%) than semaglutide (effect = −0.51%), although the 95% confidence intervals were overlapping (Extended Data Fig. 3c). In a joint regression model including a snp:drug type interaction term, we determined the difference to be weakly significant (P = 0.02).

We further replicated the SNP association in the All of Us cohort, using data derived from 4,855 people with EHR data. The association between percentage BMI change and rs10305420 was significant in the replication analysis, with directionality consistent with that observed in the 23andMe cohort (P = 0.001; effect = −0.47%; Supplementary Table 10). The association did not replicate in UK Biobank, although the expected power for replication in this cohort was low (Supplementary Table 11 and Supplementary Information).

GWAS of GLP1 medication side effects

We performed a GWAS of 11 side effect phenotypes that contrast patients with moderate or severe side effects to those who experienced mild or no side effects. We identified associations within each of the nausea and vomiting side effect GWAS, both of which occurred within the vicinity of GLP1R locus (Extended Data Fig. 4). The index SNP for the vomiting signal was rs11760106 (P = 2.5 × 10−27; T allele odds ratio = 1.57), whereas the index SNP for the nausea signal was rs9357296 (P = 2.6 × 10−28; G allele odds ratio = 1.36). We did not observe any difference in the SNP effect on side effects for semaglutide versus tirzepatide (Extended Data Fig. 3d,e). Conditional analysis did not identify independent associations within the locus for either trait (Extended Data Fig. 2b,c).

Although both of these index SNPs were in moderate linkage disequilibrium with the BMI-loss coding variant (rs11760106 versus rs10305420, r2 = 0.57; rs9357296 versus rs10305420, r2 = 0.75), neither signal included the coding variant within the 99% credible set16. To determine whether the signals share the same causal variant, we applied co-localization analysis17 and found that the signals co-localize with high probability (ΔBMI% versus nausea, H4 = 96.6%; ΔBMI% versus vomiting, H4 = 88.5%; nausea versus vomiting, H4 = 92.1%). Multi-trait co-localization supports the conclusion that these signals probably represent the same signal, with a 72.6% posterior probability of co-localization (Supplementary Information). From the co-localization analysis, we infer that increased nausea or vomiting is associated with greater BMI loss efficacy (Extended Data Fig. 5).

Unlike semaglutide, tirzepatide is a dual receptor agonist targeting both GLP1R and GIPR. By performing a GWAS in the tirzepatide-treated population alone (Supplementary Table 12), we identified an association between the vomiting side effect and GIPR (rs71338792, P = 4.2 × 10−9, odds ratio = 1.84; Fig. 2a). The index variant is in near perfect linkage disequilibrium (r2 = 0.99) with a missense variant within GIPR (rs1800437, P = 5.1 × 10−9; Fig. 2b), which we conclude is the causal variant for this association (Supplementary Table 13). This variant alters the 354th amino acid of the protein sequence from glutamic acid (G allele) to glutamine (C allele; p.Glu354Gln). The G allele (our reported effect allele, odds ratio = 0.546) is protective, whereas the C allele is associated with a higher risk of the vomiting side effect (odds ratio = 1/0.546 = 1.83).
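The two odds ratios quoted for rs1800437 describe the same association expressed relative to opposite alleles. The short sketch below shows the conversion: switching the effect allele inverts the odds ratio and, if a confidence interval is available, inverts and swaps its bounds. The function name is hypothetical, and the interval arguments are optional because no interval for this estimate is quoted in the text.

```python
def flip_effect_allele(odds_ratio, ci_low=None, ci_high=None):
    """Re-express an odds ratio relative to the other allele.

    Returns (flipped_or, flipped_ci_low, flipped_ci_high). The interval
    bounds swap because taking a reciprocal reverses their order.
    """
    flipped = 1.0 / odds_ratio
    if ci_low is None or ci_high is None:
        return flipped, None, None
    return flipped, 1.0 / ci_high, 1.0 / ci_low


# Odds ratio for the G allele as reported in the text.
or_c_allele, _, _ = flip_effect_allele(0.546)
print(round(or_c_allele, 2))  # 1.83
```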

Fig. 2: Genetic associations with GLP1 medication side effects.

a, Manhattan plot of experiencing moderate-to-severe vomiting while on tirzepatide treatment. SNPs achieving genome-wide significance (P < 5 × 10−8) are highlighted in red. b, Regional plot around the GIPR locus. Colours indicate strength of linkage disequilibrium (r2) relative to the index SNP (rs71338792). Note that the probable causal missense variant, rs1800437, is located within the cluster of significant SNPs and is in very high linkage disequilibrium with the index SNP (r2 = 0.99). Non-coding variants are indicated with plus symbols; coding variants are indicated with multiplication symbols. c, Estimated effect sizes of rs1800437 in the vomiting side effect, partitioned by drug type, for Europeans, Latinos and a fixed effect meta-analysis. Circles, point estimates; horizontal bars, 95% confidence intervals (CI). OR, odds ratio.

In the gnomAD database15, the rs1800437 C allele is most common in the East Asian (21%) and European (20%) ancestry groups, followed by the Middle Eastern (18%) and South Asian (14%) ancestry groups, and least common in the African or African American (11%) and Admixed American (11%) ancestry groups. We retested rs1800437 for association with the vomiting side effect in the Latino population, which was the only non-European population with sufficient sample size, and found it to be directionally consistent (P = 0.03, odds ratio = 0.49). A fixed effect meta-analysis increased the significance of the observed association (P = 1.1 × 10−10, odds ratio = 0.54). No evidence of effect was observed within the semaglutide population (Fig. 2c).

We observed that 11.8% of tirzepatide-treated rs1800437-CC carrier individuals experienced vomiting, compared with 9.4% for semaglutide-treated people carrying the same genotype, although the difference was not significant (P = 0.12, two-sample z-test; Supplementary Table 14). We observed no evidence of dominance at rs1800437 (P = 0.16). However, when testing for an interaction between the GLP1R variant (rs10305420) and the GIPR variant (rs1800437), we identified weak evidence for interaction (P = 0.018; Supplementary Table 15). We estimate that people who are homozygous for the risk alleles at both the GLP1R and GIPR loci have a 14.8-fold increased odds (95% confidence interval [6.2, 35.8]) of tirzepatide-mediated vomiting, compared with those homozygous for the non-risk alleles.

Across phenotypes, our data indicate an association between rs1800437 and tirzepatide treatment for both the vomiting and nausea phenotypes, and potentially constipation, but the variant does not seem to influence efficacy (P = 0.73; Extended Data Fig. 6).

Phenome-wide associations

In the GWAS catalogue18, the index BMI-loss SNP in GLP1R, rs10305420, has been reported as associated with T2D19,20, smoking initiation21, fasting glucose20 and haemoglobin A1c (HbA1c) measurements20. These data are consistent with the T allele decreasing risk for T2D, as well as decreasing glucose and HbA1c levels, but slightly increasing the risk of smoking initiation (Supplementary Table 16). Although the SNP is not associated with any traits in the UK Biobank with P < 1 × 10−4, rs10305420 is associated with obesity-related traits at weak significance (P value around 1 × 10−5) in FinnGen. In the MVP cohort20, rs10305420 is associated significantly with BMI-associated traits (P value around 2 × 10−15), although it is in linkage with an upstream association (Extended Data Fig. 7).

In the 23andMe database, we observed rs10305420 to be associated with a number of traits (Supplementary Table 17), including T2D and traits linked to glucose metabolism. The T allele was associated with reduced risk of T2D, and increased likelihood of having been a smoker—observations that are consistent with public data. In addition, we observed association with diet-related phenotypes, including preference of sugary foods, red meat consumption and number of cavities. Finally, the T allele was associated with reduced risk of morning sickness during pregnancy. Notably, the 23andMe data did not reveal an association with BMI or weight, indicating that the association of the SNP may be identifiable more strongly in the context of GLP1 treatment.

The GIPR missense variant, rs1800437, has been associated with a large number of traits in the public domain, including BMI and body mass phenotypes, T2D, blood glucose and a number of metabolic or haematologic traits including blood pressure, reticulocyte and/or erythrocyte volume and urinary calculus (Supplementary Table 18). This broad pleiotropy was supported by associations observed in the 23andMe database (Supplementary Table 19).

Modelling of treatment response

We aimed to build models of treatment response that combined both genetic and non-genetic predictors. As such, we integrated treatment, clinical, demographic, disease diagnosis and genetic variables as predictors in models of both efficacy and side effects.

The model of BMI-loss efficacy demonstrated good performance in the self-report data, explaining 25% of the variance (R2) in both the training set and held out test set, with most of the variance explained by non-genetic factors (Supplementary Table 20). T2D, non-alcoholic fatty liver disease (also known as metabolic dysfunction-associated steatotic liver disease) and hypertension diagnoses were all associated with lower weight loss efficacy, consistent with previous reports22,23,24. Calibration plots confirmed the model was well calibrated in the test set (Extended Data Fig. 8).

To assess the utility of the model in a real world setting, we applied the model to a set of EHR data not used during model construction, and predicted efficacy at 6 months, assuming that the drug type and final dose was unknown at time point zero. We observe that people predicted to achieve higher weight loss did indeed achieve higher weight loss in the longitudinal EHR data (Fig. 3a).

Fig. 3: Combined genetic and non-genetic model performance in test set.

a, Validation of ΔBMI% model in longitudinal HealthKit EHR data. Circles, point estimates; horizontal bars, 95% confidence intervals. b, Receiver operating characteristic curve for nausea side effect, as assessed in held out test data. c, Receiver operating characteristic curve for vomiting side effect, as assessed in held out test data. AUC, area under the curve; CI, confidence interval.

For the side effect models, we focused on the two phenotypes for which GWAS SNPs were identified (Supplementary Table 21). The nausea model achieved a receiver operating characteristic area under the curve of 65.4% in the test set, whereas the vomiting model achieved a receiver operating characteristic area under the curve of 68.0% (Fig. 3b,c), although, in both cases, the contribution from genetics was relatively modest compared with that from non-genetic factors (Supplementary Table 22).
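The area under the receiver operating characteristic curve can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The sketch below computes it directly from that definition by comparing every case with every control, with ties counted as one half. The function name and the toy scores are assumptions for illustration, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from pairwise case-control comparisons.

    scores: predicted risks; labels: 1 for cases, 0 for controls.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy predictions, invented for illustration only.
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

On this scale 50% corresponds to chance, so values of 65.4% and 68.0% indicate moderate discrimination.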

Discussion

We report a large-scale GWAS of treatment response to GLP1 medications and identify a missense variant in GLP1R, which encodes the therapeutic target of semaglutide and tirzepatide, that associates with BMI loss. In addition, variants linked to both GLP1R and GIPR are associated with GLP1 medication-related nausea and vomiting.

The fact that the loci we identified map to the genes encoding pharmacodynamic targets of GLP1 medications provides compelling biological plausibility for their role in treatment response. Genetic variants that alter the function of a therapeutic receptor are known to influence drug response through several mechanisms.

The GLP1R variant rs10305420 is a missense variant in the signal peptide region, and may influence stability of the peptide secondary structure (Extended Data Fig. 9). Specifically, the effect allele encodes for leucine, which is more hydrophobic than the reference proline allele and may therefore increase stability of the hydrophobic region of the signal peptide (Supplementary Information). As such, the association with efficacy may be the result of enhanced trafficking of the protein within the cell that ultimately modulates the cell surface abundance of the receptor, rather than the result of alterations in the receptor protein structure that may influence ligand affinity.

Previous literature has also identified rs10305420 as influencing GLP1 receptor agonist efficacy25,26, although with opposite directionality to our study. It is unclear why directionality would vary between studies, although we note these studies were performed in specific disease contexts, and are in substantially smaller samples. Power calculations suggest that previous studies would have had limited power to detect an effect size similar to that identified in our study (Supplementary Table 23), and also differed in terms of drug type, disease cohort and analytical choices. Given the independent replication of our observed directionality in an external cohort using EHR data, the directionality inferred by our study is unlikely to be the result of a data quality issue.

Similarly, the GIPR variant rs1800437 corresponds to the missense substitution p.Glu354Gln. The Gln variant has been well characterized as a partial loss-of-function mutation27,28. Our data indicate an increased risk of vomiting and nausea side effects associated with the Gln allele in tirzepatide-treated patients, which may result from a diminished ability of the GIP component of the drug to buffer the nausea-inducing effects of the GLP1 component. Recent preclinical studies have demonstrated that central GIP receptor activation can attenuate the aversive effects, such as nausea and vomiting, typically induced by GLP1 receptor agonism29.

Understanding the genetic basis of GLP1 medication treatment response is of substantial clinical importance. There is large inter-person variability in response to GLP1 medications both in terms of weight loss and side effects, with the two potentially being related (that is, people who experience more nausea may lose more weight30). The discovery of robust predictors of treatment response could enable the prediction, from the outset, of a person’s probable treatment journey, and thereby pave the way for precision medicine through the clinical implementation of combined phenotypic and genotypic prediction models.

We have shown that self-report data is a complementary instrument for collecting data regarding treatment response that is qualitatively consistent with medical records, while also enabling collection of information regarding side effects that may not be readily available in EHRs. Although EHR-recorded weight loss was generally less than the self-reported values, several factors may explain this discrepancy. Previous work has noted that self-report data remain a valid and useful measure in many contexts31, but downward biases in self-reported weight are common32 and may be influenced by factors including health awareness, culture and social norms33. Conversely, EHR data may offer an incomplete picture of a person’s weight variation. If patients receive GLP1 medications outside their primary health system (for example, from telehealth) or change providers during the treatment, the EHR data collected in our study may not capture the full course of treatment, and this may explain our observation of generally shorter treatment durations, and hence less weight loss, recorded in EHRs.

Our study detected a robust genetic association with GLP1 medication weight loss efficacy and associated side effects. Although the genetic effect sizes we detected are modest, it is likely that additional data will reveal further associations and increase the predictive utility of genetics in this context. Future research should develop longitudinal datasets to enable analysis of weight loss and side effects in the context of dosage escalations, while exploring genetic factors, which could then inform how genetics might be leveraged in clinical decision-making at treatment initiation and beyond.

Methods

Overview of study recruitment

Participants in this study were recruited from the customer base of 23andMe. Participants provided informed consent and volunteered to participate in the research online, under a protocol approved by the external Association for the Accreditation of Human Research Protection Programs-accredited Salus Institutional Review Board (https://www.versiticlinicaltrials.org/salusirb). Participants were included in the analysis on the basis of consent status verified at the time data analyses were initiated.

The 23andMe GLP1 survey was launched to research participants in August 2024. The survey aimed to capture participants’ experiences with GLP1 receptor agonist medication, and was targeted to 23andMe participants who had previously responded in the affirmative to the question, ‘Have you ever taken prescription medications to help you lose weight?’. The survey included questions regarding drug brand, dosing regimen, time on treatment, efficacy (including pre-treatment weight and weight on treatment), and side effects, as well as reasons for pursuing or stopping GLP1 treatment. We focused the survey and subsequent analysis primarily on six drug varieties: Ozempic, Wegovy, compounded semaglutide, Mounjaro, Zepbound and compounded tirzepatide, the first three of which represent variations of semaglutide, and the last three variations of tirzepatide. A full list of survey questions can be found in Supplementary Table 24.

Phenotype definitions

Using the information derived from the surveys, we defined phenotypes that aimed to capture aspects of drug efficacy and side effects. We defined our efficacy phenotype as the contrast between pre-treatment BMI and post-treatment BMI (or current BMI, if treatment is on-going). In general, for study participants who reported taking more than one GLP1 medication, we selected the GLP1 medication that they reported taking for the longest period of time. Specifically, we defined a percentage BMI change phenotype as:

$${{\rm{\Delta BMI}}}_{ \% }=100({\mathrm{BMI}}_{2}-{\mathrm{BMI}}_{1})/{\mathrm{BMI}}_{1}$$

where BMI1 and BMI2 represent pre-treatment and post-treatment BMI, respectively, measured in weight in kilograms per height in metres squared. We applied quality control filters to exclude people with weight less than 36 kg or greater than 181 kg, height less than 1.39 m or greater than 2.06 m, BMI less than 14 kg m−2 or greater than 70 kg m−2, or age less than 18 years. In aggregate, these initial filters removed 80 people (0.29%). Inspection of the ΔBMI% phenotype revealed a heavy tailed distribution, so we further quality controlled the ΔBMI% phenotype to remove outlier participants with BMI changes above 20% or below −45% (Extended Data Fig. 10). The ΔBMI% estimates were set to missing for participants who did not pass quality control.

To enable genetic associations to be interpreted in units of weight rather than ΔBMI%, we also defined a corresponding Δweight phenotype, defined as the change in weight from baseline in kilograms. We note that, because adult height is treated as constant during the treatment window, the percentage change in BMI (ΔBMI%) is mathematically identical to the percentage change in weight (Δweight%).
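The phenotype definition and quality control thresholds above can be sketched in a few lines. In the example below the function names are hypothetical, and applying the weight and BMI range checks to both the pre-treatment and post-treatment measurements is an assumption, since the text does not state which measurement each filter was applied to. Records failing any check return None, mirroring the 'set to missing' rule.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg per square metre."""
    return weight_kg / height_m ** 2


def passes_basic_qc(weight_kg, height_m, age_years):
    """Range checks using the thresholds quoted in the text."""
    if not (1.39 <= height_m <= 2.06):
        return False
    if not (36 <= weight_kg <= 181):
        return False
    if age_years < 18:
        return False
    return 14 <= bmi(weight_kg, height_m) <= 70


def delta_bmi_pct(weight_pre_kg, weight_post_kg, height_m, age_years):
    """Percentage BMI change, or None if the record fails quality control.

    Assumption: the basic range checks are applied to both weights.
    """
    for weight in (weight_pre_kg, weight_post_kg):
        if not passes_basic_qc(weight, height_m, age_years):
            return None
    bmi_1 = bmi(weight_pre_kg, height_m)
    bmi_2 = bmi(weight_post_kg, height_m)
    change = 100 * (bmi_2 - bmi_1) / bmi_1
    if change > 20 or change < -45:  # outlier thresholds from the text
        return None
    return change


# With height fixed, percentage BMI change equals percentage weight change.
print(delta_bmi_pct(100.0, 88.0, 1.70, 50))  # approximately -12.0
print(100 * (88.0 - 100.0) / 100.0)          # -12.0
```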

For the side effect phenotypes, we defined separate case–control phenotypes for each side effect recorded in the survey, contrasting those who self-rated their side effects as moderate or severe (cases) to those who self-rated their side effects as mild or non-existent (controls). As before, for study participants who reported taking more than one GLP1 medication, we selected the GLP1 medication that they reported taking for the longest period of time.
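A minimal sketch of the side effect coding and medication selection rules follows. The severity labels, function names and input layout are hypothetical, because the exact survey wording and data format are given only in Supplementary Table 24; the logic simply maps moderate or severe ratings to cases, mild or absent ratings to controls, and picks the medication with the longest reported duration.

```python
# Hypothetical severity labels; the actual survey wording may differ.
CASE_BY_SEVERITY = {"none": 0, "mild": 0, "moderate": 1, "severe": 1}


def side_effect_status(severity):
    """Return 1 for cases, 0 for controls, None for missing or unknown answers."""
    if severity is None:
        return None
    return CASE_BY_SEVERITY.get(severity.strip().lower())


def longest_course(courses):
    """Pick the medication taken for the longest time.

    courses: list of (drug_name, days_on_drug) pairs for one participant.
    """
    if not courses:
        return None
    return max(courses, key=lambda course: course[1])[0]


print(side_effect_status("Severe"))                           # 1
print(side_effect_status("mild"))                             # 0
print(longest_course([("Ozempic", 120), ("Zepbound", 300)]))  # Zepbound
```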

We further defined phenotypes to represent covariates, specifically for drug type (semaglutide = 1 versus tirzepatide = 0), dosage and days on treatment. For the dosage phenotype, we used the reported most recent weekly dosage in milligrams; this was either the final dose or the current dose for people still taking medication.

Comparison of self-report and EHR data

As part of the 23andMe experience, research participants are offered the opportunity to share EHR information collected on their Apple iPhone devices. Specifically, the Apple Health application enables connection to healthcare providers for the purposes of sharing EHR information with third parties through Apple HealthKit (https://developer.apple.com/documentation/healthkit). 23andMe research participants can elect to share their EHR information for research purposes. We used these data to perform comparisons with the self-report survey data. Full details of comparison analyses are provided in Supplementary Information.

Non-genetic predictors of BMI loss

To analyse the dependence of achieved BMI loss on non-genetic factors such as drug type, dosage and time on treatment, we fit the following model:

$$\begin{array}{c}{{\rm{\Delta BMI}}}_{ \% }\sim \mathrm{age}+\mathrm{sex}+{\mathrm{BMI}}_{1}+\mathrm{drugType}+\mathrm{dose}+{\mathrm{days}}_{\mathrm{treat}}\\ \,+\,\mathrm{drugType}:\mathrm{dose}+\mathrm{drugType}:{\mathrm{days}}_{\mathrm{treat}}\\ \,+\,\mathrm{dose}:{\mathrm{days}}_{\mathrm{treat}}+\mathrm{dose}:{\mathrm{days}}_{\mathrm{treat}}:\mathrm{drugType}\end{array}$$
(1)

where ‘drugType’ is an indicator variable that equals 1 for individuals using semaglutide and 0 for tirzepatide, ‘dose’ represents the dose in milligrams, ‘days_treat’ represents the total days on the relevant drug and ‘:’ represents an interaction term between two or more variables. Note that semaglutide and tirzepatide typically have different standard dosing levels, which is handled in the regression model by the ‘drugType:dose’ interaction term.
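To make the interaction terms concrete, the following sketch expands one participant's covariates into the columns implied by equation (1). The function name and the inclusion of an explicit intercept column are assumptions of this illustration; the fitting software used in the study is not specified here.

```python
from typing import List


def design_row(age: float, sex: float, bmi1: float, drug_type: int,
               dose: float, days_treat: float) -> List[float]:
    """Expand covariates into main-effect and interaction columns.

    drug_type is 1 for semaglutide and 0 for tirzepatide, so every
    interaction involving drug_type is zero for tirzepatide users.
    """
    return [
        1.0,                                 # intercept (assumed)
        float(age),
        float(sex),
        float(bmi1),
        float(drug_type),
        float(dose),
        float(days_treat),
        float(drug_type * dose),             # drugType:dose
        float(drug_type * days_treat),       # drugType:days_treat
        float(dose * days_treat),            # dose:days_treat
        float(dose * days_treat * drug_type) # dose:days_treat:drugType
    ]
```

Because the indicator is 0 for tirzepatide, the dose and duration slopes for tirzepatide are given by the main effects alone, and the interaction coefficients describe how those slopes differ for semaglutide.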

Genotyping and SNP imputation

DNA extraction and genotyping were performed on saliva samples by Clinical Laboratory Improvement Amendments-certified and College of American Pathologists-accredited clinical laboratories of Laboratory Corporation of America. Samples were genotyped on one of five genotyping platforms. The V1 and V2 platforms were variants of the Illumina HumanHap550 BeadChip and contained a total of about 560,000 SNPs, including about 25,000 custom SNPs selected by 23andMe. The V3 platform was based on the Illumina OmniExpress BeadChip and contained a total of about 950,000 SNPs and custom content to improve the overlap with our V2 array. The V4 platform was a fully custom array of about 950,000 SNPs and included a lower redundancy subset of V2 and V3 SNPs with additional coverage of lower-frequency coding variation. The V5 platform was based on the Illumina Global Screening Array, consisting of approximately 654,000 preselected SNPs and approximately 50,000 custom content variants. Participant genotype data were imputed against a reference panel composed of data from the Haplotype Reference Consortium34 and augmented with additional sequences to boost imputation performance (Supplementary Information).

Association testing

We performed a GWAS of ΔBMI% in people of European ancestry using methods that have been described previously35. In brief, unrelated participants were included in the GWAS analyses on the basis of European ancestry as determined by a genetic ancestry classification algorithm36. The GWAS was performed including covariates as described in equation 1 above, with the addition of five genetic principal components to account for fine-scale genetic ancestry, and indicator variables to account for variation in the genotyping platform. Among 21,822 people of European ancestry, we required participants to have complete data needed to construct the target phenotype and GWAS covariates (that is, data available for pre-treatment weight, post-treatment weight, drug type, dosage, time on treatment and factors such as age, sex and height), resulting in 18,488 participants. Finally, participants were filtered on relatedness such that no two people shared more than 700 cM identity by descent37, which corresponds approximately to the minimal expected sharing between first cousins in an outbred population, resulting in a final GWAS sample size of 15,237. An equivalent procedure was used for GWASs of the side effect phenotypes. For the purposes of testing drug-specific associations, we repeated the GWAS procedure for the semaglutide-treated and tirzepatide-treated populations separately, removing the drug-type covariate and interaction terms as appropriate. All GWASs were adjusted for inflation using genomic control, with the inflation factor being no more than 1.035 for all phenotypes.
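For readers unfamiliar with genomic control, a minimal sketch of the standard adjustment is shown here. It assumes one-degree-of-freedom chi-square test statistics; the function name is hypothetical, and flooring the inflation factor at 1 is a common convention adopted for this sketch rather than a detail taken from the text.

```python
from statistics import NormalDist, median
from typing import List, Tuple

# Median of a chi-square distribution with 1 degree of freedom,
# obtained from the 75th percentile of the standard normal (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control(chi2_stats: List[float]) -> Tuple[float, List[float]]:
    """Return the inflation factor and the adjusted test statistics.

    The inflation factor is the ratio of the observed median statistic
    to its expected value under the null. It is floored at 1 here
    (an assumption) so that statistics are never inflated.
    """
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)
    return lam, [stat / lam for stat in chi2_stats]
```

An inflation factor of 1.035 therefore means that the median test statistic was at most 3.5% larger than expected under the null before adjustment.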

Given the smaller sample sizes available in non-European populations, we did not perform genome-wide association testing in these populations, and instead focused analyses on variants discovered as associated in the European GWAS. For these variants, we tested for association in non-European populations following a similar approach to that described above.

Replication

We performed replication of the identified efficacy association in the All of Us cohort38, using Controlled Tier Dataset v.8. We extracted genomic data, EHR data and a drug code referring to either semaglutide or tirzepatide from 9,579 participants. After filtering to retain participants with information regarding pre-treatment and post-treatment BMI and genotype data passing quality control, we obtained 4,889 participants, of whom 3,948 had complete data when incorporating covariates akin to those used in the GWAS. For the replication analysis, we tested for association between the EHR-derived ΔBMI% and the genotype, including covariates. We repeated the replication analysis after mean imputation of missing drug dose data, which allowed a larger sample of 4,855 participants to be analysed.
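Mean imputation replaces each missing value with the average of the observed values, so that participants lacking a dose record are retained. A minimal sketch follows; the function name is hypothetical, and whether the mean was taken overall or within each drug is not stated here, so the sketch simply uses the values it is given.

```python
from statistics import fmean
from typing import List, Optional


def mean_impute(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    mean = fmean(observed)
    return [mean if v is None else v for v in values]
```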

We also attempted replication analysis in the UK Biobank cohort, although the available data predate the availability of semaglutide or tirzepatide, and hence relied on earlier variants of GLP1 receptor agonists. Full details of the replication analysis methodology are provided in Supplementary Information.

Genetic and non-genetic risk modelling

To construct combined genetic and non-genetic models of ΔBMI% and risk of treatment-related side effects, we selected treatment, clinical, demographic, disease diagnosis and genetic variables as predictors. In addition to the covariates included in the GWAS, we also included years of education as a proxy for socio-economic status, and binary indicators of previous disease diagnosis for T2D, hypertension and non-alcoholic fatty liver disease. All continuous predictor variables were standardized before modelling to allow for the comparison of effect sizes.
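Standardization can be sketched as a z-score transformation. The function name is hypothetical, and the use of the sample standard deviation is an assumption, as the exact scaling is not specified here.

```python
from statistics import fmean, stdev
from typing import List


def standardize(values: List[float]) -> List[float]:
    """Centre values on their mean and scale to unit standard deviation."""
    mean = fmean(values)
    sd = stdev(values)  # sample standard deviation (assumed)
    if sd == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - mean) / sd for v in values]
```

After this transformation, each coefficient describes the change in outcome per one standard deviation of the predictor, which is what makes effect sizes comparable across variables measured in different units.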

We used a linear multi-variable model to fit ΔBMI%. Given the binary nature of side effect phenotype definitions, we fitted multi-variable logistic regression models (equivalent to a generalized linear model with a binomial family and logit link function). The dataset was partitioned randomly into training (70% of the sample) and held-out test (30%) sets, with the test set being used to assess model performance. Further details are outlined in Supplementary Information.
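The random partition and the logit link can be sketched as follows. The function names, the seed and the rounding of the test-set size are assumptions of this illustration, not details of the study.

```python
import math
import random
from typing import Iterable, List, Sequence, Tuple


def split_train_test(ids: Iterable, test_fraction: float = 0.3,
                     seed: int = 0) -> Tuple[List, List]:
    """Randomly partition identifiers into training and held-out test sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def predicted_probability(intercept: float, coefficients: Sequence[float],
                          features: Sequence[float]) -> float:
    """Inverse logit of the linear predictor, as in logistic regression."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear))
```

Fixing the seed makes the partition reproducible, and keeping the test identifiers out of every fitting step is what allows the test set to give an unbiased view of model performance.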

Performance of the efficacy model was further assessed by applying the model derived from our self-report data in a sample of 642 people who had provided HealthKit EHR data but had not completed the GLP1 survey, and hence were not used in the construction of the model. To replicate the situation where efficacy predictions are made before treatment, we assumed the dose, treatment duration and drug type variables were unknown, and set these variables to an arbitrary constant value for all participants.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.