Introduction

Variability in drug effectiveness and/or safety greatly impacts therapeutic outcome, with drug-response rates varying widely from 25% to 80% among commonly used drugs1. Twin and case/control studies have established that genetic factors contribute to variability in drug response. Pharmacogenetics (PGx) explores the influence of an individual’s genetic makeup on drug metabolism, efficacy, and adverse events2. Incorporating PGx into clinical practice is a promising strategy for healthcare clinicians to tailor drug selection and dosing to maximize therapeutic benefits while minimizing the risk of drug-related adverse events2,3. PGx is also gaining increasing attention in the pharmaceutical industry for its potential in drug development and drug repurposing4.

To facilitate the clinical implementation of PGx, experts from regulatory agencies and consortia such as the United States Food and Drug Administration (US FDA)5,6 and Clinical Pharmacogenetics Implementation Consortium (CPIC)7 have published clinical practice recommendations based on level of evidence and advocated for incorporating pharmacogenetic testing when a relevant drug is being prescribed. In a comprehensive study examining PGx variation within the UK Biobank, it was found that the average participant carried genetic variants that would affect their response to around 10 drugs based on CPIC guidelines8. In Australia, merely 4% of the study participants lacked actionable PGx variants, and 42% of them had more than 2 actionable PGx variants9. These findings argue for large-scale implementation of PGx-guided therapy. However, the discordant recommendations from different agencies and consortia make clinical implementation of PGx challenging10,11,12. Furthermore, the benefits of PGx-guided therapy have not been established in large, population-based studies, especially in non-European populations. Therefore, it is crucial to perform studies of large cohorts to evaluate the influences of PGx variants before clinical implementation.

The Taiwan Precision Medicine Initiative (TPMI), a consortium of researchers from the Academia Sinica and 33 partner hospitals across Taiwan, has enrolled 486,956 participants and obtained genetic and longitudinal clinical data from each person13. With access to their drug prescription and drug-related adverse event history on several commonly prescribed drugs, we conducted a retrospective study to analyze four PGx gene-drug pairs with dosing recommendations from CPIC and US FDA to determine the impact of PGx risk variants on drug response and toxicity in the largest Asian cohort ever studied in PGx. Specifically, we evaluated the association between genetic variants and risk for adverse events, including NUDT15/TPMT and azathioprine (AZA)-induced myelosuppression, CYP2C19 and clopidogrel-related major adverse cardiovascular events (MACEs), ABCG2/CYP2C9/SLCO1B1 and statin-associated myopathy (SAMs), and CYP2C9 and non-steroidal anti-inflammatory drugs (NSAID)-linked gastrointestinal (GI) and renal toxicity. These pairs were selected based on sufficient sample size for statistical power, and the availability of relevant clinical data. To ensure both clinical relevance and robust methodological design, we focused on gene-drug pairs that could be reliably analyzed. As discussed below, limitations in CYP2D6 genotyping meant that several important pairs could not be analyzed and were excluded from the study.

Results

Landscape of clinically actionable pharmacogene variants in the cohort

The 486,956 Han Chinese participants of the TPMI were genotyped with one of two SNP arrays (TPMv1 with 686,463 SNPs or TPMv2 with 743,227 SNPs) that contained 3949 and 2911 PGx markers, respectively. In this study, we extracted for the cohort the risk variant (star allele) status of clinically actionable PGx markers or human leukocyte antigen (HLA) types in 19 pharmacogenes together with the associated phenotypes, including their effects on metabolic enzymes, transporters, immune mediators, and mitochondrial proteins (Supplementary Tables 1, 2, and Fig. 1a). Variants in these 19 genes affected the response to 58 commonly prescribed drugs. Overall, 99.9% of TPMI participants possessed at least one PGx variant, which was mainly due to the highly prevalent VKORC1 rs9923231 (–1639 G > A) variant, as previously seen in other Han Chinese cohorts14,15. On average, each TPMI participant carried 4.3 clinically actionable PGx risk variants (Fig. 1b).

Fig. 1: TPMI participants with actionable variants/HLA types in 19 pharmacogenes.

(a) The fraction of TPMI participants with actionable variants/HLA types in pharmacogenes; (b) the number of individuals carrying actionable PGx risk variants or HLA types.

Drug use in people carrying PGx variants

We extracted and analyzed the drug prescription data from the electronic medical records (EMRs) of TPMI participants to determine their drug usage. Among the TPMI participants, 48.7% had been prescribed at least one of the 58 drugs with clinical practice recommendations based on genetic status in the 19 pharmacogenes. Additionally, 28.4% took two or more drugs with PGx information (Supplementary Table 3). Individuals with CYP2C19 loss-of-function (LoF) alleles and SLCO1B1 decreased or poor function alleles had been exposed to more drugs with clinical practice recommendations. Among the individuals carrying clinically actionable PGx variants, 17.8% had been prescribed the corresponding high-risk drugs, defined as those requiring dose adjustments, alternative therapies, or additional monitoring when prescribed to individuals with actionable PGx variants (Fig. 2 and Supplementary Table 2). The top 20 most commonly prescribed drugs included statins, NSAIDs, proton-pump inhibitors (PPIs), anti-platelet drugs, and antibiotics (Table 1).

Fig. 2: The distribution of risk drugs prescription in people carrying actionable PGx variants.

The number of people carrying actionable PGx risk variants or HLA types who took (orange) or did not take (blue) the drug for which they were at risk; number above bar denotes those who took the drug for which they were at risk.
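The exposure figure reported above (the share of variant carriers who were prescribed a drug for which they were at risk) can be illustrated with a minimal Python sketch. The function name, the data layout, and the three toy participants are hypothetical and are not taken from the TPMI pipeline; only the gene-drug pairings come from the text.

```python
def exposed_carrier_fraction(carriers, risk_drugs_by_gene, prescriptions):
    """Fraction of carriers prescribed at least one drug linked to a gene
    in which they carry an actionable variant.

    carriers:            dict participant_id -> set of genes with actionable variants
    risk_drugs_by_gene:  dict gene -> set of drug names affected by that gene
    prescriptions:       dict participant_id -> set of prescribed drug names
    """
    if not carriers:
        return 0.0
    exposed = 0
    for pid, genes in carriers.items():
        # Collect every drug that is high-risk for this participant's genes.
        risk_drugs = set()
        for gene in genes:
            risk_drugs |= risk_drugs_by_gene.get(gene, set())
        # Count the participant once if any high-risk drug was prescribed.
        if risk_drugs & prescriptions.get(pid, set()):
            exposed += 1
    return exposed / len(carriers)


# Toy data (hypothetical participants; gene-drug pairs as named in the text).
risk_drugs_by_gene = {
    "CYP2C19": {"clopidogrel"},
    "NUDT15": {"azathioprine"},
    "SLCO1B1": {"simvastatin", "atorvastatin"},
}
carriers = {
    "p1": {"CYP2C19"},
    "p2": {"NUDT15", "SLCO1B1"},
    "p3": {"SLCO1B1"},
}
prescriptions = {
    "p1": {"clopidogrel", "ibuprofen"},
    "p2": {"ibuprofen"},
}
fraction = exposed_carrier_fraction(carriers, risk_drugs_by_gene, prescriptions)
print(f"{fraction:.1%} of carriers received a corresponding high-risk drug")
```

Each carrier is counted once regardless of how many high-risk drugs were prescribed, which matches a per-person rather than a per-prescription reading of the reported percentage.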

Table 1 Clinical practice recommendations for the TPMI participants who took the drug based on their PGx phenotype

Based on the PGx clinical practice recommendations from US FDA and CPIC, those carrying actionable PGx variants should have received adjusted dosages, alternative drugs, or closer monitoring when prescribed high-risk drugs (Table 1). Because the PGx risk variant status of these individuals was unknown when they were prescribed the drugs, the PGx clinical practice recommendations were not followed. However, based on the available medical data, most patients did not suffer from any of the predicted adverse events. For instance, 85.9% of CYP2C19 LoF allele carriers tolerated clopidogrel without MACE, 78–83% of NUDT15/TPMT risk allele carriers tolerated azathioprine, and over 98% of individuals with high-risk statin-related PGx profiles did not develop muscle-related adverse events.

Treatment outcomes of those with PGx risk variants/HLA types

We examined the clinical outcomes of TPMI participants who were prescribed drugs known to be affected by PGx variants/HLA types by assessing their EMRs for the drug prescribed, any dosage/drug changes or clinical intervention, and lab test results. By comparing the therapeutic outcomes of patients with or without clinically actionable PGx variants/HLA types, we aimed to determine the impact of PGx screening in reducing adverse drug events (ADEs) and improving therapeutic responses. We selected 4 gene-drug pairs for which the TPMI data had a sufficiently large sample size and comprehensive clinical data for outcome assessment.

Impact of CYP2C19 risk variants on clopidogrel-related MACE

Clopidogrel is an anti-platelet drug commonly used in patients with unstable angina (UA), myocardial infarction (MI), stroke, and peripheral arterial disease to prevent MACE. Clopidogrel is a prodrug that requires enzymatic activation, primarily through CYP2C19. A meta-analysis showed that clopidogrel-treated patients with CYP2C19 LoF alleles had a significantly higher risk of MACE16.

Data from 28,055 clopidogrel users who met the inclusion criteria were used in this study (Supplementary Fig. 1). Among them, 12.4% developed MACE, including MI (2.1%), UA (2.5%), heart failure (HF, 4.7%), target lesion revascularization (TLR, 3.7%), stroke (2.3%), or cardiovascular death (CV death, 0.4%). The demographics, clinical characteristics, and CYP2C19 status of the patients are provided in Supplementary Table 4. There was no significant difference in age, gender, or comorbidities between individuals with and without MACE. Of note, concomitant use of proton pump inhibitors was significantly associated with clopidogrel-related MACE (35.3% vs. 40.8%, P = 1.6 × 10⁻¹⁰), which aligned with previous findings from a nationwide population-based study using the Taiwan National Health Insurance database17.

Among patients taking clopidogrel, carrying one or two CYP2C19 LoF alleles was significantly associated with MACE compared with carrying no LoF alleles (P = 2.97 × 10⁻²⁷, OR = 1.53, 95% CI = 1.42–1.65), including MI (P = 2.09 × 10⁻³¹, OR = 3.98, 95% CI = 3.15–5.02), UA (P = 7.24 × 10⁻¹³, OR = 1.88, 95% CI = 1.58–2.23), HF (P = 1.77 × 10⁻⁴, OR = 1.37, 95% CI = 1.22–1.55), TLR (P = 1.91 × 10⁻⁴, OR = 1.29, 95% CI = 1.13–1.47), and stroke (P = 6.31 × 10⁻⁵, OR = 1.41, 95% CI = 1.19–1.67) under multivariate analysis (Fig. 3a and Supplementary Table 5). Both CYP2C19 intermediate metabolizers (IM) and poor metabolizers (PM) had higher MACE incidence under clopidogrel treatment; however, 85.9% of people with CYP2C19 LoF alleles tolerated clopidogrel treatment well. There was no significant difference in risk of CV death under clopidogrel therapy for those with or without CYP2C19 LoF alleles (Fig. 3b, c and Supplementary Table 5).

Fig. 3: Impact of CYP2C19 in clopidogrel-related MACE.

Forest plot of the MACE risk in clopidogrel users with different CYP2C19 phenotypes: (a) people who carried at least one CYP2C19 LoF allele vs non-carriers, (b) CYP2C19 IM vs NM, and (c) CYP2C19 PM vs NM. The case number, OR, 95% CI, and P value are listed in the table. The ORs and 95% CIs were estimated using logistic regression adjusting for covariates (two-sided test). Data points represent odds ratios; error bars represent 95% confidence intervals. Significant associations are shown in red. CI = confidence interval; CV_death = cardiovascular death; HF = heart failure; IM = intermediate metabolizer; LoF = loss-of-function; MACE = major adverse cardiovascular events; MI = myocardial infarction; NM = normal metabolizer; OR = odds ratio; PM = poor metabolizer; TLR = target lesion revascularization; UA = unstable angina.
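To make the quantities in the forest plots concrete, the following Python sketch computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2 × 2 table. This is a simplification offered for illustration only: the ORs reported in this study come from logistic regression adjusted for covariates, which this sketch does not reproduce. The function name and the counts are hypothetical.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: carriers with the event        b: carriers without the event
    c: non-carriers with the event    d: non-carriers without the event
    z: normal quantile (1.96 gives an approximate two-sided 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_value, ci_low, ci_high = odds_ratio_wald(a=30, b=70, c=20, d=80)
print(f"OR = {or_value:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

An interval that excludes 1 corresponds to a significant association at the 5% level in this unadjusted setting; an adjusted analysis would instead exponentiate the genotype coefficient and its confidence limits from a fitted logistic model.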

Impact of NUDT15/TPMT in AZA-related adverse events

AZA is a commonly prescribed immunosuppressive antimetabolite in the management of acute lymphoblastic leukemia, autoimmune conditions, and organ transplantation. AZA has a narrow therapeutic index and a high potential for ADEs, like bone marrow toxicity, hepatotoxicity, massive hair loss, nausea and vomiting. Nudix hydrolase 15 (NUDT15) and thiopurine-S-methyltransferase (TPMT) are two important enzymes that decrease the concentration of AZA active metabolites, so higher AZA toxicity risk is commonly observed in people with deficient NUDT15 and TPMT18,19,20. Both CPIC and US FDA recommend a significant or substantial dose reduction for NUDT15 or TPMT IM and PM6,21.

We studied 8451 participants using AZA for prevention of renal transplant rejection and treatment of inflammatory conditions, including systemic lupus erythematosus (SLE), Sjogren syndrome, rheumatoid arthritis (RA), other connective tissue disorders, Crohn’s disease, ulcerative colitis, or severe atopic dermatitis (Supplementary Fig. 2). A total of 1503 (17.8%) patients stopped AZA treatment due to intolerable ADEs such as leukopenia (10.7%), thrombocytopenia (6.8%), hepatitis (4.6%), GI discomfort (1%), alopecia (0.7%), and allergy (0.6%). The distribution of sex, comorbidity, and usage of concurrent drugs (aspirin, allopurinol, corticosteroids, and methotrexate) was not significantly different in the patients with AZA-induced ADEs compared with AZA-tolerant controls, but concomitant use of allopurinol and AZA increased the risk of developing adverse events (Supplementary Table 6). The age at initial use of AZA in the ADE group was slightly younger than in the tolerant control group.

The NUDT15 IM (P = 5.44 × 10⁻⁴, OR = 1.28, 95% CI = 1.11–1.48) and NUDT15 PM (P = 2.88 × 10⁻⁸, OR = 4.05, 95% CI = 2.47–6.64) phenotypes were associated with AZA-induced adverse events under multivariate regression analysis (Fig. 4a–c and Supplementary Table 7). The incidence of leukopenia in intermediate and poor NUDT15 metabolizers (14.9% and 37.7%, respectively) was significantly higher than that in normal NUDT15 metabolizers (NM; 9.4%) (P = 4.47 × 10⁻¹⁴, OR = 1.85, 95% CI = 1.58–2.17). The significant associations between NUDT15 and AZA discontinuation due to thrombocytopenia (P = 4.91 × 10⁻³, OR = 2.67, 95% CI = 1.35–5.3) or massive hair loss (P = 1.24 × 10⁻⁵, OR = 10.84, 95% CI = 3.72–31.58) were only observed in PMs. AZA discontinuation rates due to adverse events did not differ between patients with normal and decreased TPMT activity (Fig. 4d–e and Supplementary Table 7).

Fig. 4: Influence of NUDT15 and TPMT for AZA discontinuation due to ADE.

Forest plot of the ADE risk in AZA users with different NUDT15 and TPMT phenotypes: (a) people who carried at least one NUDT15 LoF allele vs non-carriers, (b) NUDT15 IM vs NM, (c) NUDT15 PM vs NM, (d) people who carried at least one TPMT LoF allele vs non-carriers, (e) TPMT IM vs NM. The case number, OR, 95% CI, and P value are listed in the table. The ORs and 95% CIs were estimated using logistic regression adjusting for covariates (two-sided test). Data points represent odds ratios; error bars represent 95% confidence intervals. Significant associations are shown in red. ADE = adverse drug events; CI = confidence interval; GI = GI discomfort; IM = intermediate metabolizer; LoF = loss-of-function; NM = normal metabolizer; OR = odds ratio; PM = poor metabolizer.

Again, the clinical impact was limited, with only 21.4% of NUDT15 IM/PM and 16.7% of TPMT IM having severe AZA-induced ADEs, whereas 16.9% of NUDT15 NM and 17.8% of TPMT NM suffered from AZA-induced ADEs.
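The contrast between carriers and non-carriers in the preceding paragraph can be expressed as an absolute risk difference and a crude relative risk. The Python sketch below uses only the four percentages quoted above; the function name is hypothetical, and the resulting figures are unadjusted and therefore not comparable to the covariate-adjusted ORs in Fig. 4.

```python
def risk_contrast(risk_carrier, risk_noncarrier):
    """Return (absolute risk difference, crude relative risk) for two proportions."""
    if not (0 < risk_noncarrier <= 1 and 0 <= risk_carrier <= 1):
        raise ValueError("risks must be proportions, with a non-zero reference risk")
    difference = risk_carrier - risk_noncarrier
    ratio = risk_carrier / risk_noncarrier
    return difference, ratio


# Percentages quoted in the text: NUDT15 IM/PM vs NM, and TPMT IM vs NM.
nudt15_diff, nudt15_rr = risk_contrast(0.214, 0.169)
tpmt_diff, tpmt_rr = risk_contrast(0.167, 0.178)

print(f"NUDT15: {nudt15_diff * 100:+.1f} percentage points, RR = {nudt15_rr:.2f}")
print(f"TPMT:   {tpmt_diff * 100:+.1f} percentage points, RR = {tpmt_rr:.2f}")
```

On these figures, NUDT15 IM/PM status corresponds to roughly 4.5 additional ADEs per 100 treated patients, while the TPMT difference is slightly negative, which is consistent with the limited clinical impact described above.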

Impact of ABCG2/CYP2C9/SLCO1B1 in SAMs

Statins are used to reduce plasma low-density lipoprotein cholesterol (LDL-C) levels and lower the risk of atherosclerotic cardiovascular diseases22. However, statin use is often associated with side effects, particularly those related to the musculoskeletal and hepatic systems. SAMs stand out as the most frequently reported adverse events, encompassing a spectrum of clinical presentations ranging from mild symptoms, such as muscle pain and weakness, to severe muscle injury, such as rhabdomyolysis23. Establishing the causal relationship between muscle complaints and statin use is challenging, particularly in the case of subjective symptoms such as myalgia24,25. To mitigate the risk of SAM over-diagnosis, we reviewed and extracted EMR data on symptom relief following statin withdrawal or symptom recurrence upon statin rechallenge. Data from 127,197 TPMI participants who had ever been treated with any statin agent (atorvastatin, fluvastatin, lovastatin, pitavastatin, pravastatin, rosuvastatin, or simvastatin) were analyzed for this study (individuals with pre-existing muscular disorders were excluded). Of these, 34,411 took more than one statin agent, and 7 were treated with 6 different statin agents (Supplementary Table 8). The switching between statins in these rare cases was attributed to reasons such as intolerance and inefficacy. Specifically, these seven patients experienced adverse events with certain statins, such as muscle toxicity, prompting switches to alternative statins. In some cases, the next statin was well tolerated but failed to adequately control LDL levels, necessitating further changes. Ultimately, all seven patients were transitioned to non-statin therapies, including restricted diet control. Although statins share similar structures, many participants who suffered from muscle toxicity caused by one statin could take another statin without any adverse events (Supplementary Table 8). Therefore, genetic associations with myopathy induced by different statins were analyzed separately (Fig. 5 and Supplementary Table 9).

Fig. 5: Influence of ABCG2/SLCO1B1/CYP2C9 in SAMs.

Forest plot of the SAM risk in statin users with different ABCG2, SLCO1B1, or CYP2C9 phenotypes: (a) atorvastatin users who carried two ABCG2 LoF alleles vs non-carriers, (b) fluvastatin users who carried one ABCG2 LoF allele vs non-carriers, (c) simvastatin users who carried two ABCG2 LoF alleles vs non-carriers, (d) atorvastatin users who carried at least one SLCO1B1 LoF allele vs non-carriers, (e) simvastatin users who carried at least one SLCO1B1 LoF allele vs non-carriers, and (f) fluvastatin users who carried at least one CYP2C9 LoF allele vs non-carriers. The case number, OR, 95% CI, and P value are listed in the table. The ORs and 95% CIs were estimated using logistic regression adjusting for covariates (two-sided test). Data points represent odds ratios; error bars represent 95% confidence intervals. Significant associations are shown in red. CI = confidence interval; IM = intermediate metabolizer; LoF = loss-of-function; NM = normal metabolizer; OR = odds ratio; PM = poor metabolizer; SAM = statin-associated muscle events, including myalgia, myositis, and rhabdomyolysis; sSAM = severe forms of statin-associated muscle events, specifically myositis and rhabdomyolysis.

Our study showed that people with poor function of ABCG2 had a higher risk of myositis when taking atorvastatin (P = 3.58 × 10⁻³, OR = 2.27, 95% CI = 1.31–3.94) or simvastatin (P = 1.26 × 10⁻², OR = 3.43, 95% CI = 1.3–9.05), whereas people with the ABCG2 decreased function phenotype had a higher risk of myalgia while taking fluvastatin (P = 5.39 × 10⁻³, OR = 1.97, 95% CI = 1.22–3.18) (Fig. 5a–c and Supplementary Table 9). Patients with the SLCO1B1 decreased or poor function phenotype (c.521 T > C) had an increased risk of developing myalgia with atorvastatin (P = 3.35 × 10⁻³, OR = 1.27, 95% CI = 1.08–1.48) and severe SAM with simvastatin (P = 3.14 × 10⁻³, OR = 2.91, 95% CI = 1.43–5.91) (Fig. 5d, e and Supplementary Table 9). Furthermore, although the pharmacokinetics of fluvastatin are known to be affected by CYP2C9 phenotypes, the frequency of CYP2C9 LoF alleles was not significantly higher in patients who experienced SAM after receiving fluvastatin in our cohort (P = 8.63 × 10⁻¹, OR = 0.93, 95% CI = 1.22–3.18) (Fig. 5f and Supplementary Table 9).

Despite the significant association in some of the ADEs, the frequencies were so low that the vast majority of the “high risk” individuals (98% or more) did not suffer from any ADEs.

Impact of CYP2C9 in NSAID-associated adverse events

NSAIDs are known for their ability to reduce pain, inflammation, and fever by inhibiting the production of prostaglandins. Although substantial evidence links the CYP2C9 deficient phenotype to altered NSAID plasma concentrations, clinical evidence directly substantiating an increased risk of ADEs in individuals with reduced CYP2C9 metabolism of NSAIDs (such as celecoxib, flurbiprofen, ibuprofen, lornoxicam, meloxicam, piroxicam and tenoxicam) remains limited26. However, since NSAID toxicity is dose- and duration-dependent27, a recommendation for NSAID dose adjustment and selection based on CYP2C9 genotype was issued by CPIC26. To assess the impact of CYP2C9 on NSAID-related ADEs, we analyzed the TPMI data to evaluate the association between CYP2C9 activity and NSAID-related upper GI and renal events (Supplementary Table 10). We found that those with the CYP2C9 PM phenotype were predisposed to NSAID-induced upper GI bleeding, although the sample size was relatively small (Fig. 6 and Supplementary Table 11). It is important to note that comorbidities (such as hypertension, diabetes, and cardiovascular disease) and concurrent drugs (e.g., diuretics, ACE inhibitors) contributed much more to the adverse events of NSAIDs than CYP2C9 status, as observed in Supplementary Table 10.

Fig. 6: Influence of CYP2C9 in NSAID-associated adverse events.

Forest plot of the ADE risk in NSAID users with different CYP2C9 phenotypes: (a) people who carried at least one CYP2C9 LoF allele vs non-carriers, (b) CYP2C9 IM vs NM, and (c) CYP2C9 PM vs NM. The case number, OR, 95% CI, and P value are listed in the table. The ORs and 95% CIs were estimated using logistic regression adjusting for covariates (two-sided test). Data points represent odds ratios; error bars represent 95% confidence intervals. Significant associations are shown in red. CI = confidence interval; GI = GI discomfort; IM = intermediate metabolizer; LoF = loss-of-function; NM = normal metabolizer; OR = odds ratio; PM = poor metabolizer.

Discussion

The benefits of pharmacogenetic testing in preventing severe adverse events are well documented2. Several randomized controlled trials and studies have highlighted the benefit of PGx-guided therapy, showcasing its potential in optimized drug selection and dosing that leads to improved efficacy and safety28,29,30,31. However, these small, mostly European studies do not address the applicability of PGx-guided therapy in non-European populations such as the Han Chinese, where virtually everyone carries PGx risk variants that affect their response to drugs developed through clinical trials conducted mostly in subjects of European ancestry. Well-recognized differences in drug response across populations, driven by genetic and clinical factors, have resulted in distinct optimal dosing recommendations for Asians and Europeans in current clinical practice. For instance, the US FDA notes a higher risk of myopathy in patients with decreased or poor function of SLCO1B1 when taking 80 mg of simvastatin6, and CPIC recommends a daily dosage of simvastatin below 20 mg for patients with decreased function of SLCO1B121. However, the most commonly prescribed and effective dose of simvastatin in Taiwan ranges from 10 to 20 mg32,33,34. These observations emphasize the need for expanding PGx studies in underrepresented populations to refine guidelines and improve precision medicine globally. Our findings, along with previous studies, confirm the prevalence of clinically actionable PGx variants in Asian populations. For instance, studies on Sri Lankan, Indian, South Korean, Thai, and Chinese populations highlight significant differences in the frequency of PGx variants compared to Europeans. Variants in genes such as NUDT15, associated with AZA-induced myelotoxicity, and CYP2C19, influencing clopidogrel metabolism, are notably more prevalent in Asians35,36,37,38,39.

Our retrospective study of the 4 gene-drug pairs in 486,956 participants of the TPMI, for whom we have both genetic and longitudinal clinical data, shows that PGx-guided therapy is not straightforward. First, while our findings validate the published results that PGx risk variants increase the risk of adverse events, and the increase is statistically significant, the relative risk is low or moderate. Second, the vast majority of those with PGx risk variants who take the drug in question do not suffer from the predicted side-effects. Conversely, a significant fraction of those without PGx risk variants suffer from adverse events. This underscores that PGx is not an absolute predictor of ADEs but a critical tool that complements other clinical factors, such as patient history, comorbidities, and drug-drug interactions, to guide treatment decisions. When used in conjunction with these factors, PGx information can help identify at-risk patients and tailor therapies more effectively. Third, many of the adverse events are reversible, non-life-threatening events, and can be managed easily. For example, statin-associated myalgia, characterized by muscle pain or weakness, occurs in up to 15% of treated patients. Most cases are mild and completely resolved upon discontinuation of the statin, with symptoms improving within an average of 2–3 months. In rare cases, rechallenging with a different statin may result in successful tolerance without recurrent symptoms25,40,41. Finally, as there are no equally effective drug alternatives available in many cases, avoiding adverse events by not taking a drug means that one is taking a drug of lower efficacy. For instance, clopidogrel is a widely used P2Y12 inhibitor in elderly patients with ACS or those undergoing percutaneous intervention (PCI). 
While alternatives like ticagrelor or prasugrel offer enhanced antiplatelet effects, they are associated with significantly higher risks of bleeding, making clopidogrel the safer and more practical option for patients older than age 70 or with higher bleeding risk42,43,44. This underscores the clinical challenge of balancing efficacy, safety, and patient-specific factors when managing pharmacogenetic risks.

Given our findings, one must proceed with caution when implementing PGx-guided therapy. Existing resources, such as the clinical practice recommendations from CPIC and US FDA, already provide valuable recommendations for risk mitigation and management strategies in some scenarios. However, further studies must be conducted to (1) identify additional (genetic and non-genetic) factors that cause ADEs and explain the baseline occurrence of such events in those without PGx risk variants; (2) explore protective factors that allow carriers of PGx risk variants to tolerate drugs without adverse events; and (3) develop and refine risk-management strategies for PGx risk variant carriers, particularly for scenarios where CPIC and US FDA have not yet offered comprehensive recommendations. Such efforts will ensure that individuals can safely benefit from the most effective therapies while minimizing risks.

Although our findings and conclusions are strong for the 4 gene-drug pairs studied, and the results can be extrapolated to other drugs, our study has several limitations. First, the TPMI obtains clinical data for each participant only from the hospital where they are enrolled. In Taiwan, due to the convenience of the National Health Insurance system and the close proximity of hospitals, patients often receive care from multiple hospitals. For instance, a patient might visit one hospital for diabetes management and another for renal disease. However, the TPMI dataset does not capture clinical data from other hospitals, which could lead to under-reporting and incomplete datasets. Second, some drugs, such as NSAIDs, may be available over the counter (OTC) in Taiwan. Since the TPMI dataset is based on prescription records obtained from hospitals, OTC drug use is not captured. This could lead to underestimation of the use of drugs that are commonly obtained without a prescription, potentially impacting the assessment of drug-related outcomes. Third, the genetic data are based on SNP-array data, which means that not all the PGx risk variants in each pharmacogene are represented. For example, some variants are embedded in repeat regions where no suitable probes can be designed on the array. In addition, variants of extremely low allele frequency (<0.1% minor allele frequency) cannot be typed accurately on the array and are therefore excluded from the design. Fourth, the clinical phenotypes are extracted from the clinical data based on chart review, and the variability in clinical note style and substance across hundreds of doctors from 33 hospitals creates noise in the data. Some cases and controls are likely excluded for this reason. Apart from genetic variants, the heterogeneity in drug response may also stem from other factors, including environmental and nutritional influences, disease severity, comorbidities, concomitant drugs, and individual patient lifestyles.
To overcome these limitations, future studies will be conducted with comprehensive clinical data from the national health insurance database (that collects information from all hospitals and clinics the participants obtain their care), comprehensive genetic data from whole genome sequencing (WGS), and prospective study with standardized recording of clinical outcomes.

Phenoconversion, the alteration of an individual’s observed drug-response phenotype due to external or environmental factors such as concurrent drugs, comorbidities, or lifestyle, is another important consideration45,46,47. Although we adjusted for comorbidities and concurrent drugs, phenoconversion was not explicitly evaluated. This limitation could potentially confound the observed associations between PGx variants and clinical outcomes. For example, concurrent drugs that inhibit or induce drug-metabolizing enzymes could mask or amplify the effects of PGx variants, leading to misclassification of phenotypes. Future studies should incorporate phenoconversion explicitly by leveraging detailed drug histories and environmental data to better distinguish genetic from non-genetic influences on drug response. Addressing phenoconversion will help refine genotype-phenotype associations and improve the clinical utility of PGx-guided therapy. Finally, the current study does not include analysis of the risk variants in CYP2D6, the gene that encodes a pivotal enzyme in the metabolic pathways of nearly 20% of frequently prescribed drugs48. Accurate typing of CYP2D6 is challenging because the gene is highly polymorphic and harbors complex structural variations (SVs)49. The lack of multiplex PCR and an SV analysis algorithm makes accurately determining CYP2D6 status from SNP array data impossible at this time. An optimal pipeline to overcome this challenge is under construction.

In conclusion, we report results from the largest retrospective study in non-Europeans on the distribution of PGx risk variants and their impact on drug-related adverse events and treatment responses for 4 widely accepted gene-drug pairs. Our findings show that implementing PGx-guided therapy in large populations is not simple and should be done with caution. We speculate that these conclusions apply to many other gene-drug pairs and to non-European populations, highlighting the need for comprehensive, integrative strategies to enhance the safety and efficacy of precision medicine.

Methods

Data source

This study was conducted with genetic and clinical datasets from the TPMI cohort, with participants recruited from 16 medical centers (encompassing 33 hospitals) in Taiwan. Genetic data include the whole-genome genotyping data from the custom-designed TPMI SNP arrays (TPMv1 and TPMv2), genome-wide imputation data, and HLA imputation data. The TPMv1 (686,463 SNPs) and TPMv2 (743,227 SNPs) arrays were designed by TPMI and Thermo Fisher Scientific for the TPMI project, specifically for the Han Chinese population, with superior coverage for the GWAS grid and previously published health-related variants. The EMRs were provided by the hospital through which the participants were enrolled. They included outpatient, inpatient, and emergency room visit records, drug prescription records, discharge summaries and operation notes, laboratory test results, and reports of pathology, surgery, imaging, and Mini-Mental State Examination. The 1498 WGS data from the Taiwan Biobank (TWB) were used for the imputation reference panel and for genotype validation50. The DeepVariant variant calling pipelines were used, and SHAPEIT5 and IMPUTE5 were applied for phasing and imputation51,52,53. For the imputation of G6PD, which is located on chromosome X, we utilized a reference panel derived from TPMv2 samples due to the absence of relevant data points in TPMv1. The imputation process was conducted using SHAPEIT4 and IMPUTE5. Male samples were imputed with the --haploid parameter in IMPUTE5 to account for the haploid nature of the X chromosome in males, while female samples underwent standard diploid imputation. The training and validation dataset for HIBAG encompassed 1359 HLA typing data sourced from multiple repositories, including the Adverse Drug Reaction Project, the Collaborative Study to Establish a Cell Bank and a Genetic Database on Non-Aboriginal Taiwanese, and the TWB.
Using the HIBAG algorithm, alleles of classical HLA genes such as HLA-A, HLA-B, HLA-C, HLA-DRB1, HLA-DQB1, and HLA-DPB1 were imputed. The model for each HLA gene was trained with a 500-kb flanking region and 500 classifiers to support coverage and predictive accuracy50.

This study was approved by TPMI’s committees, and data usage followed the approvals from the ethics committees of the Academia Sinica (AS-IRB01-18079), the TWB (TWBR10806-04), and all participating hospitals: Taipei Veterans General Hospital (2020-08-014 A), National Taiwan University Hospital (201912110RINC), Tri-Service General Hospital (2-108-05-038), Chang Gung Memorial Hospital (201901731A3), Taipei Medical University Healthcare System (N202001037), Chung Shan Medical University Hospital (CS19035), Taichung Veterans General Hospital (SF19153A), Changhua Christian Hospital (190713), Kaohsiung Medical University Chung-Ho Memorial Hospital (KMUHIRB-SV(II)-20190059), Hualien Tzu Chi Hospital (IRB108-123-A), Far Eastern Memorial Hospital (110073-F), Ditmanson Medical Foundation Chia-Yi Christian Hospital (IRB2021128), Taipei City Hospital (TCHIRB-10912016), Koo Foundation Sun Yat-Sen Cancer Center (20190823 A), Cathay General Hospital (CGH-P110041), and Fu Jen Catholic University Hospital (FJUH109001).

Pharmacogenomic variants analysis

The pharmacogenomic variants analyzed in this study were primarily obtained by genotyping on the custom-designed TPMI SNP arrays, TPMv1 (686,463 SNPs) and TPMv2 (743,227 SNPs). These arrays were tailored to the Han Chinese population to optimize coverage for GWAS and of pharmacogenetically relevant markers. Imputation was then performed with the TWB WGS dataset as the reference panel, using SHAPEIT5 and IMPUTE5 for phasing and imputation. The drug-gene pairs were adopted from the US FDA Table of Pharmacogenetic Associations6 and the Clinical Guideline Annotations table of CPIC guidelines from PharmGKB54 (accessed on 2024/02/27). Drugs with fewer than 10 prescription records were removed from the analysis. The curation of PGx variants and actionable PGx phenotypes was based on the instructions from PharmGKB55 and PharmVar56. The PGx variants were first screened against WGS data to remove those with an allele frequency lower than 0.1%. To further ensure the accuracy of key actionable PGx variants, we validated them in a representative subset of samples using multiple technologies, including WGS, Sanger sequencing, and the Sequenom MassARRAY platform. This step was not performed for the entire cohort of 486,956 participants but focused on confirming the reliability of genotype calls and imputed variants for clinically actionable alleles. Variants with sensitivity and specificity higher than 99% were retained for further analysis. This multi-layered approach of genotyping, imputation, and targeted validation was designed to enhance data reliability while balancing the scale of the study. The final list of drug-gene pairs evaluated in this study is given in Supplemental Table 2, and the variant information in Supplemental Table 1.
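The two retention criteria described here (allele frequency of at least 0.1%, then sensitivity and specificity above 99% against the validation genotypes) can be illustrated with a minimal sketch. It is not the pipeline used in the study: the `ValidationCounts` structure, the function names, the carrier/non-carrier encoding, and the example counts are all hypothetical, and treating an undefined sensitivity or specificity as a failure is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ValidationCounts:
    """Array/imputed carrier calls versus a reference assay (hypothetical structure)."""
    true_pos: int   # carrier in both the array/imputed call and the reference
    false_pos: int  # carrier in the array/imputed call only
    true_neg: int   # non-carrier in both
    false_neg: int  # carrier in the reference only


def sensitivity(c: ValidationCounts) -> float:
    positives = c.true_pos + c.false_neg
    # Assumption: with no reference carriers, sensitivity cannot be estimated, so report 0.
    return c.true_pos / positives if positives else 0.0


def specificity(c: ValidationCounts) -> float:
    negatives = c.true_neg + c.false_pos
    # Assumption: with no reference non-carriers, specificity cannot be estimated, so report 0.
    return c.true_neg / negatives if negatives else 0.0


def keep_variant(allele_freq: float, c: ValidationCounts,
                 min_af: float = 0.001, min_accuracy: float = 0.99) -> bool:
    """Drop variants with allele frequency below 0.1%, then require
    sensitivity and specificity both above 99%."""
    if allele_freq < min_af:
        return False
    return sensitivity(c) > min_accuracy and specificity(c) > min_accuracy


# Hypothetical validation counts for one variant.
example = ValidationCounts(true_pos=198, false_pos=1, true_neg=799, false_neg=1)
print(keep_variant(0.05, example))
```

Keeping the frequency screen separate from the accuracy check mirrors the order given in the text, where rare variants are removed before validation.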

Drug prescription and treatment outcome assessment

To count the individuals exposed to drugs with PGx warnings, we selected 58 approved drugs with clear genetics-based clinical utility recommendations, based on their availability in the TPMI dataset. For each drug, individuals with only a single prescription record were excluded.
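As a rough illustration of the single-prescription exclusion, the sketch below counts prescription records per person and drug and keeps only those with at least two records. The record layout (one `(person_id, drug)` tuple per prescription) and the function name are assumptions made for this example, not the structure of the TPMI dataset.

```python
from collections import Counter


def exposed_individuals(records, min_records=2):
    """Return {drug: set of person_ids} for people with at least
    `min_records` prescription records of that drug.

    `records` is an iterable of (person_id, drug) tuples, one per
    prescription (hypothetical layout).
    """
    per_pair = Counter(records)
    exposed = {}
    for (person, drug), n in per_pair.items():
        if n >= min_records:
            exposed.setdefault(drug, set()).add(person)
    return exposed


# Hypothetical records: p2 has only one clopidogrel prescription and is dropped for that drug.
records = [
    ("p1", "clopidogrel"), ("p1", "clopidogrel"),
    ("p2", "clopidogrel"),
    ("p2", "azathioprine"), ("p2", "azathioprine"),
]
print(exposed_individuals(records))
```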

The treatment outcome assessment focused on 4 drug-gene pairs: azathioprine (AZA; NUDT15/TPMT), clopidogrel (CYP2C19), statins (ABCG2/CYP2C9/SLCO1B1), and NSAIDs (CYP2C9)6,21,57,58,59. Prescription records and treatment outcomes from people taking AZA, clopidogrel, statins, or NSAIDs metabolized by CYP2C9 were analyzed to assess the influence of PGx variants and phenotypes on drug response. Prescription records were examined to define the duration of drug exposure, and ADEs were identified by reviewing the corresponding time intervals following drug discontinuation. Both structured and unstructured EMRs were used, including hospitalization records, surgical reports, imaging findings, laboratory results, and physician notes. Unstructured physician notes provided valuable insights into ADEs and reasons for drug discontinuation. Collaborating physicians and pharmacists reviewed these records to validate reported outcomes. Data cleaning and standardization were conducted by the TPMI team, integrating clinical test results and structured data, such as ICD codes and laboratory values, to reduce variability across participating institutions.

Clopidogrel and CYP2C19

To study the influence of CYP2C19 on clopidogrel response, we included clopidogrel users aged ≥18 years who demonstrated good compliance and had no severe allergic reactions. The primary endpoint was MACE, comprising CV death, non-fatal HF, non-fatal UA, acute MI, acute ischemic stroke or transient ischemic attack, or TLR requiring clinical intervention such as PCI or surgical bypass (extracted from imaging and operation reports). Compliance was assessed from descriptions in outpatient medical records. Because the TPMI medical records contain unstructured physician notes, collaborating physicians and pharmacists reviewed these records to identify information regarding compliance. When clarification was needed, we consulted the physicians to confirm the content.

Azathioprine and NUDT15/TPMT

The AZA study cohort comprised AZA-tolerant controls and patients who discontinued AZA due to massive hair loss, GI discomfort (nausea, vomiting, and diarrhea), allergic reactions, hepatitis (defined as ALT/AST > 3× ULN), leucopenia (defined as WBC < 3500/mm3), or thrombocytopenia (defined as platelet count < 150,000/µL), as documented by the responsible physicians and/or laboratory test results. Allergic reactions were defined as hypersensitivity events documented by the responsible physicians in the medical records, including symptoms such as rash, urticaria, or other similar reactions attributed to AZA use. Patients with pre-existing hematological malignancies or poor liver function were excluded. The study design is shown in Supplementary Fig. 2.
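The laboratory-based definitions above translate directly into threshold checks. The sketch below is illustrative only: the function name and argument layout are hypothetical, the upper limits of normal (ULN) are laboratory-specific and must be supplied by the caller, and reading the hepatitis criterion as "either ALT or AST above 3 times its ULN" is an assumption.

```python
def lab_defined_events(alt, ast, wbc_per_mm3, platelets_per_ul, alt_uln, ast_uln):
    """Return the laboratory-defined adverse events met by one set of results.

    Thresholds follow the definitions in the text; the ULN values are
    laboratory-specific inputs, not values taken from the study.
    """
    events = []
    # Assumption: hepatitis if either liver enzyme exceeds 3 x its ULN.
    if alt > 3 * alt_uln or ast > 3 * ast_uln:
        events.append("hepatitis")
    if wbc_per_mm3 < 3500:
        events.append("leucopenia")
    if platelets_per_ul < 150_000:
        events.append("thrombocytopenia")
    return events


# Hypothetical results with a hypothetical ULN of 40 U/L for both enzymes.
print(lab_defined_events(alt=150, ast=30, wbc_per_mm3=3000,
                         platelets_per_ul=200_000, alt_uln=40, ast_uln=40))
```

Events documented only in physician notes, such as hair loss or allergic reactions, fall outside this kind of rule and would still need manual review.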

Statins and ABCG2/CYP2C9/SLCO1B1

TPMI participants taking atorvastatin, fluvastatin, lovastatin, pitavastatin, pravastatin, rosuvastatin, or simvastatin who had no history of muscular disorders were included to assess the role of ABCG2, CYP2C9, and SLCO1B1 in the risk of SAMs (CPK elevation with muscle complaints, myalgia, myositis, and rhabdomyolysis).

NSAIDs and CYP2C9

For NSAID users, we assessed adverse renal events and upper GI discomfort across CYP2C9 phenotype groups. Participants were included if they had prescriptions of NSAIDs mainly metabolized via CYP2C9 (celecoxib, flurbiprofen, ibuprofen, meloxicam, piroxicam, and tenoxicam) and had no history of end-stage renal disease (ESRD; ICD10 code: N18.5 and N19), poor renal function (<15 ml/min/1.73 m2), dialysis (NHIRD order code: 58001C, 58002C, 58009B, 58010B, 58011C, 58012B, 58013C, 58017C, 58018C, and 58026C), renal replacement therapy (NHIRD order code: 76020B and N26028 and ICD10 code: T86.1 and Z94.0), alcoholism (ICD10 code: F10), esophageal varices (ICD10 code: I85 and I98.2), Mallory–Weiss syndrome (ICD10 code: K22.6), liver cirrhosis (ICD10 code: K70, K72–74, and K76), GI tract cancer (ICD10 code: C15, C16, and C17), or coagulation defects (ICD10 code: D65-D68). Comorbidities, including diabetes mellitus (ICD10 code: E08-E13), hypercholesterolemia (ICD10 code: E78.0-E78.5), hypertension (ICD10 code: I10), ischemic heart diseases (ICD10 code: I20-I25), HF (ICD10 code: I50), and cerebrovascular diseases (ICD10 code: I60-I69), were also analyzed to identify their influence on treatment outcome60,61.
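The ICD10-based part of these exclusion criteria amounts to prefix matching on diagnosis codes. The sketch below shows one way this could be done; it is not the study's code. The function names are hypothetical, the handling of codes stored with or without a dot is an assumption, and the NHIRD order codes and the renal function threshold are deliberately left out.

```python
# ICD10 prefixes listed in the exclusion criteria (ranges expanded by hand).
EXCLUSION_PREFIXES = (
    "N18.5", "N19",                      # end-stage renal disease
    "T86.1", "Z94.0",                    # renal replacement therapy
    "F10",                               # alcoholism
    "I85", "I98.2",                      # esophageal varices
    "K22.6",                             # Mallory-Weiss syndrome
    "K70", "K72", "K73", "K74", "K76",   # liver cirrhosis
    "C15", "C16", "C17",                 # GI tract cancer
    "D65", "D66", "D67", "D68",          # coagulation defects
)


def _normalize(code):
    # Assumption: codes may be stored with or without the dot and in any case.
    return code.strip().upper().replace(".", "")


_PREFIXES = tuple(_normalize(p) for p in EXCLUSION_PREFIXES)


def is_excluded(diagnosis_codes):
    """True if any diagnosis code falls under an exclusion prefix."""
    return any(_normalize(c).startswith(_PREFIXES) for c in diagnosis_codes)


print(is_excluded(["E11.9", "K74.60"]))  # cirrhosis code present
print(is_excluded(["E11.9", "I10"]))     # comorbidities only
```

Matching on prefixes rather than exact codes lets a three-character category such as K74 cover its more specific subcodes.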

Statistical analysis

The frequencies of PGx variants in the genotype and WGS data were calculated with PLINK and VCFtools. Fisher’s exact test and the chi-square test were used to compare tolerant and non-tolerant individuals carrying different PGx variants. Continuous variables are shown as means with standard deviations, and categorical variables as counts and percentages. For quantitative data, two groups were compared with the two-tailed Student’s t-test. Logistic regression was used to determine the odds ratio (OR) and 95% confidence interval under the allele model. The multivariate logistic regression was adjusted for clinical factors, including sex, age, comorbidities, and concurrent drugs. A P value ≤ 0.05 was considered significant. Statistical analyses were conducted using Python (version 3.11).
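For a single binary predictor, the unadjusted OR from logistic regression equals the cross-product ratio of the 2×2 table, so the comparison can be sketched with the standard library alone. The example below computes that OR with a Wald 95% confidence interval and a two-sided Fisher’s exact P value. It is a simplified stand-in, not the analysis code used here: it performs no covariate adjustment, the function names are hypothetical, and the counts are invented.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted OR and Wald 95% CI for a 2x2 table.

    a, b: carriers with / without the event
    c, d: non-carriers with / without the event
    """
    if min(a, b, c, d) == 0:
        raise ValueError("Wald interval is undefined when a cell count is zero")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value: sum the probabilities of all tables
    with the same margins that are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x events among carriers.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(total, col1)

    p_observed = prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(x_min, x_max + 1)]
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in probs if p <= p_observed * (1 + 1e-7))


# Invented counts: 3 of 4 carriers and 1 of 4 non-carriers had the event.
print(odds_ratio_wald_ci(3, 1, 1, 3))
print(fisher_exact_two_sided(3, 1, 1, 3))
```

The adjusted estimates described in the text would instead require fitting a multivariable logistic model with sex, age, comorbidities, and concurrent drugs as covariates.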