Population-specific polygenic risk scores for people of Han Chinese ancestry

Chen, Hung-Hsin; Chen, Chien-Hsiun; Hou, Ming-Chih; Fu, Yun-Ching; Li, Ling-Hui; Chou, Che-Yu; Yeh, Erh-Chan; Tsai, Ming-Fang; Chen, Chun-houh; Yang, Hsin-Chou; Huang, Yen-Tsung; Liu, Yi-Min; Wei, Chun-yu; Su, Jen-Ping; Lin, Wan-Jia; Wang, Elin H. F.; Chiang, Chi-Lu; Jiang, Jeng-Kai; Lee, I-Hui; Liang, Kung-Hao; Chen, Wei-Sheng; Tsai, Hung-Cheng; Lin, Shih-Yao; Chang, Fu-Pang; Ho, Hsiang-Ling; Yeh, Yi-Chen; Tseng, Wei-Cheng; Lin, Ming-Hwai; Chang, Hsiao-Ting; Tseng, Ling-Ming; Liang, Wen-Yih; Chen, Paul Chih-Hsueh; Hsieh, Yu-Cheng; Chen, Yi-Ming; Hsiao, Tzu-Hung; Lin, Ching-Heng; Chen, Yen-Ju; Chen, I-Chieh; Mao, Chien-Lin; Chang, Shu-Jung; Chang, Yen-Lin; Liao, Yi-Ju; Lai, Chih-Hung; Lee, Wei-Ju; Tung, Hsin; Yen, Ting-Ting; Yen, Hsin-Chien; Chen, Ming-Yao; Lin, Ying-Chin; Kao, Yung-Ta; Kao, Bi-Zhen; Lee, Jing-Er; Chung, Chi-Li; Liu, Ju-Chi; Chan, Paul; Lin, Chang-Hsien; Chen, Chia-Hsin; Wu, I-Chen; Lin, Lung-Chang; Wang, Jiunn-Wei; Shih, Shen-liang; Hsieh, Sun-Wung; Hung, Chih-Hsing; Li, Wei-Ming; Yang, Chih-Jen; Yang, Cheng-Shin; Weng, Ru-Hui; Chen, Yu-Chi; Chang, Chun-Ping; Wu, Tai-Hsun; Lin, Yu-Chang; Sheen, Yi-Jing; Wang, Shi-Heng; Chen, Sye-Pu; Raben, Timothy; Widen, Erik; Hsu, Stephen; Hsieh, Feng-Jen; Ho, Dong-Ru; Huang, Yu-Huei; Yang, Chung-Han; Huang, Yu-Shu; Chen, Yen-Fu; Wu, Hsien-Ming; Tsai, Ping-Han; Huang, Kuan-Gen; Chien, Chih-Yen; Ho, Yi-Lwun; Wu, Ming-Shiang; Kao, Jia-Horng; Liu, Yen-Bin; Juang, Jyh-Ming Jimmy; Lin, Mao-Hsin; Lin, Yen-Hung; Lee, Ji-Yuh; Lu, Hsueh-Ju; Lu, Chieh-Hua; Feng, An-Chieh; Liu, Jhih-Syuan; Chiang, Chien-Ping; Chu, Nain-Feng; Lin, Jung-Chun; Yeh, Yi-Wei; Meng, En; Huang, Chih-Yang; Li, Chi-Cheng; Wang, Tso-Fu; Su, Kuei-Ying; Wang, Jia-Kang; Chen, Mei-Hsiu; Chen, Hua-Fen; Ma, Gwo-Chin; Chang, Ting-Yu; Chiang, Fu-Tien; Chang, Hsing-Jung; Kao, Kuo-Jang; Hung, Chen-Fang; Tsai, Ching-Yao; Chen, Po-Yueh; Tsui, Kochung; Chen, Yuan-Tsong; Kwok, Pui-Yan; Sheu, Wayne Huey-Herng; Yang, Shun-Fa; Liou, Jyh-Ming; Wang, Jaw-Yuan; Chiou, 
Jeng-Fong; Wu, Jer-Yuarn; Fann, Cathy S. J.

doi:10.1038/s41586-025-09350-y

Article
Open access
Published: 15 October 2025

Hung-Hsin Chen ORCID: orcid.org/0000-0002-1921-2797^1,2^na1,
Chien-Hsiun Chen ORCID: orcid.org/0000-0003-2140-033X¹^na1,
Ming-Chih Hou³^na1,
Yun-Ching Fu^4,5,6^na1,
Ling-Hui Li¹,
Che-Yu Chou¹,
Erh-Chan Yeh ORCID: orcid.org/0009-0005-6501-407X¹,
Ming-Fang Tsai¹,
Chun-houh Chen ORCID: orcid.org/0000-0003-0899-7477⁷,
Hsin-Chou Yang ORCID: orcid.org/0000-0001-6853-7881^7,8,
Yen-Tsung Huang ORCID: orcid.org/0000-0001-7657-0040^7,9,10,
Yi-Min Liu¹,
Chun-yu Wei ORCID: orcid.org/0009-0002-1664-3396^1,11,
Jen-Ping Su¹,
Wan-Jia Lin ORCID: orcid.org/0009-0008-8343-6577¹,
Elin H. F. Wang¹,
Chi-Lu Chiang^12,13,
Jeng-Kai Jiang^13,14,
I-Hui Lee ORCID: orcid.org/0000-0002-5344-6685^13,15,16,
Kung-Hao Liang^17,18,19,
Wei-Sheng Chen^20,21,
Hung-Cheng Tsai^20,21,
Shih-Yao Lin^13,22,
Fu-Pang Chang²²,
Hsiang-Ling Ho^22,23,
Yi-Chen Yeh^13,22,
Wei-Cheng Tseng^24,25,
Ming-Hwai Lin ORCID: orcid.org/0000-0001-6825-6692²⁶,
Hsiao-Ting Chang ORCID: orcid.org/0000-0003-0421-8499^25,26,
Ling-Ming Tseng^13,27,28,
Wen-Yih Liang²²,
Paul Chih-Hsueh Chen²²,
Yu-Cheng Hsieh^29,30,31,
Yi-Ming Chen^29,30,
Tzu-Hung Hsiao²⁹,
Ching-Heng Lin²⁹,
Yen-Ju Chen²⁹,
I-Chieh Chen ORCID: orcid.org/0000-0002-8345-5304²⁹,
Chien-Lin Mao²⁹,
Shu-Jung Chang²⁹,
Yen-Lin Chang ORCID: orcid.org/0000-0003-2076-0990³²,
Yi-Ju Liao³²,
Chih-Hung Lai ORCID: orcid.org/0000-0003-1409-4979³³,
Wei-Ju Lee^30,34,
Hsin Tung^30,34,
Ting-Ting Yen³⁵,
Hsin-Chien Yen³⁶,
Ming-Yao Chen^37,38,39,
Ying-Chin Lin^40,41,42,
Yung-Ta Kao^43,44,45,
Bi-Zhen Kao³⁹,
Jing-Er Lee⁴⁶,
Chi-Li Chung ORCID: orcid.org/0000-0003-0656-976X^47,48,
Ju-Chi Liu^43,45,49,
Paul Chan⁵⁰,
Chang-Hsien Lin⁴¹,
Chia-Hsin Chen^51,52,
I-Chen Wu ORCID: orcid.org/0000-0002-6260-0402^53,54,
Lung-Chang Lin^55,56,
Jiunn-Wei Wang^53,57,58,
Shen-liang Shih ORCID: orcid.org/0000-0001-8488-5931^59,60,
Sun-Wung Hsieh^61,62,63,
Chih-Hsing Hung^56,64,65,
Wei-Ming Li^66,67,68,
Chih-Jen Yang^69,70,
Cheng-Shin Yang ORCID: orcid.org/0009-0005-5232-1310¹,
Ru-Hui Weng¹,
Yu-Chi Chen¹,
Chun-Ping Chang¹,
Tai-Hsun Wu¹,
Yu-Chang Lin¹,
Yi-Jing Sheen^30,71,72,
Shi-Heng Wang ORCID: orcid.org/0000-0002-8466-2698⁷³,
Sye-Pu Chen¹,
Timothy Raben ORCID: orcid.org/0000-0003-2681-4179⁷⁴,
Erik Widen^74,75,
Stephen Hsu^74,75,
Feng-Jen Hsieh^1,76,
Dong-Ru Ho^77,78,79,
Yu-Huei Huang^80,81,
Chung-Han Yang ORCID: orcid.org/0000-0001-8833-6155⁸²,
Yu-Shu Huang^83,84,
Yen-Fu Chen⁸²,
Hsien-Ming Wu⁸⁵,
Ping-Han Tsai ORCID: orcid.org/0000-0002-8134-9022^82,86,
Kuan-Gen Huang⁸⁵,
Chih-Yen Chien^87,88,
Yi-Lwun Ho^89,90,
Ming-Shiang Wu ORCID: orcid.org/0000-0002-1940-6428^89,90,
Jia-Horng Kao ORCID: orcid.org/0000-0002-2442-7952^89,91,92,
Yen-Bin Liu^89,90,
Jyh-Ming Jimmy Juang ORCID: orcid.org/0000-0003-4767-7636^90,93,
Mao-Hsin Lin^89,90,
Yen-Hung Lin^89,90,94,
Ji-Yuh Lee⁹⁵,
Hsueh-Ju Lu^96,97,
Chieh-Hua Lu⁹⁸,
An-Chieh Feng⁹⁹,
Jhih-Syuan Liu⁹⁸,
Chien-Ping Chiang^100,101,
Nain-Feng Chu⁹⁸,
Jung-Chun Lin¹⁰²,
Yi-Wei Yeh¹⁰³,
En Meng¹⁰⁴,
Chih-Yang Huang ORCID: orcid.org/0000-0003-2347-0411^105,106,
Chi-Cheng Li^107,108,109,
Tso-Fu Wang^109,110,111,
Kuei-Ying Su ORCID: orcid.org/0000-0002-7927-2310^108,112,
Jia-Kang Wang^113,114,115,
Mei-Hsiu Chen^114,116,117,
Hua-Fen Chen^118,119,120,
Gwo-Chin Ma¹²¹,
Ting-Yu Chang¹²¹,
Fu-Tien Chiang^119,122,
Hsing-Jung Chang^123,124,
Kuo-Jang Kao ORCID: orcid.org/0000-0001-6501-7987¹²⁵,
Chen-Fang Hung¹²⁵,
Ching-Yao Tsai^126,127,128,
Po-Yueh Chen^129,130,
Kochung Tsui^131,132,133,
Yuan-Tsong Chen¹,
Pui-Yan Kwok ORCID: orcid.org/0000-0002-5087-3059^1,134,135,136,
Wayne Huey-Herng Sheu^71,137,138^na2,
Shun-Fa Yang ORCID: orcid.org/0000-0002-0365-7927^139,140^na2,
Jyh-Ming Liou ORCID: orcid.org/0000-0002-7945-5408^90,141,142^na2,
Jaw-Yuan Wang ORCID: orcid.org/0000-0002-7705-2621^58,143^na2,
Jeng-Fong Chiou ORCID: orcid.org/0000-0002-5274-9131^144,145^na2,
Jer-Yuarn Wu¹^na2 &
Cathy S. J. Fann ORCID: orcid.org/0000-0001-9025-2276¹^na2

Nature volume 648, pages 128–137 (2025)

Abstract

Predicting complex disease risks on the basis of individual genomic profiles is an advancing field in human genetics^1,2. However, most genetic studies have focused on populations of European ancestry, creating a global imbalance in precision medicine and underscoring the need for genomic research in non-European groups^3,4. The Taiwan Precision Medicine Initiative recruited more than half a million Taiwanese residents, providing a large dataset of genetic profiles and electronic medical record data for people with Han Chinese ancestry. Using extensive phenotypic data, we conducted comprehensive genomic analyses across the medical phenome with individuals genetically similar to Han Chinese reference populations. These analyses identified population-specific genetic risk variants and new findings for various complex traits. We developed polygenic risk scores, demonstrating strong predictive performance for conditions such as cardiometabolic diseases, autoimmune disorders, cancers and infectious diseases. We observed consistent findings in an independent dataset, Taiwan Biobank, and among people of East Asian ancestry in the UK Biobank and the All of Us Project. The identified genetic risks accounted for up to 10.3% of the overall health variation in the Taiwan Precision Medicine Initiative cohort. Our approach of characterizing the phenome-wide genomic landscape, developing population-specific risk-prediction models, assessing their performance and identifying the genetic effect on health serves as a model for similar studies in other diverse study populations.

Main

A principal promise of modern genetics is the ability to predict complex disease risk on the basis of a person’s genetic profile. If successful, health management strategies can be developed to mitigate risk (disease prevention) and to optimize care (early diagnosis and effective treatment). Large-scale studies by the UK Biobank (UKB) and the Electronic Medical Records and Genomics Network show that risk prediction on the basis of genetics holds promise, and several countries are exploring ways to implement risk-based management in clinical practice^1,2. Using polygenic risk scores (PRS) to predict disease risk and identify individuals at high risk is an emerging ‘precision medicine’ approach to leverage genetic findings in clinical practice. However, a substantial limitation is that current PRS models are predominantly based on genome-wide association studies (GWAS) with participants of European ancestry (EUR)^4,5, often leading to reduced predictive performance in groups of other ancestry^6,7. To fully realize the potential of precision medicine for diverse global populations, population-specific phenome-wide genomic discovery must be performed at scale and clinically applicable polygenic risk models must be optimized in and across populations. To fill this research gap in a population of East Asian ancestry (EAS), we characterized the complex genetic architecture of the population of Han Chinese ancestry phenome wide, developed population-specific PRS and assessed the external validity of the models across populations with varying degrees of genetic similarity.

Populations of EAS represent nearly a quarter of the global population, but they account for only 3.95% of the participants in previous GWAS³. Although several biobanks have been built to recruit subjects from East Asia, they have moderate sample sizes (72,000–212,000), and many focus on specific conditions^8,9,10,11,12. By contrast, biobanks with predominantly EUR participants^13,14,15,16 have significantly larger sample sizes (224,000–635,000) and access to more comprehensive clinical data. The moderate sample sizes and limited phenotypes in existing EAS biobanks hamper discovery of unique genetic effects and preclude the development of robust and clinically useful PRS models for EAS.

We assembled a large non-EUR cohort, the Taiwan Precision Medicine Initiative (TPMI), and genotyped more than half a million participants across 16 medical centres in Taiwan from 2019 to 2023. All the participants, who are overwhelmingly of Han Chinese ancestry, contributed DNA samples for genetic profiling with a custom-designed genotyping array and consented to provide their longitudinal electronic medical records (EMRs) from 5 years before enrolment and into the future. The EMR dataset includes rich and accurate health-related phenotypes, including medical diagnoses and biochemical examinations¹⁷. Here we present the results of comprehensive genomic analyses with extensive genetic and medical data derived from the TPMI cohort, including phenome-wide GWAS and PRS model development. We identified numerous population-specific risk variants/genes, observed evidence of genetic pleiotropy and pinpointed clusters of traits that shared similar genetic aetiology. Then, we developed and validated PRS prediction models for numerous conditions against external datasets including those from the Taiwan Biobank (TWB), the UKB and the All of Us Project. Our results show the benefits of leveraging a large cohort from an understudied population to identify unique genetic underpinnings of the human phenome, interpret causal effects by means of fine mapping and colocalization and improve the performance of population-specific PRS models, which together better illuminate the clinical implications of genetic risk.

Diseases and quantitative traits in TPMI

We performed comprehensive genomic analyses, including GWAS, heritability estimation and PRS model building and evaluation, across a wide range of diseases and quantitative traits using 463,447 individuals genetically similar to Han Chinese reference populations from TPMI. We examined 695 dichotomized phenotypes (phecodes; case n > 2,000) and 24 quantitative traits (sample size > 100,000), spanning numerous disease categories (defined by phecode groupings^18,19), such as neoplasms, metabolic disorders, circulatory conditions, autoimmune diseases and more (Fig. 1). The phecodes, derived from International Classification of Diseases codes^18,19, alongside quantitative traits such as blood pressure, body mass index (BMI), liver enzymes and lipid levels, provide a robust dataset for exploring genetic contributions to human health (Supplementary Tables 1 and 2). The log-transformed case proportion identified from EMRs showed a moderate but significant correlation with the log-transformed 5-year disease prevalence from the National Health Insurance Research Database (NHIRD) in Taiwan²⁰ (r = 0.656, P = 2.69 × 10⁻⁸⁴) (Fig. 1a and Extended Data Fig. 1), indicating that the TPMI’s hospital-based design may not fully capture mild and common illnesses, which are primarily observed in local and primary care clinics. Figure 1b displays the sample sizes for 24 quantitative traits in the TPMI and highlights sample size variation across traits, a key measure affecting the power and precision of association analyses in the cohort.
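The comparison between EMR case proportions and NHIRD prevalence described above is a Pearson correlation on log-transformed values. A minimal sketch of that calculation follows; the function names and the toy values are hypothetical and are not data from the study, and the log base is an assumption (it does not affect the correlation).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_log_correlation(case_proportions, prevalences):
    """Correlate log10 case proportions with log10 prevalences."""
    log_cases = [math.log10(p) for p in case_proportions]
    log_prev = [math.log10(p) for p in prevalences]
    return pearson_r(log_cases, log_prev)


# Hypothetical toy values (NOT study data): prevalence is exactly twice
# the case proportion, so the log-log relationship is perfectly linear.
toy_case_proportions = [0.001, 0.01, 0.1]
toy_prevalences = [0.002, 0.02, 0.2]
toy_r = log_log_correlation(toy_case_proportions, toy_prevalences)
```

A constant multiplicative offset between the two sources leaves the log-log correlation at 1, so a value such as r = 0.656 reflects phecode-specific differences in ascertainment, not a uniform under- or over-count.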

**Fig. 1: Scatter plot of the case proportion for phecodes and bar chart of sample size for quantitative traits in TPMI dataset.**

GWAS, fine mapping and results

Our GWAS identified at least one significant locus (P < 5 × 10⁻⁸) for 265 of the 695 phecodes tested and for all 24 quantitative traits. Highlighting the robustness of the TPMI data, we observed a high replication rate for disease loci reported by EAS GWAS in the GWAS Catalogue (actual/expected ratio (AER) = 78.17%, accounting for statistical power with the published tool PGRM²¹), particularly for endocrine and metabolic/hematopoietic diseases (AER = 88.68% and 84.62%, respectively; Extended Data Fig. 2 and Supplementary Table 3). The lower replication rate for respiratory disease (AER = 23.53%) may reflect limited case numbers; untyped genetic variants, such as rare variants, copy number variation and structural variants; or recruitment bias such as age distribution.

We applied the sum-of-single-effects model for fine mapping to identify independent variant–trait associations. For each identified credible set we report the genetic variant with the highest posterior inclusion probability; for regions where fine mapping failed and for the major histocompatibility complex region (MHC region; chromosome 6: 25,391,792–33,424,245) we report the single lead variant. Our analyses showed a total of 2,656 fine-mapping-identified independent association signals, including 1,309 from phecode GWAS and 1,347 from quantitative traits. Notably, 95 new associations, defined as having no previously reported results within 1 Mb in the NHGRI-EBI GWAS Catalogue²¹ of relevant GWAS, were identified across 50 phecodes and seven quantitative traits. In addition, we identified 217 new hits from previously reported regions, defined as having low linkage disequilibrium (r² < 0.1) with any variant observed in the NHGRI-EBI GWAS Catalogue within 1 Mb for the same phenotype (Supplementary Tables 4 and 5). After multiple testing correction, 1,502 fine-mapped associations passed a Bonferroni-adjusted threshold (5 × 10⁻⁸/(695 + 24) = 6.95 × 10⁻¹¹), as did 21 previously unreported variants and 115 new hits.
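The Bonferroni-adjusted threshold quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the function and constant names are illustrative.

```python
GENOME_WIDE_ALPHA = 5e-8      # conventional genome-wide significance level
N_PHECODES = 695              # dichotomized phenotypes tested
N_QUANTITATIVE_TRAITS = 24    # quantitative traits tested


def bonferroni_threshold(alpha, n_tests):
    """Divide the per-test significance level by the number of tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


phenome_wide_threshold = bonferroni_threshold(
    GENOME_WIDE_ALPHA, N_PHECODES + N_QUANTITATIVE_TRAITS
)
print(f"{phenome_wide_threshold:.2e}")  # prints 6.95e-11
```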

Of the 95 new genetic associations, 30 variants are rare (minor allele frequency (MAF) < 0.05) in populations of other ancestry (African or African American (AFR), Admixed American (AMR), South Asian (SAS) and non-Finnish European in the Genome Aggregation Database (gnomAD)) and 33 variants have a MAF of less than 0.01 in EUR, the most extensively studied population, which explains why they were not reported in previous GWAS. For example, single nucleotide polymorphism (SNP) rs17089782, a missense variant in PIBF1 (p.R405Q) on chromosome 13, is significantly associated with thyroid cancer (P = 2.8 × 10⁻⁹) in the TPMI cohort. This SNP has a MAF of 5.65% in TPMI but 0.01% in EUR, which may explain why this association was only detectable in TPMI. PIBF1 is essential for immune regulation, especially during pregnancy, and is relevant to autoimmune diseases and cancer²². Another variant identified in our analysis of BMI (rs761018157; P = 4.8 × 10⁻⁹, MAF in TPMI = 4.34%, MAF in EUR < 0.01%) maps to PHOX2B. This gene, highly expressed in the nervous system, had previously been linked to obesity hypoventilation syndrome in a small study (n = 30)²³ and associated with bone mineral density²⁴. In addition, when we compare the effect sizes in TPMI and UKB for the rest of the new findings, 25 exhibit a significantly different effect size (P < 0.05). For instance, a TPMI-identified platelet-count-associated variant, rs12955741, located in the intergenic region between TGIF1 and DLGAP1, exhibits a different effect size (β) compared to that in UKB (β_TPMI = 0.044, β_UKB = −0.005, P = 1.7 × 10⁻⁹). Moreover, the high hepatitis B virus carrier rate in Taiwan²⁵ contrasts sharply with its rarity in European cohorts, enabling TPMI to identify new loci associated with viral hepatitis B (case number in TPMI = 23,618 versus UKB = 132). Among the 26 independent loci identified in our analysis of hepatitis B, 19 fine-mapped loci are new (Extended Data Fig. 3). Notably, 18 of these 19 loci were found to be associated with liver function or diseases (Supplementary Table 5). These new associations highlight the uniqueness of certain disease loci in the TPMI cohort, presenting opportunities for developing population-specific therapeutic interventions and advancing precision medicine.
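One common way to compare an effect size between two independent cohorts is a two-sided z-test on the difference of the estimates. The text does not state which test was used, so the sketch below is an assumption. The β values are those quoted for rs12955741, but the standard errors are hypothetical placeholders, so the resulting P value is illustrative and is not expected to match the reported one.

```python
import math
from statistics import NormalDist


def effect_difference_test(beta_a, se_a, beta_b, se_b):
    """Two-sided z-test for the difference between two independent
    effect estimates. Returns (z, p)."""
    z = (beta_a - beta_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Use the lower tail directly to avoid precision loss for large |z|.
    p = 2.0 * NormalDist().cdf(-abs(z))
    return z, p


# Effect sizes quoted in the text for rs12955741 (platelet count).
BETA_TPMI = 0.044
BETA_UKB = -0.005
# HYPOTHETICAL standard errors, chosen only to make the example runnable.
SE_TPMI = 0.006
SE_UKB = 0.005

z_stat, p_value = effect_difference_test(BETA_TPMI, SE_TPMI, BETA_UKB, SE_UKB)
```

The test assumes the two estimates are independent, which is reasonable for non-overlapping cohorts, and that both are reported on the same scale for the same effect allele.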

All identified independent associations are summarized in Fig. 2. The identification of the MHC region as a significant hotspot on chromosome 6 emphasizes its extensive involvement in immune-related diseases across several categories. Similarly, the short arm of chromosome 11 (INS-KCNQ1 region) also affects various traits, including metabolic, endocrine and genitourinary diseases. These hotspots of trait-relevant variants imply shared genetic mechanisms among diseases and potential pleiotropic effects.

**Fig. 2: Phenome-wide independent variant–trait associations.**

Heritability and colocalization

Linkage disequilibrium score regression analysis (LDSC)²⁶ showed strong liability-scaled SNP heritability (h²) for conditions such as alcoholism (h² = 0.213), retention of urine (h² = 0.163) and open-angle glaucoma (h² = 0.160). Among quantitative traits, body height (h² = 0.323), BMI (h² = 0.218) and high-density lipoprotein cholesterol (h² = 0.191) exhibited the highest heritability estimates (Supplementary Table 6 and Supplementary Information), highlighting the significant role of genetics in these traits. These results have far-reaching implications for precision medicine, as higher heritability signals indicate the potential for more accurate genetic risk-prediction models that could improve personalized disease risk assessments.
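For binary traits, heritability estimated on the observed (case/control) scale is usually converted to the liability scale using the population prevalence K and the sample case proportion P. The sketch below implements the widely used transformation h²_liability = h²_observed × [K(1 − K)]² / [z² P(1 − P)], where z is the standard normal density at the liability threshold. Which prevalence values the authors supplied is not stated here, so the function name and any inputs are illustrative.

```python
from statistics import NormalDist


def observed_to_liability_h2(h2_observed, population_prevalence,
                             sample_prevalence):
    """Convert observed-scale SNP heritability of a binary trait to the
    liability scale.

    population_prevalence (K): disease prevalence in the population.
    sample_prevalence (P): proportion of cases in the GWAS sample.
    """
    k = population_prevalence
    p = sample_prevalence
    if not (0.0 < k < 1.0 and 0.0 < p < 1.0):
        raise ValueError("prevalences must lie strictly between 0 and 1")
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - k)  # liability threshold
    z = std_normal.pdf(threshold)            # normal density at threshold
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))
```

When K = P = 0.5 the scaling factor reduces to π/2, which is a convenient sanity check. For rare diseases the factor depends strongly on K, so liability-scale estimates are only as reliable as the assumed prevalence.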

We then partitioned heritability at the gene level and identified 329 unique genes contributing significantly to phenotypic variation (h² > 0.1% and Z-score > 1.64). Of these, 45 affected more than one phecode and/or quantitative category, including key genes such as APOE, APOC1, TOMM40, ABCG2 and KCNQ1 (Fig. 3 and Supplementary Table 7). We also conducted a colocalization analysis to elucidate the potential molecular function of identified GWAS signals with three expression quantitative trait locus (eQTL) datasets: the Genotype-Tissue Expression Project (GTEx)²⁷, Multi-ancestry Analysis of Gene Expression (MAGE)²⁸ and the Japan COVID-19 Task Force (JCTF)²⁹ (Fig. 3 and Supplementary Table 8). Our results identified 391 unique genes that potentially mediate the outcome through their expression level (posterior probability > 0.9), including GBAP1, which colocalized with five different traits (uric acid, serum creatinine, hematocrit, hypertension and gout). Among the colocalized genes, 75 can be identified only in the multi-ancestry lymphoblastoid cell line eQTL (MAGE; 20 genes) and/or the Japanese whole blood eQTL (JCTF; 59 genes). Our findings demonstrate the effect of these genes (such as APOE, ABCG2 and KCNQ1) on several traits and disorders. By elucidating shared genetic effects, these results offer opportunities to develop precision medicine approaches that address comorbidities, such as treating hyperlipidaemia and reducing dementia risk through a single intervention targeting APOE, which influences both lipid metabolism and Alzheimer’s disease risk. This gene-level understanding emphasizes the potential to optimize therapeutic strategies by leveraging genetic pleiotropy in disease management.

**Fig. 3: Gene-level heritability and colocalization with gene expression.**

Genetic correlation and clusters

Pairwise genetic correlation and clustering analyses showed three main phenotype clusters: cardiometabolic traits, autoimmune and infectious diseases, and kidney-related traits (Fig. 4 and Extended Data Fig. 4). The cardiometabolic cluster, which includes type 2 diabetes, hypertension and BMI, reinforces the interconnected phenotypic and genetic architectures of cardiovascular and metabolic diseases. The cluster of autoimmune and infectious diseases, which includes viral hepatitis B, psoriasis and systemic lupus erythematosus, illuminates shared immune system pathways and potential gene–pathogen interaction. The kidney-related cluster involved gout, chronic kidney disease, calculus of kidney and ureter, ankylosing spondylitis and measures of urea nitrogen, creatinine and uric acid. The shared genetic architecture provides opportunities to leverage the genetic risk of correlated traits while developing the PRS model.

**Fig. 4: Genetic correlation among three identified trait clusters.**

Cross-population comparison

Cross-population comparisons³⁰ with EUR GWAS from UKB showed varying degrees of transethnic genetic-effect correlation (ρ_ge), with strong, statistically significant correlations for traits like cholelithiasis (ρ_ge > 0.999), type 2 diabetes (ρ_ge = 0.829) and ischaemic heart disease (ρ_ge = 0.756), but moderate correlations for gout (ρ_ge = 0.616) and psoriasis (ρ_ge = 0.418) (Supplementary Table 6). The moderate correlations may reflect differing genetic mechanisms and disease distributions across populations (gout case n = 24,411 in TPMI and 3,179 in UKB; psoriasis case n = 4,166 in TPMI and 2,197 in UKB). These findings demonstrate the importance of population-specific genetic studies, as differences in genetic architecture between populations can significantly affect the accuracy of PRS models.

PRS development

Building on these insights, we developed and validated PRS models that demonstrated strong predictive performance for a wide range of diseases. Although we used five PRS tools, LDpred2³¹, Lassosum2³², PRS-CS³³, SBayesR³⁴ and MegaPRS³⁵ (Supplementary Tables 9–13), we found that LDpred2 outperformed the others for most traits (Extended Data Fig. 5). Therefore, we used the LDpred2 results for further comparisons. Of the 265 PRS models for phecodes, area under the receiver operating characteristic curve (AUC) values exceeded 0.55 for 105 dichotomized phecodes with a significant P value (P < 0.05). Additionally, the explained variance of models for the 24 quantitative traits ranged from 0.028 (aspartate aminotransferase) to 0.227 (height) (Supplementary Table 9 and Extended Data Fig. 6). The most predictive PRS models included highly heritable traits such as ankylosing spondylitis (AUC = 0.812 ± 0.016), psoriasis (0.709 ± 0.016), atrial fibrillation (0.702 ± 0.014), prostate cancer (0.696 ± 0.018), systemic lupus erythematosus (0.696 ± 0.015), rheumatoid arthritis (0.646 ± 0.011), type 2 diabetes (0.640 ± 0.005), female breast cancer (0.611 ± 0.010) and hypertension (0.610 ± 0.004). Interestingly, the PRS for hepatitis B also demonstrated high genetic predictability (0.654 ± 0.008). Because h² represents the upper bound of variance that can be explained by PRS, we examined the proportion of heritability captured by our models (r²/h²). A total of 36 traits, including prostate cancer (r²/h² = 0.054/0.070), type 2 diabetes (0.066/0.126) and high-density lipoprotein cholesterol (0.136/0.191), reached more than 50% of their SNP heritability, indicating that PRS can achieve near-optimal predictive accuracy for highly heritable traits. However, for complex diseases influenced by both genetic and environmental factors, PRS performance is inherently constrained by the fraction of heritability attributable to common variants. These findings reinforce the importance of SNP heritability as a reference point for evaluating PRS utility and highlight the need for larger, ancestrally diverse datasets to further enhance genetic prediction models (Extended Data Fig. 6).
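The proportion of SNP heritability captured by a PRS is simply r²/h². As a quick check, the sketch below computes that ratio for the three traits whose values are quoted above; the dictionary and function names are illustrative.

```python
# (r2, h2) pairs quoted in the text for three example traits.
REPORTED_R2_H2 = {
    "prostate cancer": (0.054, 0.070),
    "type 2 diabetes": (0.066, 0.126),
    "high-density lipoprotein cholesterol": (0.136, 0.191),
}


def heritability_captured(r2, h2):
    """Fraction of SNP heritability (h2) explained by the PRS (r2)."""
    if h2 <= 0:
        raise ValueError("h2 must be positive")
    return r2 / h2


captured = {
    trait: heritability_captured(r2, h2)
    for trait, (r2, h2) in REPORTED_R2_H2.items()
}

for trait, fraction in captured.items():
    print(f"{trait}: {fraction:.1%}")
```

All three ratios exceed 0.5 (roughly 77%, 52% and 71%), consistent with the statement that these traits reached more than half of their SNP heritability.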

Leveraging the identified clusters, we trained multitrait PRS models with PRSmix+³⁶ for the traits in each cluster (Fig. 5 and Supplementary Table 14). Notably, multitrait PRS models improved prediction accuracy for the cardiometabolic disease cluster with a 0.040 increase in AUC (from 0.608 to 0.648) and a 1.770-fold improvement in phenotypic variance explained (r²). The performance of the autoimmune and kidney-related disease clusters was also enhanced, with average AUC improvements of 0.018 and 0.009, respectively (from 0.641 to 0.659 and 0.601 to 0.610, respectively) and 1.351-fold and 1.349-fold improvements in r². The significant enhancement of multitrait PRS prediction (comparing r² of LDpred2 and PRSmix+ with a paired t-test, P = 1.07 × 10⁻¹³) highlights the potential of leveraging shared genetic architecture to enhance disease risk prediction. Figure 5 shows the performance of single-trait and multitrait PRS across the three disease clusters, as well as the differing effectiveness of PRS in predicting genetic risk across disease categories.

**Fig. 5: PRS performance for the three identified trait clusters.**

PRS external validation and comparison

To evaluate the robustness and generalizability of our PRS models, we performed an external validation of the models (hypertension, type 2 diabetes, viral hepatitis B, gout and calculus of kidney from PRSmix+; others from LDpred2) in TWB (unrelated individuals genetically similar to Han Chinese reference populations, n = 88,628), UKB (self-reported EAS, n = 9,893) and All of Us (genetically inferred EAS, n = 6,895). We found that the prediction accuracy (AUC) of our models ranged from 0.548 (glaucoma) to 0.712 (prostate cancer) in TWB, 0.557 (female breast cancer) to 0.634 (hypertension) in UKB and 0.520 (migraine) to 0.709 (gout) in All of Us (Extended Data Fig. 7). Although the TWB questionnaire did not contain specific details on hepatitis B status, we used antihepatitis B core total antibodies (Anti-HBc) as an indicator of infection or past infection and hepatitis B surface-antigen (HBsAg) as a marker of acute/chronic infection. Intriguingly, the AUCs for the TPMI-derived model of hepatitis B were 0.674 ± 0.003 for HBsAg and 0.530 ± 0.002 for Anti-HBc in TWB. These results indicate that the hepatitis B PRS is more predictive of acute or chronic infection (HBsAg) than of any past exposure (Anti-HBc), consistent with a role of host genetics in the symptoms and severity of the disease.
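Applying a trained PRS to an external cohort comes down to two steps: scoring each individual as a weighted sum of allele dosages, and measuring how well the scores separate cases from controls. The sketch below illustrates both with a rank-based AUC. The weights and genotypes are hypothetical toy values, not output from LDpred2 or PRSmix+, and the reported AUCs may additionally involve covariate adjustment that is not modelled here.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random
    control, counting ties as one half."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# HYPOTHETICAL per-variant weights and genotypes, for illustration only.
toy_weights = [0.2, -0.1, 0.05]
toy_cases = [[2, 0, 1], [1, 0, 2]]
toy_controls = [[0, 2, 0], [1, 1, 0], [2, 0, 1]]

case_scores = [polygenic_score(g, toy_weights) for g in toy_cases]
control_scores = [polygenic_score(g, toy_weights) for g in toy_controls]
toy_auc = auc(case_scores, control_scores)
```

The pairwise loop is quadratic in cohort size and is written for clarity; for cohorts the size of TWB a rank-sum formulation would be the practical choice.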

TPMI-derived PRS models perform better than the UKB EUR-derived models when applied to EAS for viral hepatitis B, type 2 diabetes, hypertension, gout and migraine (Extended Data Fig. 7). For the other traits, TPMI-derived models consistently outperform the UKB ones, although the confidence intervals overlap. The overlapping confidence intervals in UKB and All of Us may be due to their limited EAS sample sizes. These results indicate that population-specific PRS models allow for more accurate risk stratification and enable personalized healthcare interventions for EAS. Additionally, we assessed the performance of TPMI-derived PRS and TPMI-included cross-population PRS across various ancestry groups, including populations of EUR, AFR, AMR and SAS ancestry from the UKB and All of Us cohorts (Extended Data Fig. 8). Performance varied by disease, but consistent results were observed for female breast cancer and glaucoma across populations, and TPMI-included cross-population PRS improved performance slightly, but not significantly, in populations other than EAS and EUR.

Genetic risks on overall health measures

Although overall health is hard to define with a few metrics, herein, we used the count of clinical visits and duration of hospitalization to roughly describe individuals’ overall health. We found that 131 of the top-performing PRS models (LDpred2 models with AUC > 0.55 and all PRSmix+ models for phecodes and all models for quantitative traits) are significantly associated with overall health indices, explaining 8.47% of the variation in clinical visit frequency (P = 2.69 × 10⁻¹⁴) and 10.29% of the variation in hospitalization duration (P = 5.62 × 10⁻²⁷; Extended Data Table 1 and Supplementary Table 15) in the comparison between top and bottom 5% groups after adjusting for sex, age and recruiting hospital. Among the identified clusters, the cardiometabolic disease cluster contributed the most to the indices, accounting for 1.32% of clinical visits (P = 0.02) and 3.55% of hospitalizations (P = 7.10 × 10⁻⁹). This may reflect the high prevalence of cardiometabolic diseases in the hospital-based TPMI cohort. In short, quantification of the effect of PRS for various diseases and traits on human health opens up opportunities for developing precision health management strategies.

Discussion

This study represents a large-scale GWAS in the population of Han Chinese ancestry, using data from around 500,000 individuals recruited from 16 medical centres across Taiwan. We investigated the genetic architecture of 695 dichotomized phecodes and 24 quantitative traits, identified 2,656 independent variant–trait associations and showed that population-specific PRS models for a wide range of diseases performed well in the population. Indeed, for the traits with sufficient sample size in the cohort, PRS performance rivals that of models developed for EUR using UKB data. These findings show that population-specific PRS models can be developed successfully for non-EUR populations, and our project serves as a model for large-scale genetic studies in other populations.

Recent large-scale projects that emphasize ancestral diversity in human genetic studies have made new discoveries by including non-EUR subjects. The Million Veteran Program (MVP) conducted multi-ancestry GWAS on 635,000 participants, identifying more than 2,000 signals unique to non-EUR populations¹⁶. With the TPMI dataset, we performed GWAS in subjects genetically similar to Han Chinese reference populations that, for several traits, are larger than published studies. For instance, the previous largest meta-analysis for type 2 diabetes included 20,573 cases who were of Han Chinese ancestry³⁷. By contrast, our GWAS included 59,289 cases of type 2 diabetes, almost tripling the number of cases ever tested, and identified five unreported type 2 diabetes-associated loci from known regions, demonstrating the power of the TPMI sample size. Identification of new and population-specific risk variants may lead to further understanding of their molecular mechanisms and underlines the need for population-specific weightings in PRS models. Moreover, population-specific findings also better explain the performance of population-specific PRS models in the population in question. In short, our population-specific genomic profiles for comprehensive phenotypes provide a solid foundation for PRS development.

Our understanding of the genetic factors influencing hepatitis B, an endemic infectious disease in Taiwan with an estimated hepatitis B virus carrier rate of 9.78% among the unvaccinated cohort (born before 1984)²⁵, also benefited from the large dataset. With 23,618 cases, a significant increase from previous studies of only a few thousand cases^38,39,40, we identified 26 fine-mapped signals, including 19 new loci, and showed a significant negative correlation between hepatitis B and other autoimmune diseases, such as Sicca syndrome, psoriasis and systemic lupus erythematosus. Our well-performing and validated PRS model for hepatitis B demonstrated that the host genome may determine the severity and symptoms of this infectious disease. This is similar to what was previously reported for COVID-19 and pneumonia, where genetic factors have been shown to influence disease outcomes^41,42,43,44. The unexpected success of our GWAS and PRS for hepatitis B not only demonstrates the power of the large sample size of TPMI but also shows the necessity of population-specific genetic study for population-enriched diseases. The benefits extend beyond differences in ancestry to include environmental factors such as pathogen exposure, food intake and lifestyle influences. Exploring how the human genome interacts with these diverse external and environmental factors can greatly enhance our understanding of how genetic variants contribute to disease susceptibility or severity.

In addition, the comprehensive phenotypic data allow us to investigate the genetic correlation among several traits that have substantial implications for clinical applications and leverage them to improve the performance of PRS models. By identifying shared genetic risks across diseases, at-risk individuals can be alerted to pursue early detection of comorbidities and targeted prevention strategies. For example, the clustering of cardiometabolic traits, such as type 2 diabetes, hypertension and BMI, highlights their interconnected genetic basis and indicates that individuals with a high genetic risk for one condition may benefit from early screening and intervention for related conditions⁴⁵. Additionally, the shared genetic architecture allows the development of multitrait PRS models that integrate genetic risks across correlated traits, improving prediction accuracy and enabling precision medicine approaches that address several health outcomes simultaneously⁴⁶. Including the correlated traits in PRS model development improved performance, resulting in an average 1.55-fold increase in the explained percentage of phenotypic variation (P = 1.07 × 10⁻¹³). Although previous studies have shown the utility of multitrait models for target diseases^36,47,48, we have extended the use of this approach to a phenome-wide level and demonstrated the improvement across different types of traits. As a result, we produced well-performing PRS for various categories of diseases, including cardiometabolic diseases, autoimmune disorders and infectious diseases.

We evaluated our PRS models across several large cohorts, including the TWB, UKB and All of Us. The TPMI-derived PRS models consistently outperformed those developed from EUR when applied to diseases in people with Han Chinese or EAS ancestry from the three large cohorts. When comparing with EUR-derived PRS models, we also observed better performance across several traits in EAS, particularly for cardiometabolic and autoimmune diseases. Similarly, the TPMI-included cross-population model slightly improves performance in populations of other ancestries. These results highlight the need for population-specific models and emphasize the importance of genetic data from diverse populations to advance cross-population models. By integrating these well-developed PRS models, we estimate that genetics account for 10.3% of the variation in hospitalization duration in TPMI. Although the estimates of genetic contributions to health measures may be influenced by disease prevalences and ascertainment biases, our results indicate that integrating genetic risk-based health management strategies with traditional risk factors, such as age, sex, smoking and BMI, may enhance prediction models and refine personalized risk stratification.

As with other large-scale epidemiological studies, ascertainment bias is also observed in our study. TPMI’s case proportion shows a significant but moderate correlation with the prevalence from NHIRD, implying potential ascertainment bias of TPMI’s hospital-based design. Compared to the general population in Taiwan (NHIRD), TPMI participants are overrepresented in the middle-aged group (year of birth 1940–1970, 54.3% versus 38.3%), include slightly more females (55.1% versus 50.6%) and have a higher proportion of participants from northern Taiwan (59.5% versus 47.3%). These demographic differences, along with the volunteer-based recruitment process, probably contribute to the lower case proportions observed in TPMI (Fig. 1a). Importantly, a significant portion of disease records in the NHIRD originate from local clinics and primary care settings, which are not covered in TPMI. Notably, the EMRs of the participants are incomplete, as some participants receive care from several health providers, but the TPMI only has access to EMRs from their enrolment hospitals. We acknowledge that ascertainment biases may influence disease prevalence and heritability estimates. Thus, we accounted for case-control ascertainment by applying liability-scale transformations using population prevalence data from NHIRD and used independent validation for PRS (TWB, UKB and All of Us) to mitigate the effect of ascertainment biases, while acknowledging that residual biases may persist. Methods like inverse probability weighting could mitigate such biases⁴⁹, but these require detailed external reference data that are unavailable at present. Additionally, we observed a relatively low estimated heritability for body height and BMI in TPMI, compared to values reported in the literature^50,51. These estimates may be affected by factors such as inconsistencies in assessment across EMRs, variations in statistical approaches and reduced bias of assortative mating in TPMI population^52,53,54. 
These limitations also emphasize the need for future adjustments to enhance generalizability.

In addition to ascertainment bias, our study has other commonly found limitations. First, the TPMI cohort size is not sufficiently large to study some of the severe subtypes of many diseases, such as diabetes insipidus and neurofibromatosis. Second, we attempted to use eQTLs to elucidate the molecular mechanism of diseases, but the underrepresentation of EAS in current eQTL datasets, such as GTEx, poses challenges²⁷. Gene expression regulation varies across ancestries^55,56, and differences in LD structures further complicate colocalization analyses. Compared with the GTEx whole blood eQTL, the multi-ancestry lymphoblastoid eQTL and Japanese whole blood eQTL showed 309 more gene–trait pairs. Therefore, ancestral diversity is urgently needed not only in genomic data but also in transcriptomic, proteomic, metabolomic and epigenomic datasets. Third, the current project retrieved EMRs from an average of 5 years before enrolment, so some important data such as age of disease onset for the older participants are not available. Incomplete EMRs lead to less precise case definition for some participants. Fourth, some of the younger participants have high-risk genetic profiles but are disease free for those diseases. The duration of the project is too short to determine whether they will eventually develop those diseases.

An effort is underway to gain access to the complete EMRs of the TPMI participants and to recruit more participants with severe subtypes of common diseases. The high-risk participants who are symptom-free are being followed to monitor disease development. Future studies are being planned to study the high-risk individuals who escape disease development to identify genetic and non-genetic factors that mitigate their disease risk. Furthermore, the meta-analysis integrating TPMI with other large-scale EAS biobanks, such as TWB^11,12, Korean Genome and Epidemiology Study⁹, China Kadoorie Biobank¹⁰ and Biobank Japan⁸, may further enhance our understanding of the genetic aetiology in EAS and improve prediction models.

This study demonstrates that population-specific risk-prediction models, such as those developed for EAS in this work, can achieve strong predictive performance for traits with high relevance in that population. The PRS we developed for EAS performed well for several traits, including diseases with significant public health implications, such as type 2 diabetes and systemic lupus erythematosus. However, for certain traits, such as female breast cancer and glaucoma, PRS derived from both UKB and TPMI performed comparably, indicating that the genetic architecture of some traits allows for generalizable models. These findings emphasize the importance of developing and validating PRS models in diverse global populations to maximize their utility and equity in genetic risk prediction. Although our results emphasize the utility of developing population-specific PRS, further research is needed to directly compare their performance with multipopulation models and assess their generalizability and to assess their effect on disease prevention and management. In particular, longitudinal studies and real-world implementations will be critical to determine the extent to which PRS-guided interventions can delay disease onset or improve health outcomes. Furthermore, it is hoped that if all can obtain their genetic profiles and determine their risk for major diseases, many diseases can be prevented or their onset can be delayed significantly, thereby fulfilling the promise of modern genetics.

In conclusion, we used a large-scale dataset of individuals genetically similar to Han Chinese reference populations produced by the TPMI to conduct phenome-wide genetic analyses and leverage these genetic findings to train risk-prediction models for several diseases and traits. The developed models are validated in EAS of different biobanks and demonstrate a consistent performance that bodes well for their use in populations of Han Chinese and EAS ancestry. Our approach can serve as a template for developing PRS models in populations that are currently without such resources, anticipating the time when all populations around the world can benefit from risk-based health management as part of the precision health movement.

Methods

Study population and phenotyping

We used the TPMI dataset, which links extensive EMRs with genotypic data for 486,956 individuals. Dichotomized disease status was defined by phecodes, which were based on information extracted from the EMR using International Classification of Diseases codes^18,19. To ensure robustness, cases were defined by having the diagnosis of the relevant condition on two or more clinical visits. We also extracted quantitative traits from the EMR, including anthropometric, vital sign and laboratory measurements; we excluded the extreme outliers and removed or adjusted the treated and/or medicated measures on the basis of previous research; and the median value was kept if the participant had several qualified measures⁵⁹ (Supplementary Information). In this study, we focused on 695 phecodes that had at least 2,000 cases and 24 quantitative traits that were measured in at least 100,000 individuals. These phecodes spanned 16 disease categories, including but not limited to infectious diseases, neoplasms, endocrine/metabolic disorders and circulatory system diseases. The 24 quantitative traits were categorized into anthropometric, circulatory, hematological, kidney-related, liver-related and metabolic measurements.
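The two phenotyping rules described above (a case requires the diagnosis on two or more clinical visits, and the median is kept when a participant has several qualifying measurements) can be sketched in Python as follows. The record layout, the function names `define_cases` and `summarize_trait`, and the example phecode are assumptions for illustration only; outlier exclusion and medication adjustment are not shown, and this is not the TPMI pipeline itself.

```python
from collections import defaultdict
from statistics import median


def define_cases(visits, min_visits=2):
    """Return {phecode: set of participant ids} for participants who carry
    the phecode on at least `min_visits` distinct clinical visits.

    visits: iterable of (participant_id, visit_id, phecode) tuples
    (a hypothetical layout, not the TPMI schema).
    """
    seen = defaultdict(set)
    for pid, visit_id, phecode in visits:
        # A set ignores duplicate rows for the same visit.
        seen[(pid, phecode)].add(visit_id)
    cases = defaultdict(set)
    for (pid, phecode), visit_ids in seen.items():
        if len(visit_ids) >= min_visits:
            cases[phecode].add(pid)
    return dict(cases)


def summarize_trait(measurements):
    """Collapse repeated measurements to one median value per participant.

    measurements: iterable of (participant_id, value) tuples that have
    already passed outlier and medication filters.
    """
    by_pid = defaultdict(list)
    for pid, value in measurements:
        by_pid[pid].append(value)
    return {pid: median(values) for pid, values in by_pid.items()}
```

In this sketch a participant with a single coded visit is simply not counted as a case; whether such participants are treated as controls or excluded is a separate design choice that the sketch does not make.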

Genotyping and quality control

We performed genotyping using two customized high-density Axiom SNP arrays produced by Thermo Fisher, TPMv1 and TPMv2. The genotyping experiments were conducted in six genotyping centres in Taiwan¹⁷. The raw genotypic data underwent quality control measures, and the genetic variants were excluded when they had a call rate less than 0.98, MAF < 0.01, or Hardy–Weinberg equilibrium test P < 1 × 10⁻⁶. We also excluded individuals with overall call rate less than 0.95, failed heterozygosity check, or inconsistent documented versus genetically determined sex. For this study, we only included the genetic variants found on both genotyping arrays and excluded variants with a significant batch effect in GWAS. The proportion of genetic ancestry was determined by ADMIXTURE⁶⁰, and the projected principal component scores with 1000 Genomes as a reference panel were applied to determine individuals’ ancestry⁶¹. As a result, 401,710 genetic variants and 463,447 Han Chinese participants passed all quality control measures and were used in the subsequent studies. Details are found in Supplementary Information and on GitHub (https://github.com/TPMI-Taiwan/tpmi-qc).
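As a minimal sketch of the variant-level thresholds listed above (call rate below 0.98, MAF below 0.01 or Hardy–Weinberg P below 1 × 10⁻⁶ lead to exclusion), the following Python function applies them to genotype counts for one biallelic variant. The use of a one-degree-of-freedom chi-square test for Hardy–Weinberg equilibrium is an assumption made to keep the example self-contained; the test actually used in the study is not specified here and may well be an exact test. Function names are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 d.f.) Hardy-Weinberg P value from genotype counts.
    Assumption: a chi-square test stands in for whichever test was used."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic variant: test is undefined, MAF filter handles it
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    # Survival function of a chi-square variable with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))


def variant_passes_qc(n_aa, n_ab, n_bb, n_missing,
                      min_call_rate=0.98, min_maf=0.01, min_hwe_p=1e-6):
    """True if the variant survives the three thresholds quoted in the text."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    call_rate = n_called / (n_called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    return (call_rate >= min_call_rate
            and maf >= min_maf
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= min_hwe_p)
```

Sample-level filters (call rate below 0.95, heterozygosity and sex checks) would be applied separately and are not covered by this sketch.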

Phasing and imputation

Phasing was conducted on quality-control-passed genotype data with SHAPEIT5⁶². Genome imputation was carried out with IMPUTE5 using a reference panel of 1,498 whole-genome-sequenced TWB subjects^12,63. We also conducted postimputation quality control, excluding variants with INFO score ≤ 0.7 or MAF ≤ 0.01. In addition, we performed a chip-GWAS to minimize the bias from different chips, resulting in a dataset of 8,046,864 well-imputed common genetic variants.

Population structure and relatedness estimation

We performed a principal component analysis (PCA) on the basis of genotyped variants to capture the effect of population structure. To diminish the effect of close relatives, the main PCA was conducted in a genetically unrelated subset, and other subjects were projected with the calculated PC weightings. Then these PCA scores were leveraged to accurately quantify the proportion of identity by descent and degree of relatedness. The maximum unrelated set was determined on the basis of these estimated degrees of relatedness. PC-AiR and PC-Relate were used for PCA and relatedness estimation, and PRIMUS was used for identifying the maximum unrelated set with the third degree as threshold^64,65,66.

GWAS

The entire dataset was divided into three subgroups: the GWAS set (n = 363,447), the training set (n = 80,000) and the testing set (n = 20,000). To maximize the statistical power, we used a mixed-effect regression model to examine the association between genotype and outcome of interest, logistic regression for dichotomized phecode and linear regression for quantitative traits. The quantile-normalization was applied to quantitative traits to ensure the normal distribution. The mixed-effect model accounted for relatedness among individuals by including a random effect for pairwise kinship. The model was also adjusted for key covariates, including age, sex, age², interactions between age/age² and sex, genotyping chip, enrolment hospital and ten genetic principal components to control for population stratification. SAIGE was applied for the mixed-effect model GWAS⁶⁷. In the GWAS set, we selected an unrelated subset (n = 248,754) to perform GWAS using a generalized linear model with PLINK2, and we conducted 1:10 age, sex-matching for the traits with imbalanced case/control ratio (less than 1/20). These PLINK2 GWAS statistics were then used for heritability and genetic correlation estimation⁶⁸.
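The normalization step for quantitative traits can be illustrated with a rank-based inverse normal transformation, which is one common reading of quantile normalization in GWAS. Whether the study used exactly this variant, and which rank offset it used, is an assumption; the function name and the offset of 0.5 are illustrative.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=0.5):
    """Map values to standard-normal quantiles by rank.

    Ties receive their average rank. The quantile for rank r among n
    values is (r - offset) / n, which stays strictly inside (0, 1).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - offset) / n) for r in ranks]
```

The transformed values preserve the ordering of the raw measurements while removing skew and heavy tails, which is why such a step is typically applied before linear mixed-model association testing.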

Replication evaluation

To systematically evaluate the performance of our GWAS, we leveraged a presummarized phenotype–genotype reference map⁶⁹, which collected 5,879 genetic associations for 149 unique phecodes from 523 published GWAS, including 1,215 associations from EAS. We calculated the overall and power-adjusted replication rates and the actual-over-expected ratio for each available phecode and category. The R package PGRM was used to measure the quality of biobank data through replication⁶⁹.
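One plausible reading of the three replication summaries is sketched below in Python: the overall rate is the fraction of tested associations that replicate, the power-adjusted rate restricts this to well-powered associations, and the actual-over-expected ratio divides the number of replications by the sum of per-association power. The 0.8 power cutoff, the input layout and the function name are assumptions for illustration; this does not reproduce the interface of the PGRM R package, and per-association power is taken as given.

```python
def replication_summary(associations, power_cutoff=0.8):
    """Summarize replication of previously reported associations.

    associations: list of (replicated, power) pairs, where `replicated`
    is a bool and `power` is the estimated power (0..1) to detect the
    reported effect in the test cohort. Both inputs are hypothetical.
    """
    nan = float("nan")
    n_tested = len(associations)
    n_replicated = sum(1 for replicated, _ in associations if replicated)
    powered = [replicated for replicated, power in associations
               if power >= power_cutoff]
    expected = sum(power for _, power in associations)
    return {
        # Fraction of all tested associations that replicated.
        "rr_all": n_replicated / n_tested if n_tested else nan,
        # Same fraction among well-powered associations only.
        "rr_power": sum(powered) / len(powered) if powered else nan,
        # Observed replications relative to the number expected from power.
        "aer": n_replicated / expected if expected > 0 else nan,
    }
```

A ratio near 1 would indicate that the cohort replicates about as many associations as its sample size predicts, independent of how many underpowered associations were tested.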

Fine mapping

We performed fine mapping to identify the independent GWAS signals in all genomic regions containing any variant with a P value less than 5 × 10⁻⁸ and plus or minus 1.5 Mb of the regional lead variant¹⁴, except the MHC region (chromosome 6: 25,391,792–33,424,245) because of its complex linkage disequilibrium structure. We used the reported 95% credible set to determine the independent signals, and up to ten signals were allowed for each region. The genome-wide significant threshold was applied for defining a credible set as an independent hit, and a further requirement of log Bayes factor > 2 was applied for the second hit. For the failed fine-mapping regions and MHC region, we used the lead SNP as the hit of each significant region. SuSiE was conducted for this summary statistics-based fine mapping with linkage disequilibrium derived from our imputation reference panel⁷⁰, which reflects our study population’s genetic architecture. Although using linkage disequilibrium from the GWAS sample might improve accuracy, we used the imputation panel because of the computation efficiency.
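The region definition above can be sketched as a greedy procedure: take the most significant remaining variant as the regional lead, open a window of plus or minus 1.5 Mb around it, drop the other significant variants inside that window and repeat. The greedy ordering, the rule that a region is flagged by whether its lead variant lies in the MHC interval, and all names below are assumptions for illustration; the fine mapping itself (SuSiE) is not shown.

```python
MHC = ("6", 25_391_792, 33_424_245)  # interval quoted in the text
FLANK = 1_500_000                    # plus or minus 1.5 Mb
P_THRESHOLD = 5e-8                   # genome-wide significance


def define_regions(variants, flank=FLANK, p_threshold=P_THRESHOLD):
    """Build fine-mapping regions from GWAS results.

    variants: iterable of (chrom, pos, p_value) tuples (hypothetical layout).
    Returns a list of dicts ordered by lead-variant significance; regions
    whose lead lies in the MHC are marked fine_map=False so that only the
    lead SNP is reported for them.
    """
    remaining = sorted((v for v in variants if v[2] < p_threshold),
                       key=lambda v: v[2])
    regions = []
    while remaining:
        chrom, pos, p_value = remaining[0]
        start, end = max(1, pos - flank), pos + flank
        in_mhc = chrom == MHC[0] and MHC[1] <= pos <= MHC[2]
        regions.append({"chrom": chrom, "start": start, "end": end,
                        "lead_pos": pos, "lead_p": p_value,
                        "fine_map": not in_mhc})
        # Remove every significant variant covered by this window.
        remaining = [v for v in remaining
                     if not (v[0] == chrom and start <= v[1] <= end)]
    return regions
```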

New association identification

We comprehensively compared our GWAS results with reported significant signals on the NHGRI-EBI GWAS Catalogue²¹, downloaded on 11 March 2024. The mapping of phecodes and quantitative traits to GWAS catalogue phenotypes is summarized in Supplementary Table 16. We classified a variant as new if the fine-mapped independent signal was not located within 1 Mb of any reported genome-wide significant association (P < 5 × 10⁻⁸) for the corresponding phenotype. Additionally, a variant was considered a new hit if the highest linkage disequilibrium r² was less than 0.1, with any reported significant association within 1 Mb. Associations derived from uncertain and umbrella phecodes were excluded, and for duplicated genetic variants or regions, we only reported the association with the smaller P value or from the phecode with the more specific definition. Finally, we used ANNOVAR to annotate the new variants with data from the RefSeqGene database (updated 17 August 2020)^71,72. For the new variants, we explored their allele frequencies in non-Finnish European, AFR, SAS and AMR from gnomAD⁷³. We also compared the effect size between TPMI and UKB with a t-test for investigating the ancestry-specific effect.
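The novelty rule above combines a distance criterion with a linkage disequilibrium criterion. A minimal Python sketch, assuming that a signal counts as new when either no reported association lies within 1 Mb or the highest r² with any reported association within 1 Mb is below 0.1, is given below. The callable `r2_lookup` is a hypothetical stand-in for an LD reference, and the tuple layout is illustrative.

```python
def is_new_signal(signal, reported, r2_lookup,
                  window=1_000_000, r2_max=0.1):
    """Classify a fine-mapped signal as new or previously reported.

    signal:    (chrom, pos) of the fine-mapped variant.
    reported:  list of (chrom, pos) for genome-wide significant catalogue
               associations with the corresponding phenotype.
    r2_lookup: hypothetical callable (signal, reported_variant) -> r2.
    """
    chrom, pos = signal
    nearby = [r for r in reported
              if r[0] == chrom and abs(r[1] - pos) <= window]
    if not nearby:
        return True  # nothing reported within the window
    # New only if it is in weak LD with every nearby reported association.
    return max(r2_lookup(signal, r) for r in nearby) < r2_max
```

The exclusion of umbrella phecodes and the handling of duplicated variants described in the text would be applied before this check.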

Heritability, genetic correlation and clustering

To quantify the genomic contribution of the specific traits, we applied linkage disequilibrium score regression to estimate the SNP-based heritability with LDSC²⁶. The GWAS summary statistics and the precalculated linkage disequilibrium score from the EAS superpopulation of 1000 Genomes were used⁶¹. For the dichotomized traits, we performed a liability-scaled transformation on the observed heritability using the 5-year population prevalence from the NHIRD of the Health and Welfare Data Science Center^20,74. For traits with a higher prevalence in our dataset (TPMI) than the population (NHIRD), we applied the equation from ref. ⁷⁵. For other traits, we used the adapted equation from ref. ⁷⁴. Additionally, we conducted LDSC to obtain pairwise genetic correlations to assess the similarity of genetic mechanisms between traits⁷⁶. On the basis of the genetic correlation matrix, we used a hierarchical cluster analysis to identify groups of traits that share genetic mechanisms. We used the weighted pair group method with arithmetic mean for clustering, and the resulting cluster tree was used for group identification. Moreover, we estimated the genetic correlation across populations, TPMI and UKB, to demonstrate varied genetic architecture in different ancestry populations. For the UKB GWAS, we applied a generalized linear model from PLINK2 with the predefined phecode (https://github.com/umich-cphds/createUKBphenome) and corresponding baseline quantitative measures among the identified unrelated set (n = 378,544). We used Popcorn for the cross-population genetic correlation, and two correlation coefficients were calculated: the transethnic genetic-effect correlation (ρ_ge) and transethnic genetic-impact correlation (ρ_gi)³⁰.
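For orientation, the widely used liability-scale transformation for an ascertained case-control sample can be written as h²_liab = h²_obs × K²(1 − K)² / (z² × P(1 − P)), where K is the population prevalence, P is the case proportion in the sample and z is the standard normal density at the liability threshold. The Python sketch below implements this standard form; the two equations actually applied in the study (from the cited references, chosen according to whether the TPMI case proportion exceeds the NHIRD prevalence) may differ from it, so the code should be read as an assumption-laden illustration with a hypothetical function name.

```python
from statistics import NormalDist


def liability_scale_h2(h2_obs, pop_prev, sample_prev):
    """Convert observed-scale SNP heritability to the liability scale.

    h2_obs:      heritability estimated on the observed (0/1) scale.
    pop_prev:    population prevalence K (for example from a registry).
    sample_prev: proportion of cases P in the GWAS sample.
    Assumption: the standard ascertainment-corrected formula is used.
    """
    standard_normal = NormalDist()
    threshold = standard_normal.inv_cdf(1.0 - pop_prev)  # liability cut-off
    z = standard_normal.pdf(threshold)                   # density at cut-off
    k_term = (pop_prev * (1.0 - pop_prev)) ** 2
    p_term = sample_prev * (1.0 - sample_prev)
    return h2_obs * k_term / (z ** 2 * p_term)
```

When K and P are both 0.5 the conversion factor reduces to π/2, which provides a quick sanity check of any implementation.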

Gene-level heritability and colocalization

We used both gene-level heritability estimation and colocalization analysis to map our GWAS findings to functional units, specifically genes. We conducted h2gene analysis to partition SNP-based heritability to the gene level⁵⁷. We estimated heritability for genes that overlapped with fine-mapped regions, where gene regions were defined as the gene body plus or minus 10 kb for gene-level heritability. Additionally, to illustrate the molecular functions of genes of interest, we used colocalization analysis to examine whether there are shared common genetic causal variants between tissue-specific gene expression and traits of interest. We used eQTL resources from 49 tissues in GTEx v.8²⁷, lymphoblastoid in MAGE²⁸ and whole blood in JCTF²⁹, testing any gene with genome-wide significant signals in the cis-regulation region (transcription start site plus or minus 1 Mb). The posterior probabilities were used to evaluate colocalization between gene expression and the trait of interest. The R package coloc was used with SuSiE, relaxing the single causal variant assumption^58,77,78.

Single-trait and multitrait PRS

The preserved dataset of 100,000 unrelated TPMI subjects was split into two subsets, training (n = 80,000) and validation (n = 20,000), for PRS model building. Five popular PRS tools were used—LDpred2³¹, Lassosum2³², PRS-CS³³, SBayesR³⁴ and MegaPRS³⁵—and the training subset was applied for parameter selection and model optimization if needed. LDpred2, PRS-CS and SBayesR assumed the effect of genetic variants following a mixture distribution with different predefined parameters and applied a Bayesian framework for distribution estimation. Lassosum2 used a penalized regression (LASSO) for weight estimating, and MegaPRS leveraged MAF and linkage disequilibrium for model building. We then used the validation subset to evaluate the performance of PRS models. Individual scores were calculated with PLINK2⁶⁸. The explained variance (r²) was used to evaluate the performance of PRS for quantitative traits^75,79, and two indices, AUC and liability-scaled r², were used for PRS of phecodes. We followed the approach of ref. ⁷⁹ and report both raw r² and r² adjusted for covariates (sex, age and PCs). Additionally, we include partial r² estimates, calculated using the R package rsq. To account for population stratification in cross-cohort predictions, we also report r² with PCs as covariates in Supplementary Tables 9–14. For AUC comparisons, we include a baseline model incorporating standard covariates (sex, age and PCs) to better assess the added predictive power of PRS. We used the likelihood ratio test to obtain the significance for r² with the R package lmtest, and we calculated the standard error for AUC with the R package auctestr. To further leverage the gene’s pleiotropy and shared genetic mechanism among traits, we conducted multitrait PRS model building for the traits in the same genetic cluster based on pairwise genetic correlation identified in the previous step. 
We pooled all PRS models from the five tools for those identified traits and applied an elastic net regression to combine their weighting and find the most optimized model for the target trait. We performed PRSmix+ for multiple-traits PRS model building³⁶. The cross-population PRS models were based on both TPMI and UKB European GWAS (https://pheweb.org/UKB-TOPMed/), and PRS-CSx was applied⁸⁰.
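The scoring and evaluation steps can be illustrated with a small Python sketch: an individual score is the weighted sum of allele dosages, the explained variance for a quantitative trait is the squared Pearson correlation between score and trait, and the AUC for a phecode is the probability that a randomly chosen case has a higher score than a randomly chosen control. The function names and data layout are assumptions; the sketch omits covariate adjustment and liability-scale conversion, uses a simple quadratic-time AUC, and does not reproduce the scoring conventions of PLINK2.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: {variant_id: dosage in 0..2}; weights: {variant_id: effect}.
    Variants without a weight are ignored.
    """
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


def r_squared(scores, trait):
    """Squared Pearson correlation between scores and a quantitative trait."""
    n = len(scores)
    mean_s, mean_t = sum(scores) / n, sum(trait) / n
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(scores, trait))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_t = sum((t - mean_t) ** 2 for t in trait)
    return cov * cov / (var_s * var_t)


def auc(scores, labels):
    """Rank-based AUC; ties between a case and a control count as 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))
```

For cohort-scale data the pairwise AUC loop would be replaced by a sort-based rank computation, but the quantity being estimated is the same.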

External validation and comparison

We conducted an external validation of our developed PRS using data from the TWB, EAS from UKB and All of Us. TWB is a community-based biobank, and it has recruited over 200,000 participants in Taiwan. Herein, we used 88,628 unrelated subjects (greater than third degree; we removed 5,242 overlapped individuals), who were genotyped with the Axiom customized chip TWB2 (equivalent to TPMv1); their genotyping quality control, phasing and imputation followed the same protocol as described above. The self-reported disease condition was queried from their baseline questionnaire, except for cancer. Because the study design of TWB excluded cancer patients at recruitment, we used both baseline and follow-up self-reporting data to define cancer cases and controls. UKB has enrolled approximately 500,000 participants since 2006 and linked their genetic data with enriched phenotypic data. For UKB validation, we used their inpatient record for case definition. Their ancestral population was determined by self-reported ethnic background, such as self-reported Chinese as EAS (n = 1,572); white, British, Irish and any other white background as EUR (n = 472,869); Black or Black British, Caribbean, African and any other Black background as AFR (n = 8,074); and Asian or Asian British, Indian, Pakistani, Bangladeshi and any other Asian background as SAS (n = 9,893). All of Us intends to enrol more than 1 million participants in the United States and has released whole-genome genotyping data for approximately 312,000 participants as of the first quarter of 2024. We applied ADMIXTURE with 1000 Genomes as a reference panel to assign the genetically inferred ancestral populations, including EAS (n = 6,895), EUR (n = 152,754), AFR (n = 60,964), AMR (n = 32,394) and SAS (n = 2,334). The genetically confirmed EAS as well as other superpopulations and their linked EMR were used for validating our PRS models. 
Moreover, we compared the TPMI-derived PRS model with UKB-derived models to investigate the performance of population-specific PRS. The UKB-derived models were based on published UKB European GWAS (https://pheweb.org/UKB-TOPMed/), and LDpred2-auto was applied for model building.

Overall health measures evaluation

We evaluated the genetic effect on overall health measures. We used the number of clinical visits and the aggregate duration of hospitalization as overall health indices. Owing to collinearity among PRS for different traits, we used a partial least squares generalized linear model to extract components from the PRS of qualified traits with the R package plsRglm⁸¹. The number of extracted components was determined by the Akaike Information Criterion. We then estimated the covariate-adjusted proportion of genetic contribution (r²) by comparing the full model with the null model, which included only covariates such as sex, age and hospital. We used a likelihood ratio test to obtain the significance of the regression models. For each index, we used three models to compare the top and bottom 5%, 10% and 20%. We selected covariate-matched controls from subjects without hospitalization records as the bottom group for hospitalization models.

Ethics statement

This study was approved by the Institutional Review Boards of Taipei Veterans General Hospital (2020-08-014A), National Taiwan University Hospital (201912110RINC), Tri-Service General Hospital (2-108-05-038), Chang Gung Memorial Hospital (201901731A3), Taipei Medical University Healthcare System (N202001037), Chung Shan Medical University Hospital (CS19035), Taichung Veterans General Hospital (SF19153A), Changhua Christian Hospital (190713), Kaohsiung Medical University Chung-Ho Memorial Hospital (KMUHIRB-SV(II)-20190059), Hualien Tzu Chi Hospital (IRB108-123-A), Far Eastern Memorial Hospital (110073-F), Ditmanson Medical Foundation Chia-Yi Christian Hospital (IRB2021128), Taipei City Hospital (TCHIRB-10912016), Koo Foundation Sun Yat-Sen Cancer Center (20190823 A), Cathay General Hospital (CGH-P110041), Fu Jen Catholic University Hospital (FJUH109001) and Academia Sinica (AS-IRB01-18079). Written informed consent was obtained from the subjects in accordance with institutional requirements and the Declaration of Helsinki principles. All collected information was de-identified before statistical data analysis. The analysis with TWB was approved by Institutional Review Boards of Academia Sinica (AS-IRB-BM-19014), and the NHIRD analysis with the Health and Welfare Data Science Center (HWDC) was approved by Institutional Review Boards of Academia Sinica (AS-IRB-BM-23056). This research has been conducted using the UKB Resource under UKB Main Application 15326. We worked with All of Us data using the All of Us Researcher Workbench under the workspace ‘Duplicate of Prediction of Polygenic Traits’.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All PRS models and GWAS results (summary statistics from SAIGE and PLINK) are available from the TPMI website (https://pheweb.ibms.sinica.edu.tw/). The UKB phecode GWAS was obtained from UK Biobank TOPMed-imputed PheWeb (https://pheweb.org/UKB-TOPMed/). The eQTL resources are downloaded from the websites of GTEx (https://www.gtexportal.org), MAGE (https://doi.org/10.5281/zenodo.10535719) and JCTF (https://humandbs.dbcls.jp/en/hum0343-v4). A detailed description of data availability and the application process of TWB, NHIRD, UKB and All of Us can be found on their websites (TWB, https://www.biobank.org.tw/english.php; NHIRD, https://dep.mohw.gov.tw/DOS/cp-5119-59201-113.html; UKB, https://www.ukbiobank.ac.uk/; All of Us: https://www.researchallofus.org/).

Code availability

Code for genotyping quality control process and analysis is available at GitHub (https://github.com/TPMI-Taiwan/).

References

Lennon, N. J. et al. Selection, optimization and validation of ten chronic disease polygenic risk scores for clinical implementation in diverse US populations. Nat. Med. 30, 480–487 (2024).
Thompson, D. J. et al. A systematic evaluation of the performance and properties of the UK Biobank Polygenic Risk Score (PRS) Release. PLoS ONE 19, e0307270 (2024).
Mills, M. C. & Rahal, C. The GWAS Diversity Monitor tracks diversity by disease in real time. Nat. Genet. 52, 242–243 (2020).
Duncan, L. et al. Analysis of polygenic risk score usage and performance in diverse human populations. Nat. Commun. 10, 3328 (2019).
Martin, A. R. et al. Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet. 100, 635–649 (2017).
Bentley, A. R., Callier, S. & Rotimi, C. N. Diversity and inclusion in genomic research: why the uneven progress? J. Community Genet. 8, 255–266 (2017).
Smith, J. L. et al. Multi-ancestry polygenic risk score for coronary heart disease based on an ancestrally diverse genome-wide association study and population-specific optimization. Circ. Genom. Precis. Med. 17, e004272 (2024).
Ishigaki, K. et al. Large-scale genome-wide association study in a Japanese population identifies novel susceptibility loci across different diseases. Nat. Genet. 52, 669–679 (2020).
Nam, K., Kim, J. & Lee, S. Genome-wide study on 72,298 individuals in Korean biobank data for 76 traits. Cell Genom. 2, 100189 (2022).
Walters, R. G. et al. Genotyping and population characteristics of the China Kadoorie Biobank. Cell Genom. 3, 100361 (2023).
Feng, Y. A. et al. Taiwan Biobank: a rich biomedical research database of the Taiwanese population. Cell Genom. 2, 100197 (2022).
Wei, C. Y. et al. Genetic profiles of 103,106 individuals in the Taiwan Biobank provide insights into the health and history of Han Chinese. NPJ Genom. Med. 6, 10 (2021).
Sudlow, C. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
Kurki, M. I. et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 613, 508–518 (2023).
All of Us Research Program Genomics Investigators. Genomic data in the All of Us Research Program. Nature 627, 340–346 (2024).
Verma, A. et al. Diversity and scale: genetic architecture of 2068 traits in the VA Million Veteran Program. Science 385, eadj1182 (2024).
Yang, H.-C. et al. The Taiwan Precision Medicine Initiative provides a cohort for large-scale studies. Nature https://doi.org/10.1038/s41586-025-09680-x (2025).
Bastarache, L. Using phecodes for research with the electronic health record: from PheWAS to PheRS. Annu. Rev. Biomed. Data Sci. 4, 1–19 (2021).
Denny, J. C. et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat. Biotechnol. 31, 1102–1110 (2013).
Lin, L. Y., Warren-Gash, C., Smeeth, L. & Chen, P. C. Data resource profile: the National Health Insurance Research Database (NHIRD). Epidemiol. Health 40, e2018062 (2018).
Sollis, E. et al. The NHGRI-EBI GWAS catalog: knowledgebase and deposition resource. Nucleic Acids Res. 51, D977–D985 (2023).
Zhou, Y. et al. Performance of multigene testing in cytologically indeterminate thyroid nodules and molecular risk stratification. PeerJ 11, e16054 (2023).
Tyagi, A., Goyal, A., Chaware, P. & Rathinam, B. A. D. Mutations of PHOX2B gene in patients of obesity hypoventilation syndrome in central India. J. Lab Physicians 14, 164–168 (2022).
Article CAS PubMed Google Scholar
He, D. et al. A longitudinal genome-wide association study of bone mineral density mean and variability in the UK Biobank. Osteoporos. Int. 34, 1907–1916 (2023).
Article CAS Google Scholar
Chang, K. C. et al. Survey of hepatitis B virus infection status after 35 years of universal vaccination implementation in Taiwan. Liver Int. 44, 2054–2062 (2024).
Article CAS PubMed Google Scholar
Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Article CAS PubMed PubMed Central Google Scholar
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Article Google Scholar
Taylor, D. J. et al. Sources of gene expression variation in a globally diverse human cohort. Nature 632, 122–130 (2024).
Article ADS CAS PubMed Google Scholar
Wang, Q. S. et al. The whole blood transcriptional regulation landscape in 465 COVID-19 infected samples from Japan COVID-19 Task Force. Nat. Commun. 13, 4830 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Brown, B. C., Asian Genetic Epidemiology Network Type 2 Diabetes Consortium, Ye, C. J., Price, A. L. & Zaitlen, N. Transethnic genetic-correlation estimates from summary statistics. Am. J. Hum. Genet. 99, 76–88 (2016).
Article CAS PubMed PubMed Central Google Scholar
Prive, F., Arbel, J. & Vilhjalmsson, B. J. LDpred2: better, faster, stronger. Bioinformatics 36, 5424–5431 (2021).
Article PubMed Google Scholar
Prive, F., Arbel, J., Aschard, H. & Vilhjalmsson, B. J. Identifying and correcting for misspecifications in GWAS summary statistics and polygenic scores. HGG Adv. 3, 100136 (2022).
PubMed PubMed Central Google Scholar
Ge, T., Chen, C. Y., Ni, Y., Feng, Y. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat. Commun. 10, 1776 (2019).
Article ADS PubMed Central Google Scholar
Lloyd-Jones, L. R. et al. Improved polygenic prediction by Bayesian multiple regression on summary statistics. Nat. Commun. 10, 5086 (2019).
Article ADS Google Scholar
Zhang, Q., Prive, F., Vilhjalmsson, B. & Speed, D. Improved genetic prediction of complex traits from individual-level data or summary statistics. Nat. Commun. 12, 4192 (2021).
Article ADS CAS PubMed Central Google Scholar
Truong, B. et al. Integrative polygenic risk score improves the prediction accuracy of complex traits and diseases. Cell Genom. 4, 100523 (2024).
Article CAS PubMed PubMed Central Google Scholar
Spracklen, C. N. et al. Identification of type 2 diabetes loci in 433,540 East Asian individuals. Nature 582, 240–245 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Chang, S. W. et al. A genome-wide association study on chronic HBV infection and its clinical progression in male Han-Taiwanese. PLoS ONE 9, e99724 (2014).
Article ADS PubMed Central Google Scholar
Zeng, Z. et al. Genome-wide association study identifies new loci associated with risk of HBV infection and disease progression. BMC Med. Genomics 14, 84 (2021).
Article CAS PubMed PubMed Central Google Scholar
Li, Y. et al. Genome-wide association study identifies 8p21.3 associated with persistent hepatitis B virus infection among Chinese. Nat. Commun. 7, 11664 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Chen, H. H. et al. Host genetic effects in pneumonia. Am. J. Hum. Genet 108, 194–201 (2021).
Article CAS PubMed Google Scholar
Covid- Host Genetics Initiative. A second update on mapping the human genetic architecture of COVID-19. Nature 621, E7–E26 (2023).
Article Google Scholar
Covid- Host Genetics Initiative. A first update on mapping the human genetic architecture of COVID-19. Nature 608, E1–E10 (2022).
Article Google Scholar
Asgari, S. & Pousaz, L. A. Human genetic variants identified that affect COVID susceptibility and severity. Nature 600, 390–391 (2021).
Article ADS CAS PubMed Google Scholar
Kember, R. L. et al. Polygenic risk scores for cardiometabolic traits demonstrate importance of ancestry for predictive precision medicine. Pac. Symp. Biocomput. 29, 611–626 (2024).
PubMed Google Scholar
Khunsriraksakul, C. et al. Multi-ancestry and multi-trait genome-wide association meta-analyses inform clinical risk prediction for systemic lupus erythematosus. Nat. Commun. 14, 668 (2023).
Article ADS CAS PubMed Central Google Scholar
Kelemen, M., Vigorito, E., Fachal, L., Anderson, C. A. & Wallace, C. shaPRS: leveraging shared genetic effects across traits or ancestries improves accuracy of polygenic scores. Am. J. Hum. Genet. 111, 1006–1017 (2024).
Article CAS PubMed PubMed Central Google Scholar
Zhai, S., Guo, B., Wu, B., Mehrotra, D. V. & Shen, J. Integrating multiple traits for improving polygenic risk prediction in disease and pharmacogenomics GWAS. Brief. Bioinform. 24, bbad181 (2023).
Article Google Scholar
Schoeler, T. et al. Participation bias in the UK Biobank distorts genetic associations and downstream analyses. Nat. Hum. Behav. 7, 1216–1227 (2023).
Article PubMed Google Scholar
Chen, C.-Y. et al. Analysis across Taiwan Biobank, Biobank Japan, and UK Biobank identifies hundreds of novel loci for 36 quantitative traits. Cell Genom. 3, 100436 (2023).
Article CAS PubMed PubMed Central Google Scholar
Wainschtein, P. et al. Assessing the contribution of rare variants to complex trait heritability from whole-genome sequence data. Nat. Genet. 54, 263–273 (2022).
Article CAS PubMed PubMed Central Google Scholar
Yengo, L. et al. Imprint of assortative mating on the human genome. Nat. Hum. Behav. 2, 948–954 (2018).
Article PubMed PubMed Central Google Scholar
Border, R. et al. Assortative mating biases marker-based heritability estimators. Nat. Commun. 13, 660 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Li, M. X. et al. A major gene model of adult height is suggested in Chinese. J. Hum. Genet. 49, 148–153 (2004).
Article PubMed Google Scholar
Zhong, Y., Perera, M. A. & Gamazon, E. R. On using local ancestry to characterize the genetic architecture of human traits: genetic regulation of gene expression in multiethnic or admixed populations. Am. J. Hum. Genet. 104, 1097–1115 (2019).
Article CAS PubMed Central Google Scholar
Gay, N. R. et al. Impact of admixture and ancestry on eQTL analysis and GWAS colocalization in GTEx. Genome Biol. 21, 233 (2020).
Article CAS PubMed PubMed Central Google Scholar
Burch, K. S. et al. Partitioning gene-level contributions to complex-trait heritability by allele frequency identifies disease-relevant genes. Am. J. Hum. Genet. 109, 692–709 (2022).
Article CAS PubMed PubMed Central Google Scholar
Wallace, C. A more accurate method for colocalisation analysis allowing for multiple causal variants. PLoS Genet. 17, e1009440 (2021).
Article CAS PubMed PubMed Central Google Scholar
Kirby, J. C. et al. PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability. J. Am. Med. Inf. Assoc. 23, 1046–1052 (2016).
Article Google Scholar
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Article CAS PubMed PubMed Central Google Scholar
Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Article ADS Google Scholar
Hofmeister, R. J., Ribeiro, D. M., Rubinacci, S. & Delaneau, O. Accurate rare variant phasing of whole-genome and whole-exome sequencing data in the UK Biobank. Nat. Genet. 55, 1243–1249 (2023).
Article CAS PubMed PubMed Central Google Scholar
Rubinacci, S., Delaneau, O. & Marchini, J. Genotype imputation using the positional Burrows Wheeler transform. PLoS Genet. 16, e1009049 (2020).
Article CAS PubMed PubMed Central Google Scholar
Conomos, M. P., Reiner, A. P., Weir, B. S. & Thornton, T. A. Model-free estimation of recent genetic relatedness. Am. J. Hum. Genet. 98, 127–148 (2016).
Article CAS PubMed PubMed Central Google Scholar
Conomos, M. P., Miller, M. B. & Thornton, T. A. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness. Genet. Epidemiol. 39, 276–293 (2015).
Article PubMed PubMed Central Google Scholar
Staples, J. et al. PRIMUS: rapid reconstruction of pedigrees from genome-wide estimates of identity by descent. Am. J. Hum. Genet. 95, 553–564 (2014).
Article CAS PubMed PubMed Central Google Scholar
Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).
Article CAS PubMed PubMed Central Google Scholar
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Article PubMed PubMed Central Google Scholar
Bastarache, L. et al. The phenotype-genotype reference map: improving biobank data science through replication. Am. J. Hum. Genet. 110, 1522–1533 (2023).
Article CAS PubMed PubMed Central Google Scholar
Zou, Y., Carbonetto, P., Wang, G. & Stephens, M. Fine-mapping from summary data with the “Sum of Single Effects” model. PLoS Genet. 18, e1010299 (2022).
Article CAS PubMed PubMed Central Google Scholar
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
Article PubMed PubMed Central Google Scholar
Frankish, A. et al. GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res. 51, D942–D949 (2023).
Article CAS PubMed Google Scholar
Chen, S. et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 625, 92–100 (2024).
Article ADS CAS PubMed Google Scholar
Ojavee, S. E., Kutalik, Z. & Robinson, M. R. Liability-scale heritability estimation for biobank studies of low-prevalence disease. Am. J. Hum. Genet. 109, 2009–2017 (2022).
Article CAS PubMed PubMed Central Google Scholar
Lee, S. H., Goddard, M. E., Wray, N. R. & Visscher, P. M. A better coefficient of determination for genetic profile analysis. Genet. Epidemiol. 36, 214–224 (2012).
Article PubMed Google Scholar
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
Article CAS PubMed PubMed Central Google Scholar
Wallace, C. Statistical testing of shared genetic control for potentially related traits. Genet. Epidemiol. 37, 802–813 (2013).
Article PubMed Central Google Scholar
Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Series B 82, 1273–1300 (2020).
Article MathSciNet Google Scholar
Ni, G. et al. A comparison of ten polygenic score methods for psychiatric disorders applied across multiple cohorts. Biol. Psychiatry 90, 611–620 (2021).
Article PubMed PubMed Central Google Scholar
Ruan, Y. et al. Improving polygenic prediction in ancestrally diverse populations. Nat. Genet. 54, 573–580 (2022).
Article CAS PubMed Google Scholar
Bertrand, F. & Maumy-Bertrand, M. plsRglm: partial least squares linear and generalized linear regression for processing incomplete datasets by cross-validation and bootstrap techniques with R. Preprint at https://arxiv.org/abs/1810.01005 (2018).

Download references

Acknowledgements

We thank all the participants and researchers of the Taiwan Precision Medicine Initiative and the Taiwan Biobank. We acknowledge the use of data from the National Health Insurance Research Database, provided by the Ministry of Health and Welfare of Taiwan. This study was funded in part by Academia Sinica (grant nos. 40-05-GMM, AS-GC-110-MD02 and 236e-1100202 to P.-Y.K. and J.-Y.W.) and the National Development Fund, Executive Yuan (grant no. NSTC 111-3114-Y-001-001 to P.-Y.K.). This work used ASGC (Academia Sinica Grid-computing Center) Distributed Cloud resources, which are supported by Academia Sinica. Analyses of UK Biobank data used computational resources hosted by the Michigan State University High-Performance Computing Center. Data from the UK Biobank include data provided by patients and collected by the National Health Service (NHS) England as part of their care and support. UK Biobank data also include data assets made available by National Safe Haven as part of the Data and Connectivity National Core Study, led by Health Data Research UK in partnership with the Office for National Statistics and funded by UK Research and Innovation (research that commenced between 1 October 2020 and 31 March 2021, grant no. MC_PC_20029; 1 April 2021 to 30 September 2022, grant no. MC_PC_20058). We also acknowledge the contributions of the All of Us participants who make this project possible and the work of the National Institutes of Health’s All of Us Research Program for making these data available.

Author information

These authors contributed equally: Hung-Hsin Chen, Chien-Hsiun Chen, Ming-Chih Hou, Yun-Ching Fu
These authors jointly supervised this work: Wayne Huey-Herng Sheu, Shun-Fa Yang, Jyh-Ming Liou, Jaw-Yuan Wang, Jeng-Fong Chiou, Jer-Yuarn Wu, Cathy S.-J. Fann

Authors and Affiliations

Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
Hung-Hsin Chen, Chien-Hsiun Chen, Ling-Hui Li, Che-Yu Chou, Erh-Chan Yeh, Ming-Fang Tsai, Yi-Min Liu, Chun-yu Wei, Jen-Ping Su, Wan-Jia Lin, Elin H. F. Wang, Cheng-Shin Yang, Ru-Hui Weng, Yu-Chi Chen, Chun-Ping Chang, Tai-Hsun Wu, Yu-Chang Lin, Sye-Pu Chen, Feng-Jen Hsieh, Yuan-Tsong Chen, Pui-Yan Kwok, Jer-Yuarn Wu & Cathy S. J. Fann
Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
Hung-Hsin Chen
Division of Gastroenterology and Hepatology, Department of Medicine, Taipei Veterans General Hospital, Taipei, Taiwan
Ming-Chih Hou
Department of Pediatric Cardiology, Taichung Veterans General Hospital, Taichung, Taiwan
Yun-Ching Fu
Children’s Medical Center, Taichung Veterans General Hospital, Taichung, Taiwan
Yun-Ching Fu
Department of Pediatrics, School of Medicine, National Chung-Hsing University, Taichung, Taiwan
Yun-Ching Fu
Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
Chun-houh Chen, Hsin-Chou Yang & Yen-Tsung Huang
Biomedical Translation Research Center, Academia Sinica, Taipei, Taiwan
Hsin-Chou Yang
Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan
Yen-Tsung Huang
Department of Mathematics, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
Yen-Tsung Huang
Core Laboratory of Neoantigen Analysis for Personalized Cancer Vaccine, Office of R&D, Taipei Medical University, Taipei, Taiwan
Chun-yu Wei
Department of Chest Medicine, Taipei Veterans General Hospital, Taipei, Taiwan
Chi-Lu Chiang
School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
Chi-Lu Chiang, Jeng-Kai Jiang, I-Hui Lee, Shih-Yao Lin, Yi-Chen Yeh & Ling-Ming Tseng
Division of Colon and Rectal Surgery, Department of Surgery, Taipei Veterans General Hospital, Taipei, Taiwan
Jeng-Kai Jiang
Department of Neurology, Neurological Institute, Taipei Veterans General Hospital, Taipei, Taiwan
I-Hui Lee
Institute of Brain Science, National Yang Ming Chiao Tung University, Taipei, Taiwan
I-Hui Lee
Department of Medical Research, Taipei Veterans General Hospital, Taipei, Taiwan
Kung-Hao Liang
Institute of Food Safety and Health Risk Assessment, National Yang Ming Chiao Tung University, Taipei, Taiwan
Kung-Hao Liang
Institute of Biomedical Informatics, School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
Kung-Hao Liang
Division of Allergy, Immunology & Rheumatology, Taipei Veterans General Hospital, Taipei, Taiwan
Wei-Sheng Chen & Hung-Cheng Tsai
Faculty of Medicine, School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
Wei-Sheng Chen & Hung-Cheng Tsai
Department of Pathology and Laboratory Medicine, Taipei Veterans General Hospital, Taipei, Taiwan
Shih-Yao Lin, Fu-Pang Chang, Hsiang-Ling Ho, Yi-Chen Yeh, Wen-Yih Liang & Paul Chih-Hsueh Chen
Department of Biotechnology and Laboratory Science in Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
Hsiang-Ling Ho
Division of Nephrology, Department of Medicine, Taipei Veterans General Hospital, Taipei, Taiwan
Wei-Cheng Tseng
School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
Wei-Cheng Tseng & Hsiao-Ting Chang
Department of Family Medicine, Taipei Veterans General Hospital, Taipei, Taiwan
Ming-Hwai Lin & Hsiao-Ting Chang
Division of General Surgery, Department of Surgery, Taipei Veterans General Hospital, Taipei, Taiwan
Ling-Ming Tseng
Comprehensive Breast Health Center, Taipei Veterans General Hospital, Taipei, Taiwan
Ling-Ming Tseng
Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
Yu-Cheng Hsieh, Yi-Ming Chen, Tzu-Hung Hsiao, Ching-Heng Lin, Yen-Ju Chen, I-Chieh Chen, Chien-Lin Mao & Shu-Jung Chang
Department of Post-Baccalaureate Medicine, College of Medicine, National Chung Hsing University, Taichung, Taiwan
Yu-Cheng Hsieh, Yi-Ming Chen, Wei-Ju Lee, Hsin Tung & Yi-Jing Sheen
Department of Medical Research, Institute of Clinical Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
Yu-Cheng Hsieh
Department of Pharmacy, Taichung Veterans General Hospital, Taichung, Taiwan
Yen-Lin Chang & Yi-Ju Liao
Department of Medicine and Cardiovascular Center, Taichung Veterans General Hospital, Taichung, Taiwan
Chih-Hung Lai
Neurological Institute, Taichung Veterans General Hospital, Taichung, Taiwan
Wei-Ju Lee & Hsin Tung
Department of Otolaryngology, Taichung Veterans General Hospital, Taichung, Taiwan
Ting-Ting Yen
Division of Pediatric Genetics and Metabolism, Children’s Medical Center, Taichung Veterans General Hospital, Taichung, Taiwan
Hsin-Chien Yen
Division of Gastroenterology and Hepatology, Department of Internal Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
Ming-Yao Chen
TMU Research Center for Digestive Medicine, Taipei Medical University, Taipei, Taiwan
Ming-Yao Chen
Division of Gastroenterology, Department of Internal Medicine, Shuang Ho Hospital, Taipei Medical University, New Taipei City, Taiwan
Ming-Yao Chen & Bi-Zhen Kao
Department of Family Medicine, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan
Ying-Chin Lin
Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
Ying-Chin Lin & Chang-Hsien Lin
Department of Occupational Medicine, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan
Ying-Chin Lin
Division of Cardiology, Department of Internal Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
Yung-Ta Kao & Ju-Chi Liu
Division of Cardiology, Department of Internal Medicine, Taipei Medical University Hospital, Taipei, Taiwan
Yung-Ta Kao
Taipei Heart Institute, Taipei Medical University, Taipei, Taiwan
Yung-Ta Kao & Ju-Chi Liu
Department of Neurology, Wan Fang Hospital and Taipei Medical University, Taipei, Taiwan
Jing-Er Lee
Division of Pulmonary Medicine, Department of Internal Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
Chi-Li Chung
School of Respiratory Therapy, College of Medicine, Taipei Medical University, Taipei, Taiwan
Chi-Li Chung
Division of Cardiology, Department of Internal Medicine, Shuang Ho Hospital, Taipei Medical University, New Taipei City, Taiwan
Ju-Chi Liu
Division of Cardiovascular Medicine, Department of Internal Medicine, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan
Paul Chan
Department of Physical Medicine and Rehabilitation, School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
Chia-Hsin Chen
Regenerative Medicine and Cell Therapy Research Center, Kaohsiung Medical University, Kaohsiung, Taiwan
Chia-Hsin Chen
Division of Gastroenterology, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
I-Chen Wu & Jiunn-Wei Wang
Center for Cancer Research, Kaohsiung Medical University, Kaohsiung, Taiwan
I-Chen Wu
Departments of Pediatrics, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
Lung-Chang Lin
Department of Pediatrics, School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
Lung-Chang Lin & Chih-Hsing Hung
Department of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
Jiunn-Wei Wang
Graduate Institute of Clinical Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
Jiunn-Wei Wang & Jaw-Yuan Wang
Division of Breast Oncology and Surgery, Department of Surgery, Kaohsiung Medical University Chung-Ho Memorial Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
Shen-liang Shih
Center for Medical Education and Humanizing Health Professional Education, Kaohsiung Medical University, Kaohsiung, Taiwan
Shen-liang Shih
Department of Neurology, Kaohsiung Municipal Siaogang Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
Sun-Wung Hsieh
Department of Neurology, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
Sun-Wung Hsieh
Department of Neurology, Faculty of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
Sun-Wung Hsieh
Research Center for Precision Environmental Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
Chih-Hsing Hung
Department of Pediatrics, Kaohsiung Municipal Siaogang Hospital, Kaohsiung, Taiwan
Chih-Hsing Hung
Department of Urology, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
Wei-Ming Li
Department of Urology, School of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
Wei-Ming Li
Department of Urology, Kaohsiung Medical University Gangshan Hospital, Kaohsiung, Taiwan
Wei-Ming Li
Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
Chih-Jen Yang
School of Post-Baccalaureate Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
Chih-Jen Yang
Division of Endocrinology and Metabolism, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung, Taiwan
Yi-Jing Sheen & Wayne Huey-Herng Sheu
Department of Medicine, School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
Yi-Jing Sheen
National Center for Geriatrics and Welfare Research, National Health Research Institutes, Miaoli, Taiwan
Shi-Heng Wang
Michigan State University, East Lansing, MI, USA
Timothy Raben, Erik Widen & Stephen Hsu
Genomic Prediction, Inc., Hackettstown, NJ, USA
Erik Widen & Stephen Hsu
Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
Feng-Jen Hsieh
Division of Urology, Department of Surgery, Chang Gung Memorial Hospital, Chiayi, Taiwan
Dong-Ru Ho
Graduate Institute of Clinical Medical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan
Dong-Ru Ho
School of Medicine, National Tsing Hua University, Hsinchu, Taiwan
Dong-Ru Ho
Department of Dermatology, Chang Gung Memorial Hospital, Linkou, Taiwan
Yu-Huei Huang
School of Medicine, College of Medicine, Chang-Gung University, Taoyuan, Taiwan
Yu-Huei Huang
Division of Rheumatology, Allergy and Immunology, Department of Internal Medicine, Chang Gung Memorial Hospital at Linkou, Taoyuan, Taiwan
Chung-Han Yang, Yen-Fu Chen & Ping-Han Tsai
Department of Psychiatry and Sleep Center, Chang Gung Memorial Hospital, Taoyuan, Taiwan
Yu-Shu Huang
College of Medicine, Chang Gung University, Taoyuan, Taiwan
Yu-Shu Huang
Department of Obstetrics and Gynecology, Chang Gung Memorial Hospital, Linkou Medical Center and Chang Gung University College of Medicine, Taoyuan, Taiwan
Hsien-Ming Wu & Kuan-Gen Huang
Division of Rheumatology, Allergy and Immunology, Department of Internal Medicine, New Taipei City Municipal TuCheng Hospital, New Taipei City, Taiwan
Ping-Han Tsai
Department of Otolaryngology Head & Neck Surgery, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan
Chih-Yen Chien
Doctoral Program of Clinical and Experimental Medicine, National Sun Yat-sen University, Kaohsiung, Taiwan
Chih-Yen Chien
Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
Yi-Lwun Ho, Ming-Shiang Wu, Jia-Horng Kao, Yen-Bin Liu, Mao-Hsin Lin & Yen-Hung Lin
Department of Internal Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
Yi-Lwun Ho, Ming-Shiang Wu, Yen-Bin Liu, Jyh-Ming Jimmy Juang, Mao-Hsin Lin, Yen-Hung Lin & Jyh-Ming Liou
Hepatitis Research Center, National Taiwan University Hospital, Taipei, Taiwan
Jia-Horng Kao
Graduate Institute of Clinical Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
Jia-Horng Kao
Cardiovascular Center and Heart Failure Center, Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
Jyh-Ming Jimmy Juang
Cardiovascular Center, National Taiwan University Hospital, Taipei, Taiwan
Yen-Hung Lin
Department of Internal Medicine, National Taiwan University Hospital Yunlin branch, Yunlin, Taiwan
Ji-Yuh Lee
Division of Hematology and Oncology, Department of Internal Medicine, Chung Shan Medical University Hospital, Taichung, Taiwan
Hsueh-Ju Lu
School of Medicine, Chung Shan Medical University, Taichung, Taiwan
Hsueh-Ju Lu
Endocrinology and Metabolism, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
Chieh-Hua Lu, Jhih-Syuan Liu & Nain-Feng Chu
General Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
An-Chieh Feng
Department of Dermatology, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
Chien-Ping Chiang
Department and Graduate Institute of Biochemistry, National Defense Medical Center, Taipei, Taiwan
Chien-Ping Chiang
Division of Gastroenterology, Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
Jung-Chun Lin
Psychiatry, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
Yi-Wei Yeh
Urology, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
En Meng
Cardiovascular and Mitochondria Related Disease Research Center, Hualien Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, Hualien, Taiwan
Chih-Yang Huang
Center of General Education, Buddhist Tzu Chi Medical Foundation, Tzu Chi University of Science and Technology, Hualien, Taiwan
Chih-Yang Huang
Center of Stem Cell and Precision Medicine, Hualien Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, Hualien, Taiwan
Chi-Cheng Li
School of Medicine, Tzu Chi University, Hualien, Taiwan
Chi-Cheng Li & Kuei-Ying Su
Department of Hematology and Oncology, Hualien Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, Hualien, Taiwan
Chi-Cheng Li & Tso-Fu Wang
Department of Medicine, College of Medicine, Tzu Chi University, Hualien, Taiwan
Tso-Fu Wang
Buddhist Tzu Chi Stem Cells Center, Hualien Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, Hualien, Taiwan
Tso-Fu Wang
Division of Allergy, Immunology and Rheumatology, Hualien Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, Hualien, Taiwan
Kuei-Ying Su
Department of Ophthalmology, Far Eastern Memorial Hospital, New Taipei City, Taiwan
Jia-Kang Wang
Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan
Jia-Kang Wang & Mei-Hsiu Chen
Department of Medicine, National Taiwan University, Taipei, Taiwan
Jia-Kang Wang
Department of Internal Medicine, Far Eastern Memorial Hospital, New Taipei City, Taiwan
Mei-Hsiu Chen
Department of Biomedical Engineering, Ming Chuan University, Taoyuan, Taiwan
Mei-Hsiu Chen
Division of Endocrinology, Department of Internal Medicine, Far-Eastern Memorial Hospital, Taipei, Taiwan
Hua-Fen Chen
School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan
Hua-Fen Chen & Fu-Tien Chiang
Department of Public Health, College of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan
Hua-Fen Chen
Department of Genomic Medicine and Center for Medical Genetics, Changhua Christian Hospital, Changhua, Taiwan
Gwo-Chin Ma & Ting-Yu Chang
Department of Cardiology, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City, Taiwan
Fu-Tien Chiang
Precision Medicine Center, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City, Taiwan
Hsing-Jung Chang
Graduate Institute of Business Administration, College of Management, Fu Jen Catholic University, New Taipei City, Taiwan
Hsing-Jung Chang
Koo Foundation Sun Yat-Sen Cancer Center, Taipei, Taiwan
Kuo-Jang Kao & Chen-Fang Hung
Department of Ophthalmology, Taipei City Hospital, Taipei, Taiwan
Ching-Yao Tsai
Institute of Public Health, National Yang Ming Chiao Tung University, Taipei, Taiwan
Ching-Yao Tsai
Department of Health and Welfare, University of Taipei, Taipei, Taiwan
Ching-Yao Tsai
Division of Gastroenterology and Hepatology, Department of Internal Medicine, Ditmanson Medical Foundation Chia-Yi Christian Hospital, Chiayi, Taiwan
Po-Yueh Chen
Clinical Trial Center, Department of Medical Research, Ditmanson Medical Foundation Chia-Yi Christian Hospital, Chiayi City, Taiwan
Po-Yueh Chen
Fu-Jen Catholic University School of Medicine, New Taipei City, Taiwan
Kochung Tsui
Department of Internal Medicine, Cathay General Hospital, Taipei, Taiwan
Kochung Tsui
Department of Clinical Pathology, Cathay General Hospital, Taipei, Taiwan
Kochung Tsui
Cardiovascular Research Institute, University of California, San Francisco, CA, USA
Pui-Yan Kwok
Institute for Human Genetics, University of California, San Francisco, CA, USA
Pui-Yan Kwok
Department of Dermatology, University of California, San Francisco, CA, USA
Pui-Yan Kwok
Institute of Molecular and Genomic Medicine, National Health Research Institutes, Miaoli, Taiwan
Wayne Huey-Herng Sheu
Division of Endocrinology and Metabolism, Department of Internal Medicine, Taipei Veterans General Hospital, Taipei, Taiwan
Wayne Huey-Herng Sheu
Institute of Medicine, Chung Shan Medical University, Taichung, Taiwan
Shun-Fa Yang
Department of Medical Research, Chung Shan Medical University Hospital, Taichung, Taiwan
Shun-Fa Yang
Division of Gastroenterology and Hepatology, Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
Jyh-Ming Liou
Department of Internal Medicine, National Taiwan University Cancer Center, Taipei, Taiwan
Jyh-Ming Liou
Division of Colorectal Surgery, Department of Surgery, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
Jaw-Yuan Wang
Department of Radiology, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
Jeng-Fong Chiou
Department of Radiation Oncology, Taipei Medical University Hospital, Taipei, Taiwan
Jeng-Fong Chiou

Authors

Hung-Hsin Chen
Chien-Hsiun Chen
Ming-Chih Hou
Yun-Ching Fu
Ling-Hui Li
Che-Yu Chou
Erh-Chan Yeh
Ming-Fang Tsai
Chun-houh Chen
Hsin-Chou Yang
Yen-Tsung Huang
Yi-Min Liu
Chun-yu Wei
Jen-Ping Su
Wan-Jia Lin
Elin H. F. Wang
Chi-Lu Chiang
Jeng-Kai Jiang
I-Hui Lee
Kung-Hao Liang
Wei-Sheng Chen
Hung-Cheng Tsai
Shih-Yao Lin
Fu-Pang Chang
Hsiang-Ling Ho
Yi-Chen Yeh
Wei-Cheng Tseng
Ming-Hwai Lin
Hsiao-Ting Chang
Ling-Ming Tseng
Wen-Yih Liang
Paul Chih-Hsueh Chen
Yu-Cheng Hsieh
Yi-Ming Chen
Tzu-Hung Hsiao
Ching-Heng Lin
Yen-Ju Chen
I-Chieh Chen
Chien-Lin Mao
Shu-Jung Chang
Yen-Lin Chang
Yi-Ju Liao
Chih-Hung Lai
Wei-Ju Lee
Hsin Tung
Ting-Ting Yen
Hsin-Chien Yen
Ming-Yao Chen
Ying-Chin Lin
Yung-Ta Kao
Bi-Zhen Kao
Jing-Er Lee
Chi-Li Chung
Ju-Chi Liu
Paul Chan
Chang-Hsien Lin
Chia-Hsin Chen
I-Chen Wu
Lung-Chang Lin
Jiunn-Wei Wang
Shen-liang Shih
Sun-Wung Hsieh
Chih-Hsing Hung
Wei-Ming Li
Chih-Jen Yang
Cheng-Shin Yang
Ru-Hui Weng
Yu-Chi Chen
Chun-Ping Chang
Tai-Hsun Wu
Yu-Chang Lin
Yi-Jing Sheen
Shi-Heng Wang
Sye-Pu Chen
Timothy Raben
Erik Widen
Stephen Hsu
Feng-Jen Hsieh
Dong-Ru Ho
Yu-Huei Huang
Chung-Han Yang
Yu-Shu Huang
Yen-Fu Chen
Hsien-Ming Wu
Ping-Han Tsai
Kuan-Gen Huang
Chih-Yen Chien
Yi-Lwun Ho
Ming-Shiang Wu
Jia-Horng Kao
Yen-Bin Liu
Jyh-Ming Jimmy Juang
Mao-Hsin Lin
Yen-Hung Lin
Ji-Yuh Lee
Hsueh-Ju Lu
Chieh-Hua Lu
An-Chieh Feng
Jhih-Syuan Liu
Chien-Ping Chiang
Nain-Feng Chu
Jung-Chun Lin
Yi-Wei Yeh
En Meng
Chih-Yang Huang
Chi-Cheng Li
Tso-Fu Wang
Kuei-Ying Su
Jia-Kang Wang
Mei-Hsiu Chen
Hua-Fen Chen
Gwo-Chin Ma
Ting-Yu Chang
Fu-Tien Chiang
Hsing-Jung Chang
Kuo-Jang Kao
Chen-Fang Hung
Ching-Yao Tsai
Po-Yueh Chen
Kochung Tsui
Yuan-Tsong Chen
Pui-Yan Kwok
Wayne Huey-Herng Sheu
Shun-Fa Yang
Jyh-Ming Liou
Jaw-Yuan Wang
Jeng-Fong Chiou
Jer-Yuarn Wu
Cathy S. J. Fann

Contributions

Supervision: P.-Y.K., W.H.-H.S., S.-F.Y., J.-M.L., J.-Y. Wang, J.-F.C., J.-Y. Wu, C.S.-J.F., M.-C.H., Y.-C.F. and Chun-Houh Chen. Formal analysis: H.-H.C., C.-Y. Chou, J.-P.S., W.-J.L., F.-J.H., E.H.F.W., E.-C.Y., S.-H.W. and S.-P.C. Resources: P.-Y.K., C.-L. Chiang, J.-K.J., I.-H.L., K.-H.L., W.-S.C., H.-C.T., S.-Y.L., F.-P.C., H.-L.H., Y.-C.Y., W.-C.T., Ming-Hwai Lin, H.-T.C., L.-M.T., W.-Y.L., P.C.-H.C., Y.-C.H., Y.-M.C., T.-H.H., Ching-Heng Lin, Y.-J.C., I.-C.C., C.-L.M., S.-J.C., Y.-L.C., Y.-J.L., C.-H. Lai, W.-J.L., H.T., T.-T.Y., H.-C. Yen, M.-Y.C., Ying-Chin Lin, Y.-T.K., B.-Z.K., J.-E.L., C.-L. Chung, J.-C.L., P.C., Chang-Hsien Lin, Chia-Hsin Chen, I.-C.W., L.-C.L., J.-W.W., S.-l.S., S.-W.H., C.-H.H., W.-M.L., C.-J.Y., Y.-T.C., D.-R.H., Y.-H.H., C.-H.Y., Y.-S.H., Y.-F.C., H.-M.W., P.-H.T., K.-G.H., C.-Y. Chien, Y.-L.H., M.-S.W., J.-H.K., Y.-B.L., J.-M.J.J., Mao-Hsin Lin, Y.-H.L., J.-Y.L., H.-J.L., A.-C.F., J.-S.L., C.-P. Chiang, N.-F.C., Y.-W.Y., E.M., C.-Y.H., C.-C.L., T.-F.W., K.-Y.S., J.-K.W., M.-H.C., H.-F.C., G.-C.M., T.-Y.C., F.-T.C., H.-J.C., K.-J.K., C.-F.H., C.-Y.T., P.-Y.C., K.T. and Y.-J.S. Validation (TWB): H.-H.C., C.-Y. Chou, J.-P.S., W.-J.L., F.-J.H., E.H.F.W., E.-C.Y. and J.-Y. Wu. Validation (UKB and All of Us): T.R., E.W. and S.H. Data curation: M.-F.T., T.-H.W. and Yu-Chang Lin. Writing—original draft: H.-H.C., C.S.-J.F., L.-H.L., C.-Y. Chou, J.-P.S., W.-J.L., E.H.F.W. and E.-C.Y. Writing—review and editing: P.-Y.K., E.-C.Y., M.-F.T., Chun-Houh Chen, H.-C. Yang, Y.-T.H., Chien-Hsiun Chen, C.-y.W., H.-H.C., C.S.-J.F. and L.-H.L. Investigation: L.-H.L., R.-H.W., Y.-C.C. and C.-P. Chang. Project administration: Y.-M.L. and C.-S.Y. All authors reviewed and approved the final manuscript.

Corresponding authors

Correspondence to Hung-Hsin Chen or Cathy S. J. Fann.

Ethics declarations

Competing interests

S.H. is a founder and shareholder of Genomic Prediction, Inc. (GP) and serves on its Board of Directors; E.W. is an employee and shareholder of GP. The other authors declare no competing interests.

Peer review

Peer review information

Nature thanks Bogdan Pasaniuc and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Scatter plots of the case proportion for dichotomized phenotypes by disease categories.

Significant positive correlations between the case proportion in TPMI (x-axis) and the 5-year prevalence in NHIRD (y-axis) are observed across all disease categories except congenital anomalies.

Extended Data Fig. 2 Comparison of TPMI GWAS-identified loci to the previously published GWAS.

This bar chart shows the replication rates of TPMI GWAS-identified loci relative to previously reported loci in the GWAS Catalog. Red bars compare TPMI findings with those from all ancestries, and blue bars compare them with those from East Asian ancestries. Disease categories are shown under the bars.

Extended Data Fig. 3 The Manhattan plot of GWAS for viral hepatitis B in TPMI.

The name of the nearest mapped gene is labeled for each independent GWAS-significant locus.

Extended Data Fig. 4 Genetic correlation heatmap for all heritable traits.

Heatmap showing genetic correlations among heritable traits. Genetic correlations were estimated using LDSC, with colors representing the correlation coefficients between traits. Traits were clustered using the weighted pair group method with arithmetic mean (WPGMA), with the correlation coefficient as the distance between traits.

Extended Data Fig. 5 The scatter plot of performance for PRS developed by different tools.

The color represents the phecode category, and the shape indicates the PRS development tool: Lassosum2 (circle), LDpred2 (triangle), MegaPRS (square), PRS-CS (cross), and SBayesR (square cross). Each phecode is positioned at a unique x-coordinate, with the tool that has the highest AUC highlighted.

Extended Data Fig. 6 The bar chart and dot plot for PRS performance.

Bar and dot plot showing PRS explained variance (r²) and SNP-heritability for dichotomous traits. Gray bars indicate SNP-heritability (estimated with LDSC from the TPMI GWAS unrelated set, n = 248,754), and colored bars show the r² values, the proportion of variance explained by the PRS in the TPMI validation set (n = 20,000). Dots and error bars show the area under the receiver operating characteristic curve (AUC) with 95% confidence intervals. An asterisk (*) indicates estimates considering the MHC region.

Extended Data Fig. 7 External validation of PRS models in Taiwan Biobank and other cohorts.

PRS performance is presented as the area under the receiver operating characteristic curve (AUC) with 95% confidence interval in TPMI (orange, TPMI validation set, n = 20,000), Taiwan Biobank (green, n = 88,628), East Asians in UK Biobank (blue, n = 1,572), and East Asians in All of Us (purple, n = 6,895). Circles represent TPMI-derived PRS models, and triangles indicate UKB (European)-derived PRS models. Only estimates with more than 30 cases are shown in the figure.

Extended Data Fig. 8 External validation of TPMI-derived PRS model and TPMI-UKB cross-population PRS models across populations.

The plots show the area under the receiver operating characteristic curve (AUC) with 95% confidence interval for PRS validation in East Asian (red, n = 88,628 in TWB; 1,572 in UKB; 6,895 in All of Us), European (olive green, n = 472,869 in UKB and 152,754 in All of Us), African (green, n = 8,074 in UKB and 60,964 in All of Us), South Asian (blue, n = 9,893 in UKB and 2,334 in All of Us), and Admixed American (purple, n = 32,394 in All of Us) populations from TPMI (circle), UKB (triangle), and All of Us (square) cohorts.

Extended Data Table 1 Proportion of overall health indices explained by genetic risk

Supplementary information

Supplementary Information

Detailed methods of genotyping and phenotyping quality control and discussion of SNP-based heritability for quantitative traits, including Supplementary Figs. 1–20 and Tables 17–20.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–16.

Peer Review File

Rights and permissions

Open Access: This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Chen, HH., Chen, CH., Hou, MC. et al. Population-specific polygenic risk scores for people of Han Chinese ancestry. Nature 648, 128–137 (2025). https://doi.org/10.1038/s41586-025-09350-y

Received: 14 October 2024
Accepted: 02 July 2025
Published: 15 October 2025
Version of record: 15 October 2025
Issue date: 04 December 2025
DOI: https://doi.org/10.1038/s41586-025-09350-y
