Population-scale genomic medicine with the Hong Kong Genome Project

Ying, Dingge; Cheung, Ching-Lung; O, Chun-Kwan; Lam, Wai Kei Jacky; Au Yeung, Shiu Lun; Lau, Chak Sing; Luk, Ho Ming; Leung, Christopher Kai Shun; Tse, Desiree Man Sik; Liu, James Si Chai; Hue, Shirley Pik Ying; Kwok, Jamie Sui Lam; Yeung, Denis Long Him; Preusch, Christopher Brandon; Ma, Wei; Tang, Wenshu; Tong, Amy Hin Yan; Au, Lisa Wing Chi; Chan, Juliana Chung-Ngor; Chan, Yap-Hang; Cheng, Shirley Sze Wing; Chong, Shuk Ching; Fung, Cheuk Wing; Ho, Stephanie; Krishnamoorthy, Suhas; Leung, Gabriel Matthew; Li, Philip Hei; Li, Qing; Loong, Herbert Ho-Fung; Lui, Rashid Nok Shun; Luo, Shan; Ma, Becky Mingyao; Ma, Ronald Ching Wan; Na, Rong; Tan, Kathryn Choon Beng; Wong, Sheila Suet-Na; Lo, Su-Vui; Chu, Annie Tsz Wai; Chung, Brian Hon Yin

doi:10.1038/s41591-026-04410-w


Article
Open access
Published: 15 May 2026


Nature Medicine (2026)



Abstract

The Hong Kong Genome Project (HKGP) aims to build a foundational resource for precision medicine in the Chinese population through large-scale genome sequencing and integrated analyses. Here we report findings from over 20,000 HKGP participants across two cohorts: a rare disease cohort including 2,227 patients with suspected genetic diseases and a population cohort including 18,261 participants undergoing genomic screening for medically actionable findings. The rare disease cohort achieved a diagnostic rate of 25%. Using gene panels designed for European ancestries as a benchmark, we found that 3.7% of the individuals in the population cohort carried pathogenic or likely pathogenic variants associated with dominant disorders. While 48% of individuals carried such variants in recessive disorder genes on the gene list based on European ancestries, our analysis revealed 38 additional clinically important genes that would have been overlooked in the Chinese population. Pharmacogenomic analysis demonstrated that nearly all participants harbored at least one actionable phenotype, potentially informing nearly one million annual prescriptions in Hong Kong. The ongoing HKGP establishes a curated Hong Kong Chinese reference for clinically relevant genetic variation and serves as a blueprint for implementing precision medicine in underrepresented populations.


Main

With rapid genomic advances, large-scale genomic projects and global initiatives have pursued genetic etiologies through two complementary approaches: diagnosis-focused efforts, particularly for rare diseases, which affect 300 million individuals worldwide, and comprehensive precision medicine platforms¹.

For diagnostic purposes, the 100,000 Genomes Project in the UK achieved a diagnostic yield of 25% for rare diseases before genome sequencing (GS) was implemented in the National Health Service (NHS)², and it later expanded to cancer and pharmacogenomics studies^3,4. Australian Genomics focused primarily on evidence generation, which subsequently informed policy and practice to improve equitable access to diagnostic testing⁵. In the United States, the All of Us Research Program exemplifies a comprehensive precision medicine approach by building a one-million-participant diverse genomic database to investigate genetic risk and enable applications, including pharmacogenomics⁶.

Regional genome projects, such as Singapore’s National Precision Medicine Program (NPM), Japan’s Initiative on Rare and Undiagnosed Diseases (IRUD) and Korea’s Genetic Diagnosis Program for Rare Disease (KGDP), have used GS technologies to identify enriched genetic diseases and population-specific founder variants, significantly improving rare disease diagnosis for Asian populations while enriching global genomic resources^7,8,9. These efforts highlight the importance of delineating ethnicity-specific allele frequencies for genetic variants.

Despite these efforts, Chinese populations remain underrepresented in genomic research^7,10,11,12. Prominent international genomic databases, such as the Genome Aggregation Database (gnomAD), are predominantly based on European populations^13,14. As a result, screening can be biased toward pathogenic variants enriched in well-represented populations, while variants more prevalent in underrepresented populations risk being overlooked or misclassified, leading to diagnostic delays, unnecessary testing and worsened health disparities. This bias also limits the applicability of international genetic guidelines, including those of the American College of Medical Genetics and Genomics (ACMG) and the Clinical Pharmacogenetics Implementation Consortium (CPIC), to Chinese and other non-European populations^15,16,17.

Led by the Hong Kong Genome Institute (HKGI), the HKGP addresses this gap by establishing a population-specific genomic resource for the Chinese population. It aims to improve the diagnosis and management of rare diseases, tumor syndromes and other diseases by collaborating with key stakeholders to integrate genomics into medical practice, foster research and build genomic capacity, thereby laying a foundation for world-class genomic research and widespread adoption of genomic medicine in Hong Kong.

The pilot phase of the HKGP on rare diseases with short-read GS, involving 520 probands, achieved a 24% diagnostic yield, similar to that of the UK’s 100,000 Genomes Project^18,19. The ongoing main phase aims to include 100,000–120,000 genomes by 2030. Encompassing clinical applications of GS in diagnostics and beyond, this HKGP flagship study presents the project’s initial comprehensive pipeline and core findings. Integrating results from the project’s data, it pursues four core aims: (1) characterizing population-wide diagnostic genomic variation in Hong Kong; (2) enabling early intervention and preventive care for asymptomatic individuals through an analysis of pathogenic or likely pathogenic (P/LP) variants in genes associated with dominant disorders, such as tumor syndromes and cardiovascular diseases; (3) informing reproductive planning through an analysis of carrier frequencies for recessive genetic disorders; and (4) optimizing treatment efficacy, reducing adverse drug reactions and improving therapeutic outcomes through pharmacogenomic profiling.

Results

Study cohort overview

To address the four aims outlined above, we structured our analyses around two complementary cohorts drawn from the 24,112 participants recruited and sequenced by the HKGP between July 2021 and November 2024. The diagnostic cohort (n = 2,227) comprises probands who had completed phenotype-guided diagnostic analysis, supporting personalized genetic diagnosis for individuals with suspected genetic conditions. The HKGP Chinese cohort (n = 18,261) comprises unrelated individuals of Chinese ancestry, selected through stringent relatedness and ethnicity filtering, to enable genotype-driven analyses of clinically actionable findings, including dominant disorder risks, recessive carrier burdens and pharmacogenomic variation.

The diagnostic cohort for phenotype-guided genetic diagnosis

Of the HKGP participants, nearly 50% were probands (the first individuals in their families identified as having a genetic condition) who required personalized genetic diagnosis. Genetic diagnosis had been completed for 2,227 probands (904 singletons and 1,323 probands from various family structures), including the 520 probands enrolled in the pilot phase¹⁸; these constitute the diagnostic cohort summarized in this study. The cohort had a balanced sex distribution and represented a wide range of age groups (<18 years: 37.6%; 18–60 years: 42.6%; >60 years: 19.8%). Most participants were Chinese (95.0%), with the minority being of mixed Chinese or other ethnicities (Extended Data Table 1).

Determinants of diagnostic yields

Comprehensive variant detection and curation yielded positive genetic diagnoses for 553 of the 2,227 probands (24.8%), consistent with the pilot phase¹⁸. Diagnostic yields varied across disease categories (4.65–56.8%; Fig. 1a). Thirteen probands received multiple diagnoses, with P/LP variants in two genes explaining distinct phenotypes in the same individual (Supplementary Table 1). Subgroup analyses with the χ² test revealed that non-singleton probands had a higher yield (26.4%) than singletons (22.6%) (P = 0.041), highlighting the value of sequencing family members to enhance variant interpretation. Probands with more than eight Human Phenotype Ontology (HPO) terms had a significantly higher yield (27.1%) than those with fewer terms (22.6%) (P = 0.014). Diagnostic yields were slightly higher in adult probands and in those with previous genetic testing (Extended Data Table 2).
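As a rough check on the singleton comparison, the 2×2 χ² test can be reproduced from counts back-calculated from the reported percentages: 22.6% of 904 singletons and 26.4% of 1,323 non-singletons round to 204 and 349 diagnosed probands, which sum to the reported 553. These counts are our reconstruction, not figures given in the text. The sketch also assumes a Pearson χ² test without continuity correction and uses only the Python standard library, relying on the fact that for 1 degree of freedom the χ² tail probability equals erfc(√(χ²/2)).

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, P value) for 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts back-calculated from the reported percentages (an assumption):
# singletons 204 of 904 diagnosed; non-singletons 349 of 1,323 diagnosed.
chi2, p = chi2_2x2(204, 904 - 204, 349, 1323 - 349)
print(f"chi2 = {chi2:.2f}, P = {p:.3f}")
```

With these reconstructed counts the statistic is about 4.18 and the P value about 0.041, in line with the reported P = 0.041. Applying Yates' continuity correction would give a somewhat larger P value.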

**Fig. 1: Summary of the findings from the diagnostic cohort (n = 2,227).**

Variants in the diagnostic cohort

Among the 553 probands with positive genetic diagnoses, a total of 572 unique P/LP variants were identified through ACMG guideline-based curation. Of these, 486 (85.0%) were single-nucleotide variants (SNVs) or small insertions and deletions (indels) across 350 genes, whereas 86 (15.0%) were copy number variants (CNVs), structural variants (SVs) or short tandem repeats (STRs), spanning 30 genes. Among the SNVs/indels, the most common variant types were missense (41.6%), nonsense (22.4%) and frameshift (22.4%). The CNVs/SVs/STRs were predominantly deletions (58.1%) and repeat expansions (19.8%).

Of the identified P/LP SNVs/indels, 31.1% were novel, and 68.9% had been previously reported in ClinVar. Sixty-two (12.7%) variants classified in ClinVar as of uncertain significance or with conflicting interpretations of pathogenicity were reclassified as P/LP. Of the CNV/SV/STR variants, 57.6% were novel, and 42.4% had been reported as P/LP in ClinVar, in GeneReviews (www.ncbi.nlm.nih.gov/books/NBK1116/) or in other literature²⁰ (Table 1 and Supplementary Table 2).

Table 1 Summary of variant classifications for different variant types in the HKGP


Potential clinical management

Among the probands with a positive diagnosis in the diagnostic cohort, GS ended a diagnostic odyssey that had lasted 13 years on average. To assess the clinical utility of the diagnoses, we classified the potential changes in clinical management into seven categories. GS provided diagnoses that altered the clinical trajectory of 488 (88.2%) probands, potentially reducing the diagnostic burden on probands and families through changes in clinical management. Notably, only two probands (0.367%) required additional testing, supporting GS as the final diagnostic step (Fig. 1b and Supplementary Table 2).

A Chinese-specific reference cohort for clinically actionable findings

The HKGP Chinese cohort includes 11,362 asymptomatic and symptomatic singletons (partially overlapping with the diagnostic cohort) and 6,899 parents. The demographic and clinical characteristics of this cohort were broadly consistent with those of the diagnostic cohort, with intentional differences in health status and ethnicity by design (Extended Data Table 1). The HKGP Chinese cohort not only supports genotype-driven analyses of clinically actionable variants but also represents the allele frequency landscape of the Hong Kong Chinese population.

Variants in dominant disorder-related genes

Using the HKGP Chinese cohort, we investigated 73 dominant disorder (autosomal and X-linked) ACMG secondary finding genes (version 3.2)¹⁶ to assess GS utility beyond the primary indication for testing. We excluded participants with related phenotypes, resulting in 17,949 participants analyzed. Among these, 670 individuals (3.73%) carried at least one P/LP variant across 54 genes, with 20 participants carrying two or more (Supplementary Tables 3 and 4).

A total of 373 unique P/LP variants were identified in this analysis (Table 1). BRCA2 and TTN were enriched predominantly in nonsense and frameshift P/LP variants, whereas LDLR, SCN5A and MYH7 were enriched in missense P/LP variants (Fig. 2a). Among the 361 identified P/LP SNVs/indels, 25.8% were novel, whereas 74.2% had been previously reported in ClinVar. Seventy-five (20.8%) ClinVar non-P/LP variants were reclassified as P/LP in this study following the ACMG guidelines. Specifically, 25.3% were null variants meeting PVS1 by manual verification²¹; 64.0% were missense variants supported by high prediction scores (PP3_Strong)²²; and 10.7% were upgraded via additional evidence from the literature or databases (Supplementary Table 5 and Extended Data Fig. 1).

**Fig. 2: Summary of the findings in dominant genes in the HKGP Chinese cohort (n = 17,949).**

Variant prevalence in dominant disorder-related genes

To quantify the population burden of these clinically actionable variants, we calculated the gene carrier frequency (GCF) for each ACMG secondary finding gene among the 17,949 participants²³, reflecting the proportion of individuals carrying at least one transmissible P/LP variant in a gene regardless of their own symptomatic status. Consistent with reports from other populations, cancer-related genes (BRCA2, PALB2, BRCA1 and MSH6) had high GCFs. Cardiovascular genes (SCN5A, TTN, LDLR, DSG2 and APOB) had GCFs markedly higher than those in other continental populations^24,25,26. Overall, the cumulative gene carrier frequency (cGCF) for cardiovascular genes (2.48%) was higher than that for cancer-related genes (1.23%) and other genes (0.18%) (Fig. 2b and Supplementary Table 6); it was slightly higher than that in another East Asian population (Korea, 2.17%) and markedly higher than those in European (Icelandic, 1.80%) and African (US-based, 0.66%) populations (Fig. 2c). Notably, the enrichment of P/LP variants in SCN5A (long QT syndrome type 3, Brugada syndrome and dilated cardiomyopathy) and DSG2 (arrhythmogenic right ventricular cardiomyopathy) in our cohort has not been previously reported (Fig. 2d). These differences in the Chinese population can guide gene prioritization for early disease risk detection panels designed for this population.
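The GCF and cGCF definitions above can be sketched as follows. We assume, as a reading of the text rather than a confirmed definition, that cGCF is the plain sum of per-gene GCFs. This reading is consistent with the group cGCFs (2.48% + 1.23% + 0.18% = 3.89%) exceeding the 3.73% of participants who carry at least one P/LP variant, because individuals with variants in several genes are counted once per gene. The gene names, participant IDs and carrier sets below are hypothetical toy data.

```python
def gene_carrier_frequencies(carriers: dict[str, set[str]], n_participants: int) -> dict[str, float]:
    """GCF per gene: fraction of participants with >=1 P/LP variant in that gene."""
    return {gene: len(ids) / n_participants for gene, ids in carriers.items()}


def cumulative_gcf(gcf: dict[str, float], genes: list[str]) -> float:
    """cGCF for a gene group, assumed here to be the sum of per-gene GCFs."""
    return sum(gcf.get(gene, 0.0) for gene in genes)


def any_carrier_fraction(carriers: dict[str, set[str]], genes: list[str], n_participants: int) -> float:
    """Fraction of participants with >=1 P/LP variant in any gene of the group."""
    union: set[str] = set()
    for gene in genes:
        union |= carriers.get(gene, set())
    return len(union) / n_participants


# Hypothetical toy cohort of 10 participants; "p1" carries variants in two genes.
N = 10
carriers = {"GENE_A": {"p0", "p1"}, "GENE_B": {"p1"}, "GENE_C": {"p2"}}
gcf = gene_carrier_frequencies(carriers, N)
group = ["GENE_A", "GENE_B"]
print(cumulative_gcf(gcf, group))              # sums per-gene GCFs
print(any_carrier_fraction(carriers, group, N))  # counts each person once
```

In the toy cohort the group cGCF is 0.3 while only 20% of participants are carriers, illustrating why a cGCF can exceed the carrier proportion.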

Carrier burden in recessive disorder-related genes

We evaluated carrier burden from autosomal and X-linked recessive genes using the ACMG pan-ethnic tier 1−3 carrier screening (CS) gene list (105 genes)^15,27. This tiered framework—tier 1 covering universally recommended conditions; tier 2, carrier frequency ≥1/100 and moderate to severe phenotypes; and tier 3, carrier frequency ≥1/200, including autosomal recessive and X-linked genes—was developed to guide equitable and comprehensive genetic screening across diverse populations, regardless of ancestry. Among 18,065 participants whose children did not have any primary indication phenotypes linked to these genes, 8,693 individuals (48.1%) carried at least one P/LP variant in 105 ACMG CS genes (Extended Data Table 3 and Supplementary Table 7). Across these ACMG CS genes, we identified 1,235 unique P/LP variants spanning 98 genes, of which 1,170 (94.7%) were SNVs or indels, and the remaining 65 variants (5.26%) were CNVs/SVs/STRs (Table 1 and Supplementary Tables 3 and 4).

To translate carrier frequencies into reproductive risk, we estimated the at-risk couple frequency (ACF)²³. Using a random mating approach based on these participants’ P/LP carrier status⁷, we modeled 163 million theoretical pairings. The virtual cumulative at-risk couple frequency (cACF) for ACMG CS genes was 6.60%, dominated by GJB2 (5.01%). Validation in 2,864 actual couples yielded a cACF of 6.77% (also dominated by GJB2, 5.17%), closely aligned with the virtual cACF, confirming the reliability of our modeling (Supplementary Table 7 and Extended Data Fig. 2).
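The random-mating model can be sketched for a single autosomal recessive gene. A pairing is at risk when both partners carry a P/LP variant in the same gene, so with k carriers among N participants the ACF is C(k, 2)/C(N, 2), which approaches GCF² for large N. Two points are assumptions rather than statements from the text. First, we assume pairings were formed without regard to sex: N(N − 1)/2 for N = 18,065 is 163,163,080, which matches the roughly 163 million pairings reported. Second, we assume cACF is the sum of per-gene ACFs. X-linked genes, for which risk depends on the female partner alone, are not handled here.

```python
from itertools import combinations
from math import comb


def n_pairings(n_participants: int) -> int:
    """Unordered pairings among all participants (sex ignored, an assumption)."""
    return comb(n_participants, 2)


def gene_acf(n_carriers: int, n_participants: int) -> float:
    """ACF for one autosomal recessive gene: share of pairings with two carriers."""
    return comb(n_carriers, 2) / comb(n_participants, 2)


def cumulative_acf(carrier_counts: dict[str, int], n_participants: int) -> float:
    """cACF, assumed here to be the sum of per-gene ACFs."""
    return sum(gene_acf(k, n_participants) for k in carrier_counts.values())


def brute_force_gene_acf(carrier_ids: set[str], participants: list[str]) -> float:
    """Enumerate every pairing explicitly; only feasible for small toy cohorts."""
    pairs = list(combinations(participants, 2))
    at_risk = sum(1 for a, b in pairs if a in carrier_ids and b in carrier_ids)
    return at_risk / len(pairs)


print(n_pairings(18_065))  # 163163080 pairings for the 18,065 participants analyzed
# Hypothetical toy cohort: 3 carriers of one gene among 10 participants.
toy = [f"p{i}" for i in range(10)]
print(gene_acf(3, 10), brute_force_gene_acf({"p0", "p1", "p2"}, toy))
```

The closed-form count and the explicit enumeration agree on the toy cohort; the closed form is what makes 163 million pairings tractable.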

To further investigate whether the ACMG CS gene list is optimal for the Chinese population, we classified the genes in the HKGP Chinese cohort according to the ACMG pan-ethnic tiering framework. Only 19 ACMG CS genes exceeded the 1/200 carrier frequency threshold; across the ACMG CS genes, the cGCF was 0.62, corresponding to the 48.1% of participants who were carriers (Fig. 3a). HBA1/HBA2 (thalassemia) dominated tier 1 genes. East Asian populations, including HKGP Chinese, showed a relatively high GCF for GJB2 (Fig. 3b,c), but they exhibited a lower cGCF for ACMG CS tier 2 and tier 3 genes than Europeans when GJB2 was excluded from the analysis (Supplementary Table 8).

**Fig. 3: Summary of the findings for recessive genes in the HKGP Chinese cohort (n = 18,065).**

CS gene re-tiering for Chinese

To increase CS efficiency in the Chinese population, we added 1,354 recessive genes from ‘Mackenzie’s Mission’²⁷ and other CS panels and re-tiered these CS genes using Chinese-specific GCF data with the 1/200 threshold of the ACMG framework (designated as HKGP tiers). This re-tiering added 38 genes to HKGP tiers 2/3 and excluded 82 of the ACMG CS genes (Supplementary Table 7).
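The frequency-based part of the re-tiering can be sketched as a simple threshold rule. This is a simplification under stated assumptions: tier 1 is treated as a fixed list of universally recommended conditions rather than derived from frequency, the phenotype-severity criterion for tier 2 is passed in as a curated flag, and genes that fail the tier 2 criteria but meet the 1/200 threshold fall to tier 3. The gene names and GCF values are hypothetical. The last line checks the gene-count arithmetic reported for the HKGP tiers (105 − 82 + 38 = 61).

```python
from typing import Optional

TIER2_MIN_GCF = 1 / 100  # tier 2: carrier frequency >= 1/100 and moderate-to-severe phenotype
TIER3_MIN_GCF = 1 / 200  # tier 3: carrier frequency >= 1/200


def frequency_tier(gcf: float, moderate_to_severe: bool) -> Optional[int]:
    """Return 2 or 3 for genes meeting the frequency thresholds, else None (not tiered)."""
    if gcf >= TIER2_MIN_GCF and moderate_to_severe:
        return 2
    if gcf >= TIER3_MIN_GCF:
        return 3
    return None


def retier(gcfs: dict[str, float], severe: dict[str, bool]) -> dict[str, int]:
    """Keep only genes that reach tier 2 or 3 using population-specific GCFs."""
    tiers: dict[str, int] = {}
    for gene, gcf in gcfs.items():
        tier = frequency_tier(gcf, severe.get(gene, False))
        if tier is not None:
            tiers[gene] = tier
    return tiers


# Hypothetical genes and Chinese-specific GCFs, for illustration only.
print(retier({"GENE_A": 0.03, "GENE_B": 0.007, "GENE_C": 0.001},
             {"GENE_A": True, "GENE_B": True, "GENE_C": True}))

# Gene-count arithmetic from the text: 105 ACMG genes, 82 excluded, 38 added.
hkgp_tier_gene_count = 105 - 82 + 38  # 61
```

Because the thresholds are applied to population-specific GCFs, the same rule can both drop genes that are rare in the Chinese cohort and admit genes that are common only there.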

Although many of the added non-ACMG CS genes had high GCFs in the HKGP Chinese cohort but low GCFs in non-Finnish Europeans (Fig. 3d), the total number of genes decreased from 105 in the ACMG tiers to 61 in the HKGP tiers. Meanwhile, the cGCF increased from 0.62 to 1.06, raising the number of carriers from 8,693 (48.1%) to 11,820 (65.4%) and the cACF from 6.60% to 15.4% (Fig. 3e,f). The cGCF for East Asians also increased under HKGP tiers, reflecting a shared genetic background between HKGP participants and East Asian populations (Fig. 3e). By contrast, European profiles showed greater similarity to African profiles¹³ (Supplementary Table 7 and Extended Data Fig. 3). These findings reveal that the pan-ethnic ACMG CS gene list is inadequate for the Chinese population, highlighting a risk of underdetection when the unmodified ACMG guidelines are adopted. The HKGP Chinese-specific gene list addresses this gap, and the increased cGCF in East Asians indicates broader regional relevance.

Functional alleles in pharmacogenomic profiling

To analyze pharmacogenomics (how genetic variation influences drug response) in the Chinese population, we examined 25 pharmacogenes with Clinical Annotation Level 1A/1B evidence in the Pharmacogenomics Knowledge Base (PharmGKB)²⁸, the highest-evidence tiers for variant–drug associations. The analysis covered all individuals in the HKGP Chinese cohort after excluding four individuals with phenotype-linked bias. We identified 157 altered-function alleles across 23 pharmacogenes, defined as alleles whose function differs from normal according to the CPIC guidelines¹⁷ (Supplementary Table 9). Gene-level comparisons with the CPIC population with the largest sample size revealed significant differences in altered-function allele frequencies, with five genes showing GCF > 0.05 and two showing GCF ≤ 0.05 (Fig. 4a). Specifically, the frequency of altered-function ABCG2 alleles in our cohort was 31.8%, higher than in other populations, primarily owing to a decreased-function allele (rs2231142-T).

**Fig. 4: Summary of the findings for pharmacogenes in the HKGP Chinese cohort (n = 18,257).**

Each participant carried an average of 8.78 altered-function alleles. Thirty-nine alleles had a frequency exceeding 0.01, with 17 alleles exceeding 0.10 across 11 pharmacogenes (Supplementary Table 9). Six pharmacogenes (ABCG2, CYP2B6, UGT1A1, HLA-B, SLCO1B1 and CYP2D6) had altered-function alleles with gene-level frequencies exceeding 0.10 in our cohort that are not included in the Association for Molecular Pathology (AMP) reportable list²⁹. These frequencies suggest that further investigation is needed to determine whether such alleles should be reported under AMP guidelines (for example, CYP2D6*10 + CYP2D6*36, with an allele frequency of 0.36; Fig. 4d).
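The two summaries above, allele frequency and the mean number of altered-function alleles per participant, can be sketched from diplotype calls. An allele's frequency is its count over 2N haplotypes, and the per-participant mean counts altered-function alleles across all genes. The diplotypes and the set of altered-function alleles below are hypothetical toy inputs, and the sketch assumes every participant has a call for every gene. Star-allele calling and CPIC function assignment are not modeled.

```python
from collections import Counter

Diplotypes = dict[str, tuple[str, str]]  # gene -> (allele 1, allele 2) for one participant


def allele_frequencies(cohort: list[Diplotypes]) -> dict[tuple[str, str], float]:
    """Frequency of each (gene, allele) over 2N haplotypes."""
    counts: Counter = Counter()
    for person in cohort:
        for gene, (allele1, allele2) in person.items():
            counts[(gene, allele1)] += 1
            counts[(gene, allele2)] += 1
    n_haplotypes = 2 * len(cohort)
    return {key: count / n_haplotypes for key, count in counts.items()}


def mean_altered_alleles(cohort: list[Diplotypes], altered: set[tuple[str, str]]) -> float:
    """Average number of altered-function alleles carried per participant."""
    total = sum(
        1
        for person in cohort
        for gene, pair in person.items()
        for allele in pair
        if (gene, allele) in altered
    )
    return total / len(cohort)


# Hypothetical toy cohort of two participants.
toy_cohort = [
    {"CYP2C19": ("*1", "*2"), "ABCG2": ("ref", "rs2231142-T")},
    {"CYP2C19": ("*2", "*3"), "ABCG2": ("ref", "ref")},
]
altered = {("CYP2C19", "*2"), ("CYP2C19", "*3"), ("ABCG2", "rs2231142-T")}
print(allele_frequencies(toy_cohort)[("CYP2C19", "*2")])
print(mean_altered_alleles(toy_cohort, altered))
```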

Actionable pharmacogenomic phenotypes

Examining metabolic phenotypes derived from altered-function alleles is a key step toward identifying actionable insights into drug response. Except for CACNA1S and CFTR, all pharmacogenes had detectable actionable phenotypes, with 14 having actionable phenotype frequencies above 0.10. At least one actionable phenotype was found in 99.98% of individuals (mean: 5.20 actionable phenotypes per individual; Fig. 4b,c and Supplementary Table 10). On average, these comprised 2.79 ‘therapeutic management’, 1.07 ‘impact-on-safety’ and 1.07 ‘impact-on-pharmacokinetic’ actionable phenotypes, as categorized by the US Food and Drug Administration (FDA). This high prevalence was largely driven by variants in VKORC1, which affect warfarin sensitivity and the risk of over-anticoagulation and are known to be highly prevalent in Chinese and other Asian populations⁷. CYP2C19 had high frequencies of two AMP tier 1 no-function alleles (CYP2C19*2: 31.59%; CYP2C19*3: 4.96%), contributing to a high frequency of actionable phenotypes that require therapeutic changes according to FDA guidelines (Supplementary Table 11).
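A back-of-the-envelope calculation under Hardy–Weinberg equilibrium helps show why CYP2C19 contributes so many actionable phenotypes. Treat *2 and *3 as the only no-function alleles (combined frequency 0.3159 + 0.0496 = 0.3655) and ignore decreased-function alleles. Then the expected share of individuals with two no-function alleles (roughly, poor metabolizers) is q², and the share with exactly one (roughly, intermediate metabolizers) is 2q(1 − q). These are illustrative expectations under simplifying assumptions, not observed HKGP phenotype frequencies.

```python
def hwe_no_function_shares(q: float) -> dict[str, float]:
    """Expected genotype shares for a combined no-function allele frequency q under HWE."""
    p = 1.0 - q
    return {
        "two_no_function": q * q,        # approximately CPIC poor metabolizers
        "one_no_function": 2.0 * p * q,  # approximately CPIC intermediate metabolizers
        "at_least_one": 1.0 - p * p,
    }


q_no_function = 0.3159 + 0.0496  # CYP2C19*2 + CYP2C19*3 frequencies reported above
shares = hwe_no_function_shares(q_no_function)
for genotype, share in shares.items():
    print(f"{genotype}: {share:.3f}")
```

Under these assumptions, about 13% of individuals would be expected to carry two no-function alleles and about 60% at least one, which is consistent with CYP2C19 being a major source of actionable phenotypes.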

To assess the potential clinical impact of these findings, we analyzed prescription data for the most prescribed drugs in 2023–2024 from the Hospital Authority, a statutory body that manages all public hospitals in Hong Kong (Supplementary Table 12). Among the top 20 drugs, 12 had pharmacogenomic guidelines (covering seven pharmacogenes), and pharmacogenomic testing could potentially inform nearly one million (903,299/2,936,806, 30.8%) annual prescriptions, mainly for dosage adjustment and alternative therapy (Table 2, Fig. 4e and Supplementary Table 12). Among the top 50 drugs, 13 had FDA-recognized gene–drug interactions, and 16 carried clinically important labels. These findings highlight the opportunity to enhance prescribing practices and improve clinical care, although further research is needed to substantiate their clinical utility.

Table 2 Top prescribed drugs with actionable pharmacogenomic phenotypes in Hong Kong (2024)


To evaluate potentially deleterious novel pharmacogenetic variants, we analyzed putative protein-disrupting variants in nine pharmacogenes with a known loss-of-function (LoF) mechanism. A total of 108 variants were detected in eight genes in 340 (1.86%) individuals. Of these, 88 (81.5%) were absent from gnomAD and 81 (75.0%) were unique to single individuals, suggesting a high degree of individual specificity. Notably, 70 (64.8%) of these variants were in DPYD, SLCO1B1 and G6PD, suggesting that these genes may harbor a particularly high burden of novel pharmacogenetic variants (Supplementary Table 13). The high prevalence of rare risk variants and putative protein-disrupting variants in pharmacogenes underscores the need for GS in pharmacogenetic testing, as genotyping may miss or misidentify them.

Discussion

This study provides a large-scale, integrated genomic analysis specific to the Hong Kong Chinese population^20,21,22. Our findings offer guidance for local clinical practice and genetic testing protocols. By establishing a population variant baseline and evaluating the utility of clinically actionable genes, we fill a major gap in Asian genomic diversity, enabling tailored implementation of diagnostics, screening and pharmacogenomics. Moreover, the comprehensive methodologies and collaborative framework established here can serve as a blueprint for other projects developing population-specific genomic resources worldwide.

Previous Chinese precision medicine initiatives, including the Taiwan Precision Medicine Initiative, the China Kadoorie Biobank and pharmacogenomic studies in China, have advanced understanding of common genetic variation and pharmacogenomics, primarily focusing on chronic diseases and drug response using SNP arrays or low-depth GS^30,31,32. The HKGP complements these efforts by employing high-depth GS, enabling the study of rare diseases and the identification of novel, complex and structural variants. Together, these initiatives play a vital role in building a comprehensive foundation for precision medicine, with HKGP addressing an important gap by focusing on rare diseases. This flagship study marks a key HKGP milestone, having integrated multidomain genomic analyses through comprehensive GS of more than 20,000 participants.

From our short-read GS biobank and linked phenotypic data, we report a 25% diagnostic yield. Consistent with our pilot study and major genome projects², this study demonstrated the scalability of a clinical GS pipeline¹⁸. Unlike other Asian genome initiatives^8,9 that use targeted approaches, our comprehensive GS with standardized, internationally aligned protocols improves the detection of technically challenging variants, which constituted 15% of P/LP variants in our cohort and which targeted approaches may overlook³³. To share these findings, including several recurrent founder mutations (Supplementary Table 14), we are establishing gene/variant directories and a partnership with the Hospital Authority to integrate HKGP’s GS into clinical genomic testing workflows, mirroring the impactful Genomics England–NHS model. Three years after the pilot, HKGP stands at a critical juncture for the disclosure of genomic findings. Although our current protocol returns only primary findings³⁴, this study is an initial step toward broader return options (Supplementary Figs. 1–3).

Beyond its diagnostic applications, the HKGP has established a foundational precision medicine resource for the Hong Kong Chinese population. Our local genomic database supports strategic screening programs for dominant genetic disorders and is already being operationalized within the public healthcare system¹⁶. Although overall cancer-related mutation burdens were consistent with those of other populations, this aggregate masks critical subtype disparities. Lynch syndrome (MLH1, MSH2, MSH6 and PMS2) showed a substantial local burden, approaching half that of BRCA1/BRCA2-associated cancers, and is relatively more prevalent in our local population than in Europe (Supplementary Table 6). These findings highlight the underdiagnosis of Lynch syndrome and the need to optimize population-based genetic testing, especially for individuals with a family history³⁵. Although Hong Kong has established clear BRCA1/BRCA2 genetic testing criteria (https://www.chp.gov.hk/files/pdf/breast_cancer_professional_hp.pdf), Lynch syndrome screening remains underdeveloped, warranting strategic review and implementation to improve cancer prevention.

Unlike in other populations, nearly half of the individuals at risk of dominant disorders in our cohort carried variants in genes associated with cardiovascular conditions, namely cardiomyopathy (TTN and DSG2), arrhythmia (SCN5A) and hyperlipidemia (LDLR and APOB). This, together with the absence of P/LP variants in over one-quarter of the ACMG secondary finding genes, calls for prioritized cardiovascular screening and resource allocation. Given that heart diseases were the third leading cause of death in Hong Kong (https://www.chp.gov.hk/en/healthtopics/content/25/57.html), our findings urge policy shifts, including adult cardiology genetic testing for sudden death risks (SCN5A and DSG2), expansion of pediatric cardiology testing beyond congenital disorders and population screening for hyperlipidemia genes (LDLR and APOB) (Extended Data Fig. 4). For familial hypercholesterolemia, current local genetic testing includes PCSK9, LDLR and APOB³⁶, whereas our data revealed minimal PCSK9 variants in our Chinese population. To optimize resource utilization, we recommend refocusing genetic testing on the high-yield genes (LDLR and APOB) to improve the efficiency of familial hypercholesterolemia management. Our findings support policy development to refine cardiac genetic services and implement screening pilots for high-burden conditions, advancing HKGP’s objectives of enhancing personalized disease risk prediction with Chinese-specific genomic resources.

Nearly half of HKGP participants carried P/LP variants in genes on the ACMG pan-ethnic CS gene list, translating to an estimated one in 16 couples at risk of having offspring affected by recessive disorders. Consistent with previous studies^37,38, the risk coverage for ACMG CS genes was significantly lower in the HKGP Chinese cohort than in the European cohort (Supplementary Table 12). Clinically significant recessive conditions with high ACFs, such as CD36-associated bleeding disorders and FLG-linked ichthyosis vulgaris, were not adequately covered. To address these gaps, we re-tiered CS genes using the ACMG framework weighted by HKGP Chinese carrier frequencies. This more than doubled the risk coverage, from 6.6% to 15.4%, translating to 3,229 at-risk pregnancies annually in Hong Kong, while screening 42% fewer genes. Specifically, 14 of the 38 additionally included genes are associated with metabolic disorders. Although the newborn screening program in Hong Kong typically prioritizes such conditions, it includes only seven of these genes (Supplementary Table 7). Expanding coverage to all 14 genes could enable early intervention for approximately 130 at-risk pregnancies annually. Beyond metabolic disorders, current screening panels also overlook other conditions. Omitting the related genes can have potentially devastating consequences, including immediate mortality risks (GALC, F7 and C6), irreversible disability (CAPN3 and TH), chronic debilitation (DNAH11, LAMA3 and SPINK5) and quality-of-life impacts (PRKRA and EDA). In addition, we identified a residual ‘long tail’ of risk from genes not covered in Chinese CS tiers 1–3 and undetectable by conventional panels alone. These re-tiering findings support updating the population-specific CS gene list to improve efficiency with fewer resources, underscoring the role of HKGP in guiding population-optimized reproductive genomic policy.

In characterizing pharmacogenomic alleles, GS demonstrated superior performance, especially for genes such as CYP2D6, where CNVs and SVs significantly impact function. This enhanced resolution revealed frequency disparities in clinically consequential pharmacogenetic alleles. The HKGP dataset provided validation for the AMP recommendations, exemplified by CYP2C19, where AMP tier 1 alleles accounted for 98.41% of functional alterations, whereas tier 2 burden (0.57%) was largely driven by the Chinese-specific CYP2C19*37 allele (0.38%). Although relatively rare, its potential clinical relevance is amplified by the high volume of prescriptions for drugs metabolized by CYP2C19, including sertraline, clopidogrel and lansoprazole.

When incorporated into local guidelines, knowledge of locally prevalent variants, interpreted in the context of population-specific prescribing patterns, can help prevent adverse drug reactions and improve therapeutic outcomes for patients. Our HKGP resource addresses population-specific clinical needs: we found that 30.8% (0.9 million) of common annual prescriptions involve drugs with pharmacogenetically actionable phenotypes (Supplementary Table 12). Only three of the 20 most prescribed drugs carry FDA-recognized gene–drug interactions with therapeutic management recommendations, and 16 of the top 50 drugs carry FDA clinically important labels; this regulatory threshold therefore represents a floor, not a ceiling, for clinically actionable pharmacogenomics.

Although this study provides foundational genomic insights for Hong Kong, its generalizability may be limited by its sample size. To ensure the robustness of future research findings, we will expand our biobank diversity to enhance population representativeness. The present study relies on short-read sequencing, which has recognized limitations in detecting SVs and resolving repetitive genomic regions. Building on our existing workflows and internationally standardized variant interpretation¹⁹, we plan to integrate advanced technologies such as long-read sequencing and multiomics approaches^39,40, alongside continued advances in bioinformatics¹⁸, to overcome these challenges and to improve efficiency and diagnostic power⁴¹. To translate clinical actionability into practice, HKGI will continue collaborating with the Hospital Authority and international research initiatives to develop interoperable clinical decision support tools and pragmatic trials to validate their utility in primary care. For pharmacogenomics, additional work, such as the PREPARE clinical trial⁴², is needed to link population-specific variants to clinically meaningful drug response or adverse effect profiles (for example, effect sizes, penetrance and outcomes in real-world care) before they can be considered for guideline implementation. Moreover, newborn screening studies will be implemented to validate risk estimates for recessive disorders. These efforts establish the project as a key resource for precision medicine.

The HKGP represents a paradigm shift from reactive symptom management to proactive health preservation, delivering precision-guided clinical applications that include optimized screening, prevention, therapeutic strategies and reproductive pathways. This study provides a large-scale genomic analysis tailored to the underrepresented Chinese population of Hong Kong, building the scientific foundation to redesign clinical services around predictive risk profiling. These insights drive life-course-optimized personalized care, cross-generational planning and the evolution of healthcare systems through evidence-based policy, positioning Hong Kong at the vanguard of precision medicine, where genomics underpins clinical decision-making, public health strategy and societal wellbeing. As we expand our efforts, this study serves as both a foundation and a bridge to future advances in genomic medicine in Hong Kong and beyond.

Methods

Ethical approval for the HKGP and this study was granted by the central institutional review board (IRB) (HKGP-2021-001 and HKGP-2022-001) and the IRBs of the Department of Health (L/M257/2021), the Joint Chinese University of Hong Kong/New Territories East Cluster (2021.423 and 2023.120) and the University of Hong Kong/Hospital Authority Hong Kong West Cluster (UW 21–413 and UW 23–289).

Participants

For the HKGP, both asymptomatic individuals and symptomatic probands suspected of having a genetic disease were prospectively identified and recruited across a range of medical specialities at the three partnering centers of the HKGI. All participants received pretest genetic counseling and provided informed written consent following the unique three-tier consent and assent model designed by the HKGI⁴³. As described in our pilot study, detailed phenotype information, including family history and symptom onset, was collected and recorded using HPO terms¹⁸.

HKGP participants whose samples had undergone GS, variant calling and classification before November 2024 were included in this study¹⁸. Probands with suspected genetic disease(s) whose genetic diagnostic evaluation had been completed, together with their family members, were included in the diagnostic cohort. Unrelated Chinese participants, including both healthy and affected singletons as well as parents from duo and trio family structures, were included in the analysis as the HKGP Chinese cohort. Notably, individuals exhibiting phenotypes associated with selected dominant genes (described in the Gene selection section below), and participants whose offspring showed phenotypes related to selected recessive genes, were excluded from the respective analyses. To ensure unrelatedness, PLINK (version 2.0) was used to assess biological sex and relatedness among the remaining participants in the HKGP Chinese cohort; for each pair with a kinship coefficient greater than 0.177 (ref. ⁴¹), one participant was removed (parents were retained for non-singleton participants). Participants whose self-reported sex conflicted with the sex imputed from sequencing data (PLINK 2 --impute-sex) were removed from this study. Chinese ethnicity was determined from self-reported data and validated by ancestry admixture analysis, requiring Chinese to be the predominant ancestry inferred by SNVstory⁴⁴.
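The relatedness filter can be sketched as a greedy pass over pairwise kinship estimates. This is a minimal illustration, not the HKGP code: it assumes kinship coefficients have already been estimated (for example, by PLINK 2) and are supplied as a dictionary keyed by sample-ID pairs. The helper `prune_related` and its tie-breaking rule (drop the lexicographically larger ID when both or neither members of a pair are parents) are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

KINSHIP_CUTOFF = 0.177  # threshold stated in the Methods


def prune_related(
    samples: Iterable[str],
    kinship: Dict[Tuple[str, str], float],
    parents: Set[str],
    cutoff: float = KINSHIP_CUTOFF,
) -> Set[str]:
    """Return a sample set in which no remaining pair exceeds `cutoff`.

    Hypothetical greedy rule: visit over-threshold pairs from most to least
    related; drop the non-parent member, or, if both or neither are parents,
    drop the lexicographically larger ID so the result is deterministic.
    """
    kept = set(samples)
    for (a, b), phi in sorted(kinship.items(), key=lambda kv: -kv[1]):
        if phi <= cutoff or a not in kept or b not in kept:
            continue  # unrelated enough, or pair already resolved
        if (a in parents) != (b in parents):
            drop = b if a in parents else a  # retain the parent
        else:
            drop = max(a, b)
        kept.discard(drop)
    return kept
```

Because every over-threshold pair loses at least one member when it is visited, no related pair survives; for three full siblings, one is kept and two are removed.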

Enrollment criteria

1) Undiagnosed disorders
  a) Undiagnosed disorders are defined as disorders without a specific diagnosis after thorough evaluation through clinical assessment and routine investigation.
  b) HKGP will recruit patients who meet the following criteria:
    i) The patient has a medical condition that meets the aforesaid definition.
    ii) The patient’s consent is obtained for providing and sharing medical information and samples.
    iii) The patient (or parents or legal guardian) agrees to trio testing, that is, blood samples are taken from the patient and both parents. If trio testing is not possible, the decision will be made on the basis of the relevant specialists’ assessment.
2) Cancers with clinical clues linked to possible hereditary components
  a) The definition is as follows:
    i) Having more than one first-degree or second-degree relative with confirmed cancer; or
    ii) Developing cancer at a younger age than expected for that cancer type; or
    iii) Pediatric patients with cancer; or
    iv) Having more than one type of cancer in the same person.
  b) Recruitment criteria for patients with hereditary cancer and genetic predisposition to cancer are:
    i) The patient has pathologically confirmed cancer that meets the above definition; and
    ii) The patient’s consent is obtained for providing and sharing medical information and samples.
3) Other patients who will benefit from GS (under the theme ‘Genomics and Precision Health’ of the main phase of HKGP)
  a) ‘Genomics and Precision Health’ is a cohort that aims to improve the health of individuals with and without specific diseases by harnessing genomics technologies. These technologies can improve individual health through clinical, personal, economic and system utility.
4) Unaffected first-degree family members, aged over 18 years, of participants in the above three cohorts

Exclusion criteria

Patients were excluded if the genetic cause of their condition was already known or if the patient, parents, legal guardian or substitute decision-maker was unwilling to participate in the study.

GS and variant detection

The detailed workflows for short-read GS and data analysis have been described previously¹⁸. In brief, whole blood (or buccal/saliva samples when necessary) was collected, and genomic DNA was extracted for polymerase chain reaction (PCR)-free short-read GS using the KAPA HyperPlus Kit and sequenced on an Illumina NovaSeq 6000 or NovaSeq X Plus to a mean coverage of ≥29.5×. After quality control checks were passed, a standard GATK-based bioinformatics pipeline was used for secondary analysis. Reads were aligned to the GRCh38 reference using BWA (version 0.7.17), duplicates were removed with Picard (version 2.27.4), and variant calling for autosomes, sex chromosomes and the mitochondrial genome was performed using GATK HaplotypeCaller and Mutect2 (GATK version 4.2.6.1), CNVKit (version 0.9.9), Manta (version 1.6.0) and ExpansionHunter (version 3.1.2) to detect SNVs, indels, CNVs, SVs and STRs⁴⁵,⁴⁶,⁴⁷,⁴⁸.

Gene selection

Genes with strong or definitive gene–disease associations, as classified by the Clinical Genome Resource (ClinGen) (‘definitive’ or ‘strong’) or by Genomics England PanelApp or PanelApp Australia (‘green’), were prioritized. Genes with moderate evidence of association (‘moderate’ in ClinGen or ‘amber’ in PanelApp) were selectively included on the basis of consensus with referring clinicians.

For the dominant disorder-related genes used for the HKGP Chinese cohort analysis, we adopted a reference gene list of 73 dominant genes from the ACMG secondary findings gene list version 3.2 (ref. ¹⁶).

For recessive disorder-related genes, we consolidated a comprehensive list of 1,459 genes from multiple well-recognized, partly overlapping sources to ensure broad coverage and clinical relevance: (1) 105 genes from the ACMG-recommended pan-ethnic CS gene list, including HBA1 and HBA2 for Asian individuals¹⁵; (2) 1,283 genes from the ‘Mackenzie’s Mission’ version 2.2 gene list, derived from a large-scale Australian CS initiative²⁷; (3) 101 autosomal recessive genes associated with treatable inherited disorders⁴⁹; and (4) 140 additional genes from other commercially available CS panels and relevant published resources. This integrative approach was intended to maximize the clinical utility of our CS protocol by capturing both established and emerging gene–disease associations. Detailed lists of the dominant and recessive genes are provided in Supplementary Tables 6 and 7.

Variant classification

SNVs and indels

Diagnostic cohort

Following a phenotype-driven diagnostic workflow similar to that used in the HKGP pilot study¹⁸, SNVs and indels (<50 base pairs) with allele frequencies <0.005 in gnomAD versions 2.1.1 and 3.1.2 were prioritized by inheritance-based filtering and by phenotypic matching with HPO terms through Exomiser⁵⁰, supplemented by virtual gene panels from Genomics England PanelApp and PanelApp Australia as described above. Variant pathogenicity was determined by manual curation according to ACMG guidelines and up-to-date recommendations from the ClinGen Sequence Variant Interpretation (SVI) Working Group. Mitochondrial variants were analyzed according to the ClinGen Mitochondrial Disease Nuclear and Mitochondrial Expert Panel Specifications to the ACMG/AMP Variant Interpretation Guidelines. Following the HKGP principles of reporting, variants classified as P/LP were reported only when their biological effects matched the patient’s phenotype. Orthogonal validation was performed for all P/LP variants using independent DNA extracted from the original sample. Variants of uncertain significance (VUSs) in dominant genes were reported when all members of the multidisciplinary team, including clinicians, agreed that both of the following criteria were met: the variant was highly compatible with the clinical phenotype, and an additional secondary assay or analysis (such as RNA sequencing, enzyme activity testing, immunohistochemical staining, imaging studies or segregation analysis) could be performed to confirm the diagnosis. Variants were visualized using the Integrative Genomics Viewer (IGV) version 2.17.4 (ref. ⁵¹).

The HKGP Chinese cohort (recessive and dominant genes)

In addition to diagnostic findings, SNVs and indels in our consolidated gene lists were retained for curation of other clinical findings if their allele frequencies were <0.05 in gnomAD version 3.1.2; variants on the BA1 (‘standalone benign’) criterion exception list were retained regardless of frequency. Through a combination of automated and manual curation (Supplementary Fig. 4), these variants were classified into three categories: reported P/LP, ACMG P/LP and ACMG VUS or benign (ACMG VUS-B).
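The two frequency filters (the diagnostic-cohort threshold described earlier and the cohort threshold above) can be expressed as simple predicates. This sketch assumes that per-variant allele frequencies from gnomAD v2.1.1 and v3.1.2 have already been annotated. The `Variant` record, its field names and the convention that a variant absent from gnomAD has frequency 0 are hypothetical, and the diagnostic filter is read here as requiring the frequency to be below 0.005 in both releases.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Variant:
    variant_id: str                # hypothetical normalized key, e.g. "1-12345-A-G"
    af_gnomad_v2: Optional[float]  # None means absent from gnomAD v2.1.1
    af_gnomad_v3: Optional[float]  # None means absent from gnomAD v3.1.2


def _af(value: Optional[float]) -> float:
    # Assumption: a variant absent from gnomAD is treated as having frequency 0.
    return 0.0 if value is None else value


def keep_for_diagnostic(v: Variant, max_af: float = 0.005) -> bool:
    """Diagnostic cohort: below max_af in both gnomAD releases."""
    return _af(v.af_gnomad_v2) < max_af and _af(v.af_gnomad_v3) < max_af


def keep_for_cohort(v: Variant, ba1_exceptions: Set[str], max_af: float = 0.05) -> bool:
    """HKGP Chinese cohort: below max_af in gnomAD v3.1.2, or on the BA1 exception list."""
    return v.variant_id in ba1_exceptions or _af(v.af_gnomad_v3) < max_af
```

The exception-list check comes first because those variants are common by definition and would otherwise be discarded by the frequency cutoff.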

a.
Reported P/LP

ClinVar P/LP variants with three-star or four-star review status, which reflect classification by expert panels such as ClinGen or by authoritative consortia such as the Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA), were categorized as reported P/LP. In addition, to reduce the number of variants requiring manual review, P/LP variants in recessive genes with one-star or two-star review status were also categorized as reported P/LP.
b.
ACMG P/LP and VUS or benign (VUS-B)

Other identified variants were processed through two analytic pipelines: (1) both ClinVar-reported and novel variants in the dominant gene list were classified using ACMG/ClinGen guidelines and a Bayesian classification framework; (2) ClinVar-unreported null variants in the LoF genes were classified using the PVS1 criterion. All ClinVar data were accessed and extracted on 30 June 2024.
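The routing between the reported P/LP category and the ACMG classification pipelines can be sketched as a single decision rule. It assumes ClinVar review status has already been converted to a star count (0 to 4); the `triage` function, the set of significance strings and the returned labels are hypothetical, and ClinVar parsing itself is not shown.

```python
from typing import Optional

# Assumed ClinVar germline significance strings treated as P/LP.
PLP = {"Pathogenic", "Likely pathogenic", "Pathogenic/Likely pathogenic"}


def triage(clinvar_significance: Optional[str], review_stars: int, inheritance: str) -> str:
    """Return 'reported P/LP' or 'ACMG pipeline' for one variant.

    clinvar_significance: ClinVar classification, or None if unreported.
    review_stars: ClinVar review status expressed as 0-4 stars.
    inheritance: 'dominant' or 'recessive' (gene list membership).
    """
    if clinvar_significance in PLP:
        if review_stars >= 3:
            return "reported P/LP"  # expert panel or practice guideline
        if inheritance == "recessive" and review_stars >= 1:
            return "reported P/LP"  # relaxed rule for recessive genes
    # Everything else is classified with ACMG/ClinGen criteria (or PVS1 for null variants).
    return "ACMG pipeline"
```

A dominant-gene variant with two-star review status therefore still goes through full ACMG classification, whereas the same status in a recessive gene is accepted as reported P/LP.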

For the variants detected in the HKGP Chinese cohort, the classification process was further refined using our previously established semiautomated brief cohort analysis workflow (S-BCAW)⁵². Both automated scoring and manual curation were applied throughout the curation process. For recessive genes, null variants absent from ClinVar were assigned the PVS1 criterion using AutoPVS1 (version 1.1)²¹ and classified in the same way.

SVs and CNVs

Diagnostic cohort

A phenotype-driven diagnostic workflow similar to that used in the HKGP pilot study was followed. The pathogenicity of deletions and duplications was interpreted in accordance with the joint consensus standards of CNV interpretation by the ACMG and ClinGen⁵³. Currently, there is no established expert consensus for the interpretation of other SV types. For these variant types, PVS1 was applied at an appropriate strength on the basis of the predicted impact on gene function⁵⁴.

The HKGP Chinese cohort (recessive and dominant genes)

The analysis of SVs focused specifically on genes identified in the predefined gene list, where the disease mechanism is LoF. Insertions, deletions and duplications within these gene lists were curated according to the ACMG/ClinGen joint consensus guidelines for CNV interpretation⁵³.

Among the recessive disorder-related genes, some loci pose unique technical challenges, as described above: variants at these loci cannot be reliably detected by conventional variant callers. To overcome these limitations, we used specialized approaches: an in-house caller for common α-globin gene deletions (HBA1/HBA2) and Illumina’s SMNCopyNumberCaller for precise copy-number quantification of SMN1 and SMN2 (ref. ⁵⁵).

STRs

STRs were analyzed at loci defined by the Illumina repeat catalog (https://github.com/Illumina/RepeatCatalogs). STR calls were considered pathogenic if the repeat size was greater than the pathogenic reportable threshold summarized in gnomAD on the basis of the literature.

Defining GCF and cGCF

To characterize carrier frequencies at the gene level, we adopted the concept of GCF, defined as the fraction of participants carrying any P/LP variant(s) in the gene.

To facilitate further analysis across groups of genes, we introduced the concept of cGCF, which is defined as the sum of GCFs for all genes within a specific gene list or tier. These metrics provide a robust framework for quantifying carrier frequencies at multiple levels of granularity, enabling population-specific insights and facilitating tier-based gene classification.
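Both metrics follow directly from a mapping of each gene to the participants who carry a P/LP variant in it. The input format and function names below are hypothetical; each participant is counted at most once per gene, consistent with "any P/LP variant(s)".

```python
from typing import Dict, Iterable, Set


def gene_carrier_frequency(carriers: Dict[str, Set[str]], n_participants: int) -> Dict[str, float]:
    """GCF: fraction of participants carrying any P/LP variant in each gene.

    carriers maps gene -> set of participant IDs with >=1 P/LP variant, so a
    participant with two variants in the same gene is counted once.
    """
    return {gene: len(ids) / n_participants for gene, ids in carriers.items()}


def cumulative_gcf(gcf: Dict[str, float], genes: Iterable[str]) -> float:
    """cGCF: sum of GCFs over a gene list or tier (genes without carriers add 0)."""
    return sum(gcf.get(gene, 0.0) for gene in genes)
```

Because cGCF is a sum of per-gene fractions, a participant carrying variants in two genes contributes to both terms; cGCF is therefore not the fraction of participants carrying at least one variant and can exceed it.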

Clinical utility

Clinical utility is defined as the percentage of individuals with potential changes to clinical management after a diagnosis; this measure helps accelerate decision-making and consensus formulation among all relevant stakeholders. Potential changes in clinical management were classified into seven categories according to Riggs et al. and the UK 100,000 Genomes Project¹⁹,⁵⁶: (1) referral to specialist(s); (2) indication for further diagnostic tests to evaluate possible complications; (3) initiation or contraindication of interventional or surgical procedures; (4) surveillance for potential future complications; (5) initiation or contraindication of medications; (6) lifestyle changes; and (7) clinical trial eligibility (meeting the enrollment criteria of phase 2 or higher interventional studies (related to drugs, medical devices, procedures and vaccines, as defined in https://clinicaltrials.gov/) or observational studies (focused on assessing non-interventional biomedical or health outcomes) listed in https://clinicaltrials.gov/ or https://www.clinicaltrialsregister.eu/ that were related to the patient’s target gene and disease at the time of diagnosis).

Diagnostic odyssey

The diagnostic odyssey is defined as the time from when the disease’s symptoms are first noted in the proband (odyssey start date) to the time when a genetic diagnosis is reached. We determined the odyssey start date by retrieving the earliest record in the clinical management system that describes the symptoms of the primary indication(s) when referred to the HKGP. The date of genetic diagnosis was determined on the basis of the date at which the HKGP issued the report to the referring clinician. The diagnostic odyssey was calculated as the date of genetic diagnosis minus the odyssey start date, rounded to the nearest year; for odysseys shorter than 1 year, duration was calculated in months.
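The odyssey calculation can be sketched with Python's `datetime` module. The conversion factors (365.25 days per year and one-twelfth of that per month) are assumptions, because the text does not state how years and months were counted, and `diagnostic_odyssey` is a hypothetical helper.

```python
from datetime import date
from typing import Tuple

DAYS_PER_YEAR = 365.25              # assumption: average year length
DAYS_PER_MONTH = DAYS_PER_YEAR / 12  # assumption: average month length


def diagnostic_odyssey(start: date, report: date) -> Tuple[int, str]:
    """Return (duration, unit) from the odyssey start date to the report date.

    Durations under one year are expressed in months; otherwise in years.
    Python's round() rounds to the nearest integer, with exact halves to even.
    """
    days = (report - start).days
    if days < 0:
        raise ValueError("report date precedes odyssey start date")
    if days < DAYS_PER_YEAR:
        return round(days / DAYS_PER_MONTH), "months"
    return round(days / DAYS_PER_YEAR), "years"
```

For example, a first symptom record on 1 January 2020 and a report issued on 1 July 2020 (182 days) yields six months.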

Founder mutation screening

Novel potential founder mutations were assessed in this study. The following selection criteria were applied for novel founder mutations: (1) repeated occurrence among the participants in this study, (2) absence in the gnomAD non-East Asian genome dataset and (3) absence in ClinVar. For known variants, Chinese-specific founder mutations were directly collected from the literature and compared with our findings. Shared haplotype analysis was conducted for both novel and known potential founder mutation loci among related participants carrying the mutation. This analysis used IBDseq⁵⁷ for common variants (minor allele frequency >0.5% in this study).
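The three selection criteria for novel founder candidates reduce to set-membership checks. The sketch assumes variants are keyed by a normalized ID string and reads "repeated occurrence" as at least two distinct carriers; the input format and function name are hypothetical, and the subsequent IBDseq haplotype analysis is not shown.

```python
from typing import Dict, List, Set


def novel_founder_candidates(
    carriers: Dict[str, List[str]],
    gnomad_non_eas: Set[str],
    clinvar: Set[str],
) -> List[str]:
    """Apply the three selection criteria for novel founder mutations.

    carriers: variant ID -> participant IDs carrying it (hypothetical input).
    gnomad_non_eas / clinvar: variant IDs observed in each resource.
    """
    selected = []
    for variant, ids in carriers.items():
        recurrent = len(set(ids)) >= 2  # (1) seen in more than one participant
        if recurrent and variant not in gnomad_non_eas and variant not in clinvar:
            selected.append(variant)  # (2) absent from non-EAS gnomAD, (3) absent from ClinVar
    return sorted(selected)
```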

Estimation of ACF

To estimate the ACF, all possible mating combinations among the unrelated Chinese participants included in this study were evaluated. Specifically, (1) for autosomal recessive genes, all pairings were considered irrespective of sex (\(\binom{n}{2} = n(n-1)/2\) pairings in total, where \(n\) is the number of unrelated Chinese participants), and (2) for X-linked genes, only female–male pairings were assessed. A virtual couple was classified as ‘at risk’ if both individuals carried P/LP variants in at least one shared autosomal recessive gene or if the female carried a P/LP variant in any X-linked gene. The ACF estimated under random mating was then compared with the observed frequency of actual couples in this cohort carrying P/LP variants in the same gene.
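The virtual-couple estimate can be sketched as follows. It assumes per-participant sets of carried genes and a sex label of "F" or "M" (hypothetical formats), and it reports autosomal and X-linked ACFs separately, each over its own denominator: all n(n-1)/2 unordered pairs for autosomal genes and all female-by-male pairs for X-linked genes. The text does not say whether the two were combined into one figure.

```python
from itertools import combinations
from math import comb
from typing import Dict, Set


def autosomal_acf(ar_genes: Dict[str, Set[str]]) -> float:
    """Fraction of all unordered pairs (any sex) sharing >=1 carried AR gene.

    ar_genes maps participant ID -> set of autosomal recessive genes with a
    P/LP variant in that participant (empty set for non-carriers).
    """
    n = len(ar_genes)
    if n < 2:
        raise ValueError("need at least two participants")
    at_risk = sum(1 for a, b in combinations(ar_genes, 2) if ar_genes[a] & ar_genes[b])
    return at_risk / comb(n, 2)


def x_linked_acf(sex: Dict[str, str], x_genes: Dict[str, Set[str]]) -> float:
    """Fraction of female-male pairs in which the female carries any X-linked P/LP variant."""
    females = [p for p, s in sex.items() if s == "F"]
    males = [p for p, s in sex.items() if s == "M"]
    # Each carrier female forms an at-risk couple with every male.
    at_risk = sum(len(males) for f in females if x_genes.get(f))
    return at_risk / (len(females) * len(males))
```

For X-linked genes the estimate reduces to the carrier fraction among females. The autosomal loop is quadratic in n (about 1.6 × 10⁸ pairs for 18,000 participants), so counting pairs gene by gene could be faster at this scale.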

Re-tiering CS genes based on ACMG guidelines for the Chinese population

Genes were re-tiered according to ACMG CS guidelines, with carrier frequency thresholds applied to gene-specific GCFs derived from HKGP Chinese population data. Tier 1 was unchanged and includes CFTR, SMN1/SMN2, HBA1/HBA2 and HBB. Tier 2 included autosomal genes associated with severe or moderate phenotypes and a carrier frequency of at least 1/100 in our Chinese population, whereas tier 3 included genes on autosomes or sex chromosomes with a carrier frequency of at least 1/200. This tiering approach was designed to reflect population-specific genetic characteristics while remaining consistent with the ACMG’s evidence-based recommendations. cGCFs for each tier were then compared between this Chinese-specific tiering and the ACMG pan-ethnic tiers, both for the Chinese population and for other populations in the gnomAD 4.0 database¹³.
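The tier rules can be written as a small decision function. This is a sketch under stated assumptions: the gene's GCF is used as its carrier frequency, severity is supplied as a boolean, and `chinese_tier` is a hypothetical name. Genes meeting neither threshold return `None`.

```python
from typing import Optional

TIER1_GENES = {"CFTR", "SMN1", "SMN2", "HBA1", "HBA2", "HBB"}  # unchanged tier 1


def chinese_tier(gene: str, gcf: float, autosomal: bool, severe_or_moderate: bool) -> Optional[int]:
    """Assign a carrier-screening tier from a gene's GCF in the HKGP Chinese cohort."""
    if gene in TIER1_GENES:
        return 1
    if autosomal and severe_or_moderate and gcf >= 1 / 100:
        return 2
    if gcf >= 1 / 200:  # autosomal or sex-chromosome genes
        return 3
    return None
```

Tier 2 is checked before tier 3, so a gene that satisfies both rules is assigned the higher-priority tier 2; this ordering is an assumption.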

Pharmacogenomics

Gene selection and individual selection

To profile actionable pharmacogenomic variants, we consolidated a list of 25 pharmacogenes with PharmGKB Clinical Annotation Level 1A or 1B (Supplementary Table 9). Of these 25 pharmacogenes, seven (CACNA1S, CFTR, DPYD, G6PD, MT-RNR1, RYR1 and VKORC1) are associated with congenital diseases, being classified by ClinGen as having definitive, strong or moderate gene−disease validity or rated ‘green’ (diagnostic) or ‘amber’ (borderline) in relevant disease panels in Genomics England PanelApp and PanelApp Australia (an approach similar to the gene selection for the diagnostic cohort). To avoid confounding by these conditions, individuals in the HKGP Chinese cohort were excluded from the analysis if their own primary phenotypes, or those of their offspring, matched the associated congenital diseases. The remaining individuals were included in the pharmacogenomic analysis of known alleles and novel variants.

Known pharmacogenomic variants

Known alleles of the 25 selected pharmacogenes were genotyped using several tools: (1) Cyrius version 1.1.1 (ref. ⁵⁸) for CYP2D6 alleles, (2) HLA-HD version 1.7.0 (ref. ⁵⁹) for HLA-A and HLA-B alleles, (3) Aldy version 4.6 (ref. ⁶⁰) for other pharmacogenes with star-allele nomenclature and (4) genotypes derived directly from VCF files for pharmacogenes whose alleles are defined by dbSNP rsIDs. Allele function and phenotype were assigned using information from CPIC and PharmGKB (accessed 12 November 2024). Variants listed in the AMP’s minimum sets for pharmacogenomic testing are also labeled in the same table.

To investigate discrepancies between the Chinese population and the population with the largest sample size in CPIC, we followed the definitions and methods described by Hernandez et al.¹⁷ to compare the frequencies of altered-function alleles.

To further investigate the significance of the clinical impact of the actionable phenotypes in pharmacogenes, we categorized actionable phenotypes according to the three sections defined by the FDA Tables of Pharmacogenetic Associations (www.fda.gov/medical-devices/precision-medicine/table-pharmacogenetic-associations) (Supplementary Table 11).

Novel variants in LoF pharmacogenes

To investigate novel deleterious variants in pharmacogenes, SNVs, indels, CNVs and SVs were detected using the methodology described earlier. This analysis focused on nine pharmacogenes for which CPIC or PharmGKB define no-function alleles associated with an actionable phenotype (CYP2B6, CYP2C9, CYP2C19, CYP2D6, DPYD, G6PD, NUDT15, SLCO1B1 and TPMT); these genes were selected because LoF is a mechanism underlying their actionable phenotypes. Only putative protein-disrupting variants (frameshift, inframe, splicing and nonsense variants) that reached ‘very strong’ PVS1 strength in AutoPVS1 were included, after manual inspection in IGV to ensure variant quality.
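The inclusion rules for novel LoF variants in pharmacogenes combine four filters. In this sketch the consequence labels, the `very_strong` strength label and the `Candidate` record are hypothetical stand-ins; real AutoPVS1 and annotation outputs use their own vocabularies, which would need mapping onto these values.

```python
from dataclasses import dataclass

LOF_PHARMACOGENES = {
    "CYP2B6", "CYP2C9", "CYP2C19", "CYP2D6", "DPYD",
    "G6PD", "NUDT15", "SLCO1B1", "TPMT",
}
# Hypothetical labels for the four consequence classes named in the text.
PROTEIN_DISRUPTING = {"frameshift", "inframe", "splicing", "nonsense"}
VERY_STRONG = "very_strong"  # hypothetical label for AutoPVS1 'very strong'


@dataclass
class Candidate:
    gene: str
    consequence: str
    pvs1_strength: str  # PVS1 strength from AutoPVS1 (label format assumed)
    igv_passed: bool    # outcome of manual IGV inspection


def is_novel_lof(c: Candidate) -> bool:
    """True if the variant passes every inclusion filter described above."""
    return (
        c.gene in LOF_PHARMACOGENES
        and c.consequence in PROTEIN_DISRUPTING
        and c.pvs1_strength == VERY_STRONG
        and c.igv_passed
    )
```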

Estimated actionable prescriptions in Hong Kong

To examine the pharmaceutical landscape in Hong Kong, prescription records for all medications from hospitals under the Hong Kong Hospital Authority between 1 December 2023 and 30 November 2024 were retrieved from the Clinical Data Analysis and Reporting System (CDARS) database. The top 50 drugs were selected on the basis of total prescription count during this period. We estimated the number of actionable prescriptions by multiplying each drug’s prescription count by the frequency, in HKGP data, of pharmacogenomic actionable phenotypes (as defined in PharmGKB and CPIC) in the corresponding pharmacogene. To assess clinical relevance further, we analyzed these drugs using the FDA’s Table of Pharmacogenomic Biomarkers in Drug Labeling (www.fda.gov/drugs/science-and-research-drugs/table-pharmacogenomic-biomarkers-drug-labeling) and identified clinically consequential pharmacogenomic information in three key labeling sections: adverse reactions, warnings and precautions, and dosage and administration.
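The prescription estimate amounts to an expected count: each drug's prescription volume weighted by the frequency of actionable phenotypes in its pharmacogene, then summed. The sketch assumes one pharmacogene per drug and that drugs without a pharmacogene contribute nothing; the function name is hypothetical, and the drug names and numbers in the test are illustrative only, not HKGP data.

```python
from typing import Dict, Tuple


def actionable_prescriptions(
    prescriptions: Dict[str, int],
    drug_gene: Dict[str, str],
    actionable_freq: Dict[str, float],
) -> Tuple[float, float]:
    """Estimate prescriptions dispensed to people with an actionable phenotype.

    prescriptions: drug -> annual prescription count (e.g. from CDARS)
    drug_gene: drug -> its pharmacogene (assumption: one gene per drug)
    actionable_freq: gene -> fraction of participants with an actionable phenotype
    Returns (expected actionable prescriptions, share of all listed prescriptions).
    """
    expected = sum(
        count * actionable_freq.get(drug_gene[drug], 0.0)
        for drug, count in prescriptions.items()
        if drug in drug_gene
    )
    total = sum(prescriptions.values())
    return expected, expected / total
```

Drugs with more than one relevant pharmacogene would need a rule for combining genotypes (for example, counting a person once if any gene is actionable), which this sketch does not attempt.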

Results reporting

Primary findings

Building upon patient and clinician feedback, we will continue to prioritize returning clinically significant findings directly related to the referral indication and clinical phenotype.

Additional medically actionable findings

Dominant disorders

For participants who opt in to receive additional findings from GS, we developed a plan for reporting and returning findings in 13 genes (12 of which are associated with dominant disorders): MLH1, MSH2, MSH6, MUTYH, APC, BRCA1, BRCA2, VHL, MEN1, RET, LDLR, APOB and PCSK9, selected on the basis of clinical actionability and severity. In compliance with ACMG guidelines and reporting guidance, only P/LP variants will be reported (https://search.clinicalgenome.org/kb/genes/acmgsf). This structured approach supports the responsible return of high-impact genetic information while respecting clinical context and participant preferences.

Recessive disorders

For reporting and returning additional findings of MUTYH-associated polyposis, only individuals with two identified disease-causing variants will receive results. Regarding expanded CS, we are at a crossroads. Although we will continue to return carrier status upon patient request, this study reinforces our decision to develop a Chinese-specific CS panel rather than relying solely on resources based on European ancestries, such as the ACMG and ‘Mackenzie’s Mission’ gene lists. We have demonstrated our capability to identify and return these results to patients.

Pharmacogenomics

Given the potential for broad impact, we are now initiating a comprehensive review with our scientific and ethics advisory committees to explore strategies for implementing pharmacogenomics.

Statistics and reproducibility

All statistical analyses were performed using R version 4.3.3. Diagnostic yield comparisons for the diagnostic cohort and cGCF comparisons for recessive genes were performed using one-sided χ² tests (Extended Data Table 2 and Supplementary Table 8).

ACF comparisons were performed for each gene using two-sided Fisher’s exact tests, with Bonferroni correction for multiple testing across genes (Supplementary Table 7). The significance level was set at P < 0.05 for all analyses in this study.
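The per-gene testing with Bonferroni correction can be sketched in pure Python. The analyses in this study were run in R; this is an independent illustration whose two-sided P value sums the probabilities of all tables with the observed margins that are no more likely than the observed table (with a small relative tolerance for floating-point ties), similar to the convention of R's `fisher.test`. The function names and the layout of each 2×2 table are assumptions; for large counts, `scipy.stats.fisher_exact` would be a more practical choice.

```python
from math import comb
from typing import Dict, Tuple

Table = Tuple[int, int, int, int]  # (a, b, c, d) for the table [[a, b], [c, d]]


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P value for a 2x2 table."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def bonferroni(tables: Dict[str, Table]) -> Dict[str, float]:
    """Per-gene Fisher P values multiplied by the number of genes, capped at 1."""
    m = len(tables)
    return {gene: min(1.0, fisher_two_sided(*t) * m) for gene, t in tables.items()}
```

With m genes tested, a gene is significant at the 0.05 level when its adjusted value is below 0.05, which is equivalent to comparing the raw P value with 0.05/m.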

No statistical method was used to predetermine sample size. The diagnostic cohort comprised all HKGP participants at HKGI whose genetic diagnostic evaluation was completed by November 2024. The HKGP Chinese cohort comprised all unrelated Chinese participants whose variant analysis was completed by the same cutoff date.

For both cohorts, individuals whose sequencing data failed quality control were excluded from this study. The experiments were not randomized, and the investigators were not blinded to allocation during experiments and outcome assessment.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Deidentified proband-level information used for the diagnostic cohort is available in Supplementary Tables 1 and 2. Detailed variant-level information used for the diagnostic cohort and the HKGP Chinese cohort is available in Supplementary Tables 3, 4, 5, 9, 13 and 14. Detailed gene-level information is available in Supplementary Tables 6, 7 and 11. Variants identified in the diagnostic cohort were uploaded to ClinVar in batches (https://www.ncbi.nlm.nih.gov/clinvar/submitters/510250/).

Deidentified individual-level genotype data of variants presented in this paper and additional aggregate-level data not included in this paper are currently available to researchers who obtain IRB approval by completing the following steps:

1. Researchers should submit a Data Access Request to HKGI (hkgi_gc_team@genomics.org.hk) outlining the proposed research, including its purpose, scope of data to be accessed and researcher information.

2. The HKGI Data Access Review Panel will review the application in a quarterly meeting to assess the scientific, clinical, technical, resource and regulatory feasibility of the proposal. All feasible proposals will be approved.

3. The HKGI team will collaborate with applicants to prepare the formal proposal and related IRB documentation.

4. Anonymous, aggregate data will then be provided to applicants either directly or within designated HKGI facilities (for 3–12 months), depending on the assessment of the proposal.

The same application process also applies to other individual-level genomic data beyond this paper. As the HKGP is actively recruiting new participants at the time of writing, access to such data will be granted to external researchers after the completion of the main phase of this project in 2030. Source data are provided with this paper.

Code availability

The code and scripts used to perform all analyses and generate the figures in this study are publicly available on GitHub at https://github.com/hkgi-steam/hkgi_flagship_paper_2025. The repository includes analysis scripts for identifying variants, generating summary statistics and producing the display figures. Instructions for reproducing the figures are also provided, including steps to build the required computational environment using Jupyter Notebook and Apptainer.

References

Health, T. L. G. The landscape for rare diseases in 2024. Lancet Glob. Health 12, e341 (2024).
Article Google Scholar
Smedley, D. et al. 100,000 Genomes pilot on rare-disease diagnosis in health care—preliminary report. N. Engl. J. Med. 385, 1868–1880 (2021).
Article CAS PubMed Google Scholar
Sosinsky, A. et al. Insights for precision oncology from the integration of genomic and clinical data of 13,880 tumors from the 100,000 Genomes Cancer Programme. Nat. Med. 30, 279–289 (2024).
Article CAS PubMed PubMed Central Google Scholar
Leong, I. U. S. et al. Large-scale pharmacogenomics analysis of patients with cancer within the 100,000 Genomes Project combining whole-genome sequencing and medical records to inform clinical practice. J. Clin. Oncol. 43, 682–693 (2025).
Article PubMed Google Scholar
Stark, Z. et al. Australian Genomics: outcomes of a 5-year national program to accelerate the integration of genomics in healthcare. Am. J. Hum. Genet. 110, 419–426 (2023).
Article CAS PubMed PubMed Central Google Scholar
Venner, E. et al. Whole-genome sequencing as an investigational device for return of hereditary disease risk and pharmacogenomic results as part of the All of Us Research Program. Genome Med. 14, 34 (2022).
Article PubMed PubMed Central Google Scholar
Chan, S. H. et al. Analysis of clinically relevant variants from ancestrally diverse Asian genomes. Nat. Commun. 13, 6694 (2022).
Article CAS PubMed PubMed Central Google Scholar
Takahashi, Y. et al. Six years’ accomplishment of the Initiative on Rare and Undiagnosed Diseases: nationwide project in Japan to discover causes, mechanisms, and cures. J. Hum. Genet. 67, 505–513 (2022).
Article PubMed PubMed Central Google Scholar
Kim, M. J. et al. The Korean Genetic Diagnosis Program for Rare Disease Phase II: outcomes of a 6-year national project. Eur. J. Hum. Genet. 31, 1147–1153 (2023).
Article PubMed PubMed Central Google Scholar
Huang, Y. et al. Landscape of secondary findings in Chinese population: a practice of ACMG SF v3.0 list. J. Pers. Med. 12, 1503 (2022).
Article PubMed PubMed Central Google Scholar
Hsu, J. S. et al. Complete genomic profiles of 1496 Taiwanese reveal curated medical insights. J. Adv. Res. 66, 197–207 (2024).
Article CAS PubMed Google Scholar
Yu, M. H. C. et al. Actionable secondary findings in 1116 Hong Kong Chinese based on exome sequencing data. J. Hum. Genet. 66, 637–641 (2021).
Article CAS PubMed Google Scholar
Hotakainen, R., Järvinen, T., Kettunen, K., Anttonen, A.-K. & Jakkula, E. Estimation of carrier frequencies of autosomal and X-linked recessive genetic conditions based on gnomAD v4.0 data in different ancestries. Genet. Med. 27, 101304 (2025).
Article CAS PubMed Google Scholar
Chung, C. C. Y., Project, H. K. G., Chu, A. T. W. & Chung, B. H. Y. Rare disease emerging as a global public health priority. Front. Public Health 10, 1028545 (2022).
Article PubMed PubMed Central Google Scholar
Gregg, A. R. et al. Screening for autosomal recessive and X-linked conditions during pregnancy and preconception: a practice resource of the American College of Medical Genetics and Genomics (ACMG). Genet. Med. 23, 1793–1806 (2021).
Article PubMed PubMed Central Google Scholar
Miller, D. T. et al. ACMG SF v3.2 list for reporting of secondary findings in clinical exome and genome sequencing: a policy statement of the American College of Medical Genetics and Genomics (ACMG). Genet. Med. 25, 100866 (2023).
Article CAS PubMed PubMed Central Google Scholar
Hernandez, S., Hindorff, L. A., Morales, J., Ramos, E. M. & Manolio, T. A. Patterns of pharmacogenetic variation in nine biogeographic groups. Clin. Transl. Sci. 17, e70017 (2024).
Article CAS PubMed PubMed Central Google Scholar
Lam, W. K. J. et al. The implementation of genome sequencing in rare genetic diseases diagnosis: a pilot study from the Hong Kong genome project. Lancet Reg. Health West. Pac. 55, 101473 (2025).
PubMed PubMed Central Google Scholar
Turnbull, C. et al. The 100 000 Genomes Project: bringing whole genome sequencing to the NHS. BMJ 361, k1687 (2018).
Article PubMed Google Scholar
Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–D985 (2014).
Article CAS PubMed Google Scholar
Xiang, J., Peng, J., Baxter, S. & Peng, Z. AutoPVS1: an automatic classification tool for PVS1 interpretation of null variants. Hum. Mutat. 41, 1488–1498 (2020).
Article CAS PubMed Google Scholar
Pejaver, V. et al. Calibration of computational tools for missense variant pathogenicity classification and ClinGen recommendations for PP3/BP4 criteria. Am. J. Hum. Genet. 109, 2163–2177 (2022).
Guo, M. H. & Gregg, A. R. Estimating yields of prenatal carrier screening and implications for design of expanded carrier screening panels. Genet. Med. 21, 1940–1947 (2019).
Jensson, B. O. et al. Actionable genotypes and their association with life span in Iceland. N. Engl. J. Med. 389, 1741–1752 (2023).
Kim, Y., Kim, J.-M., Cho, H.-W., Park, H.-Y. & Park, M.-H. Frequency of actionable secondary findings in 7472 Korean genomes derived from the National Project of Bio Big Data pilot study. Hum. Genet. 142, 1561–1569 (2023).
Venner, E. et al. The frequency of pathogenic variation in the All of Us cohort reveals ancestry-driven disparities. Commun. Biol. 7, 174 (2024).
Kirk, E. P. et al. Gene selection for the Australian Reproductive Genetic Carrier Screening Project (‘Mackenzie’s Mission’). Eur. J. Hum. Genet. 29, 79–87 (2021).
Whirl-Carrillo, M. et al. An evidence-based framework for evaluating pharmacogenomics knowledge for personalized medicine. Clin. Pharmacol. Ther. 110, 563–572 (2021).
Pratt, V. M. et al. TPMT and NUDT15 genotyping recommendations: a joint consensus recommendation of the Association for Molecular Pathology, Clinical Pharmacogenetics Implementation Consortium, College of American Pathologists, Dutch Pharmacogenetics Working Group of the Royal Dutch Pharmacists Association, European Society for Pharmacogenomics and Personalized Therapy, and Pharmacogenomics Knowledgebase. J. Mol. Diagn. 24, 1051–1063 (2022).
Wei, C.-Y. et al. Clinical impact of pharmacogenetic risk variants in a large Chinese cohort. Nat. Commun. 16, 6344 (2025).
Wang, L.-Y. et al. The pharmacogenomic landscape in the Chinese: an analytics of pharmacogenetic variants in 206,640 individuals. Innovation (Camb.) 6, 100773 (2025).
Walters, R. G. et al. Genotyping and population characteristics of the China Kadoorie Biobank. Cell Genom. 3, 100361 (2023).
Ng, H.-Y. et al. Identification of technically challenging variants – whole genome sequencing improves diagnostic yield in patients with high clinical suspicion of rare diseases. HGG Adv. 6, 100469 (2025).
Chu, A. T., Chung, C. C. Y., Lo, S. V. & Chung, B. H. Marketing and publicity strategies for launching the pilot phase of the Hong Kong Genome Project. J. Transl. Genet. Genom. 7, 66–78 (2023).
Park, J. et al. Impact of population screening for Lynch syndrome insights from the All of Us data. Nat. Commun. 16, 523 (2025).
Tomlinson, B. et al. Guidance on the management of familial hypercholesterolaemia in Hong Kong: an expert panel consensus viewpoint. Hong Kong Med. J. 24, 408–415 (2018).
Chau, J. F. T. et al. Comprehensive analysis of recessive carrier status using exome and genome sequencing data in 1543 Southern Chinese. NPJ Genom. Med. 7, 23 (2022).
Hou, W. et al. [Carrier screening for 223 monogenic diseases in Chinese population: a multi-center study in 33 104 individuals]. Nan Fang Yi Ke Da Xue Xue Bao 44, 1015–1023 (2024).
Warburton, P. E. & Sebra, R. P. Long-read DNA sequencing: recent advances and remaining challenges. Annu. Rev. Genom. Hum. Genet. 24, 109–132 (2023).
Smail, C. & Montgomery, S. B. RNA sequencing in disease diagnosis. Annu. Rev. Genom. Hum. Genet. 25, 353–367 (2024).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Swen, J. J. et al. A 12-gene pharmacogenetic panel to prevent adverse drug reactions: an open-label, multicentre, controlled, cluster-randomised crossover implementation study. Lancet 401, 347–356 (2023).
Chu, A. T. W. et al. The Hong Kong genome project: building genome sequencing capacity and capability for advancing genomic science in Hong Kong. J. Transl. Genet. Genom. 7, 196–212 (2023).
Bollas, A. E. et al. SNVstory: inferring genetic ancestry from genome sequencing data. BMC Bioinformatics 25, 76 (2024).
Dolzhenko, E. et al. ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions. Bioinformatics 35, 4754–4756 (2019).
Talevich, E., Shain, A. H., Botton, T. & Bastian, B. C. CNVkit: genome-wide copy number detection and visualization from targeted DNA sequencing. PLoS Comput. Biol. 12, e1004873 (2016).
Van der Auwera, G. A. et al. From FastQ data to high-confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinform. 43, 11.10.1–11.10.33 (2013).
Rajaby, R. et al. INSurVeyor: improving insertion calling from short read sequencing data. Nat. Commun. 14, 3243 (2023).
Veldman, A. et al. Newborn screening by DNA-first: systematic evaluation of the eligibility of inherited metabolic disorders based on treatability. Int. J. Neonatal Screen. 11, 1 (2024).
Smedley, D. et al. Next-generation diagnostics and disease-gene discovery with the Exomiser. Nat. Protoc. 10, 2004–2015 (2015).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Ying, D. et al. Accelerating genetic diagnostics in retinitis pigmentosa: implementation of a semi-automated bespoke cohort analysis workflow for Hong Kong Genome Project. Hum. Genet. 144, 515–528 (2025).
Riggs, E. R. et al. Technical standards for the interpretation and reporting of constitutional copy-number variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics (ACMG) and the Clinical Genome Resource (ClinGen). Genet. Med. 22, 245–257 (2020).
Abou Tayoun, A. N. et al. Recommendations for interpreting the loss of function PVS1 ACMG/AMP variant criterion. Hum. Mutat. 39, 1517–1524 (2018).
Chen, X. et al. Spinal muscular atrophy diagnosis and carrier screening from genome sequencing data. Genet. Med. 22, 945–953 (2020).
Riggs, E. R. et al. Chromosomal microarray impacts clinical management. Clin. Genet. 85, 147–153 (2014).
Browning, B. L. & Browning, S. R. Detecting identity by descent and estimating genotype error rates in sequence data. Am. J. Hum. Genet. 93, 840–851 (2013).
Chen, X. et al. Cyrius: accurate CYP2D6 genotyping using whole-genome sequencing data. Pharmacogenomics J. 21, 251–261 (2021).
Kawaguchi, S., Higasa, K., Shimizu, M., Yamada, R. & Matsuda, F. HLA-HD: an accurate HLA typing algorithm for next-generation sequencing data. Hum. Mutat. 38, 788–797 (2017).
Numanagić, I. et al. Allelic decomposition and exact genotyping of highly polymorphic and structurally variant genes. Nat. Commun. 9, 828 (2018).

Acknowledgements

The authors thank the patients, their families and the healthcare and recruitment teams at the partnering centers of the HKGI, The University of Hong Kong/Queen Mary Hospital, The Chinese University of Hong Kong/Prince of Wales Hospital and Hong Kong Children’s Hospital for their contributions to the HKGP. We also acknowledge all staff members at the HKGI for their support in sample sequencing, data curation and analysis. The authors also thank the Hospital Authority for supporting the HKGP. The HKGP is a publicly funded initiative commissioned by the Health Bureau of the Hong Kong SAR Government. The funder had no role in the study design, data collection, analysis, decision to publish or preparation of the manuscript.

Author information

These authors contributed equally: Dingge Ying, Ching-Lung Cheung, Chun-Kwan O, Wai Kei Jacky Lam, Shiu Lun Au Yeung.

Authors and Affiliations

Hong Kong Genome Institute, Hong Kong SAR, China
Dingge Ying, Desiree Man Sik Tse, James Si Chai Liu, Shirley Pik Ying Hue, Jamie Sui Lam Kwok, Denis Long Him Yeung, Christopher Brandon Preusch, Wei Ma, Wenshu Tang, Amy Hin Yan Tong, Su-Vui Lo, Annie Tsz Wai Chu & Brian Hon Yin Chung
Department of Pharmacology and Pharmacy, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
Ching-Lung Cheung & Suhas Krishnamoorthy
Laboratory of Data Discovery for Health Limited (D24H), Hong Kong Science Parks, Hong Kong SAR, China
Ching-Lung Cheung & Gabriel Matthew Leung
Hinda and Arthur Marcus Institute for Aging Research, Hebrew SeniorLife, Boston, MA, USA
Ching-Lung Cheung
Department of Medicine and Therapeutics, Faculty of Medicine, The Chinese University of Hong Kong, Prince of Wales Hospital, Hong Kong SAR, China
Chun-Kwan O, Lisa Wing Chi Au, Juliana Chung-Ngor Chan, Rashid Nok Shun Lui & Ronald Ching Wan Ma
Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong SAR, China
Chun-Kwan O, Wai Kei Jacky Lam, Juliana Chung-Ngor Chan & Ronald Ching Wan Ma
Department of Chemical Pathology, The Chinese University of Hong Kong, Prince of Wales Hospital, Hong Kong SAR, China
Wai Kei Jacky Lam
School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
Shiu Lun Au Yeung, Gabriel Matthew Leung & Shan Luo
Department of Medicine, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
Chak Sing Lau, Yap-Hang Chan, Philip Hei Li, Becky Mingyao Ma & Kathryn Choon Beng Tan
Department of Clinical Genetics, Hong Kong Children’s Hospital, Hong Kong SAR, China
Ho Ming Luk, Shirley Sze Wing Cheng & Stephanie Ho
Department of Ophthalmology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
Christopher Kai Shun Leung
Department of Ophthalmology, Queen Mary Hospital, Hong Kong SAR, China
Christopher Kai Shun Leung & Qing Li
Hong Kong Eye Hospital, Hong Kong SAR, China
Christopher Kai Shun Leung
Grantham Hospital, Hong Kong SAR, China
Christopher Kai Shun Leung & Qing Li
Hong Kong Institute of Diabetes and Obesity, The Chinese University of Hong Kong, Prince of Wales Hospital, Hong Kong SAR, China
Juliana Chung-Ngor Chan & Ronald Ching Wan Ma
Department of Paediatrics, The Chinese University of Hong Kong, Prince of Wales Hospital, Hong Kong SAR, China
Shuk Ching Chong
Department of Obstetrics and Gynaecology, The Chinese University of Hong Kong, Hong Kong SAR, China
Shuk Ching Chong
Joint Baylor-CUHK Center of Medical Genetics, The Chinese University of Hong Kong, Hong Kong SAR, China
Shuk Ching Chong
Department of Paediatrics and Adolescent Medicine, Hong Kong Children’s Hospital, Hong Kong SAR, China
Cheuk Wing Fung & Sheila Suet-Na Wong
Department of Clinical Oncology, The Chinese University of Hong Kong, Prince of Wales Hospital, Hong Kong SAR, China
Herbert Ho-Fung Loong
State Key Laboratory of Translational Oncology, The Chinese University of Hong Kong, Hong Kong SAR, China
Herbert Ho-Fung Loong
Medical Data Analytics Centre, The Chinese University of Hong Kong, Hong Kong SAR, China
Rashid Nok Shun Lui
Institute of Digestive Disease, The Chinese University of Hong Kong, Hong Kong SAR, China
Rashid Nok Shun Lui
Department of Family Medicine and Primary Care, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
Shan Luo
Department of Surgery, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
Rong Na
Department of Paediatrics and Adolescent Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
Brian Hon Yin Chung

Consortia

Hong Kong Genome Project

Dingge Ying, Desiree Man Sik Tse, James Si Chai Liu, Shirley Pik Ying Hue, Jamie Sui Lam Kwok, Denis Long Him Yeung, Christopher Brandon Preusch, Wei Ma, Wenshu Tang, Amy Hin Yan Tong, Su-Vui Lo, Annie Tsz Wai Chu & Brian Hon Yin Chung

Contributions

Conceptualization: B.H.Y.C., A.T.W.C. and S.-V.L. Supervision: B.H.Y.C. Project administration: D.Y. Writing – original draft: D.Y., D.M.S.T., J.S.C.L., S.P.Y.H. and J.S.L.K. Writing – review and editing: C.-L.C., C.-K.O., W.K.J.L., S.L.A.Y., D.M.S.T., W.M., W.T., A.H.Y.T. and C.B.P. Methodology: B.H.Y.C. and D.Y. Data curation and analysis: D.Y., J.S.C.L., S.P.Y.H., J.S.L.K., D.L.H.Y. and C.B.P. Visualization: C.B.P., D.M.S.T. and D.L.H.Y. Resources: H.K.G.P., C.-L.C., C.-K.O., W.K.J.L., S.L.A.Y., C.S.L., H.M.L., C.K.S.L., L.W.C.A., J.C.-N.C., Y.-H.C., S.S.W.C., S.C.C., C.W.F., S.H., S.K., G.M.L., P.H.L., Q.L., H.H.-F.L., R.N.S.L., S.-V.L., B.M.M., R.C.W.M., R.N., K.C.B.T. and S.S.-N.W.

Corresponding author

Correspondence to Brian Hon Yin Chung.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks Magnus Ingelman-Sundberg, Zornitza Stark and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Anna Ranzoni, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Breakdown of the reporting status of the P/LP variants for 53 of 73 dominant disorder-related genes by related phenotypes.

All pathogenic or likely pathogenic (P/LP) variants, classified using the method described in this study, in dominant disorder-associated genes from the HKGP Chinese cohort were cross-referenced with the ClinVar database. Variants not reported in ClinVar were classified as novel. Variants reported in ClinVar and classified as pathogenic or likely pathogenic were labeled as ClinVar P/LP, while all others were labeled as ClinVar non-P/LP. P/LP, pathogenic or likely pathogenic.
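
The three-way labelling used in this legend amounts to a small lookup. The sketch below is illustrative only: it assumes ClinVar has already been reduced to a dict from a variant identifier to a clinical-significance string (the identifier format, the dict and the function name are hypothetical), and it treats "Pathogenic", "Likely pathogenic" and "Pathogenic/Likely pathogenic" as ClinVar P/LP.

```python
from typing import Dict, Optional

# ClinVar clinical-significance strings counted as P/LP in this sketch (assumption).
CLINVAR_PLP = {"Pathogenic", "Likely pathogenic", "Pathogenic/Likely pathogenic"}


def clinvar_label(variant_id: str, clinvar: Dict[str, str]) -> str:
    """Return the legend label for a variant already classified P/LP by the study.

    `clinvar` is a hypothetical pre-built mapping from a variant identifier
    to its ClinVar clinical significance.
    """
    significance: Optional[str] = clinvar.get(variant_id)
    if significance is None:
        return "novel"  # not reported in ClinVar
    if significance in CLINVAR_PLP:
        return "ClinVar P/LP"
    return "ClinVar non-P/LP"  # reported in ClinVar with any other classification


# Usage with made-up identifiers:
clinvar_db = {"var1": "Pathogenic", "var2": "Uncertain significance"}
print(clinvar_label("var1", clinvar_db))  # ClinVar P/LP
print(clinvar_label("var2", clinvar_db))  # ClinVar non-P/LP
print(clinvar_label("var3", clinvar_db))  # novel
```

Keeping the P/LP strings in one set makes it easy to adjust which ClinVar categories (for example, conflicting classifications) fall on each side of the split.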

Extended Data Fig. 2 Comparison of the measured ACF and random mating estimates across the top ACF genes.

A couple is considered at risk if both partners carry P/LP variants in the same recessive disorder-related gene. Measured at-risk couple frequencies are coloured by the tier of each gene in the ACMG pan-ethnic carrier screening panel. The ACF estimated under random mating closely matched the measured ACF. ACF, at-risk couple frequency; P/LP, pathogenic or likely pathogenic.
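
A minimal sketch of the two quantities compared in this figure for a single autosomal recessive gene, under stated assumptions: the random-mating estimate is taken as GCF², the probability that two independently paired partners are both carriers, and the "measured" ACF is taken as the fraction of all possible male–female pairings in the cohort in which both partners carry a P/LP variant in the gene. The pairing scheme, the function names and the toy IDs are hypothetical; the study's own definition is given in its Methods.

```python
from typing import Set


def gene_carrier_frequency(carriers: Set[str], cohort: Set[str]) -> float:
    """Fraction of the cohort carrying at least one P/LP variant in the gene."""
    return len(carriers & cohort) / len(cohort)


def random_mating_acf(gcf: float) -> float:
    """Expected ACF if partners pair independently (autosomal recessive gene)."""
    return gcf ** 2


def measured_acf(carriers: Set[str], males: Set[str], females: Set[str]) -> float:
    """Fraction of all possible male-female pairings in which both partners carry.

    Exhaustive pairing of the cohort is an assumption made for this sketch.
    """
    n_carrier_males = len(carriers & males)
    n_carrier_females = len(carriers & females)
    return (n_carrier_males * n_carrier_females) / (len(males) * len(females))


# Toy cohort with hypothetical IDs: 4 males, 4 females, 3 carriers.
males = {"m1", "m2", "m3", "m4"}
females = {"f1", "f2", "f3", "f4"}
carriers = {"m1", "f1", "f2"}

gcf = gene_carrier_frequency(carriers, males | females)  # 3/8
print(random_mating_acf(gcf))                  # 0.140625
print(measured_acf(carriers, males, females))  # 0.125
```

In this sketch the measured value reduces to the product of the male and female carrier frequencies, so it coincides with GCF² whenever carrier rates are equal in both sexes.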

Extended Data Fig. 3 Gene carrier frequency (GCF) comparison across gnomAD 4.0 continental populations and the HKGP Chinese cohort.

Pairwise comparisons of GCFs for ACMG and non-ACMG carrier screening (CS) genes among the HKGP Chinese cohort and three continental populations from gnomAD 4.0 (non-Finnish Europeans, East Asians and Africans). Colours and shapes indicate both the ACMG tier of each gene and the pairwise GCF comparison between populations, consistent with Fig. 3d and the ACMG categorization. a, HKGP Chinese vs. non-Finnish Europeans. b, HKGP Chinese vs. East Asians. c, non-Finnish Europeans vs. East Asians. d, non-Finnish Europeans vs. Africans. GCF, gene carrier frequency.
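
Comparing a GCF counted directly in the HKGP cohort with gnomAD, which publishes allele frequencies rather than individual carrier status, involves two steps that this legend leaves implicit. The sketch below assumes (i) a gnomAD GCF derived from per-variant P/LP allele frequencies under Hardy–Weinberg equilibrium with independent variants, and (ii) a pooled two-proportion z-test as the comparison. Neither is stated here as the study's method, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def gcf_from_allele_frequencies(afs: Sequence[float]) -> float:
    """Approximate a gene carrier frequency from per-variant P/LP allele frequencies.

    Assumes Hardy-Weinberg equilibrium and independent variants: a person is a
    non-carrier only if neither haplotype carries any P/LP allele.
    """
    p_clean_haplotype = math.prod(1.0 - af for af in afs)
    return 1.0 - p_clean_haplotype ** 2


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test on carrier counts; returns (z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        return 0.0, 1.0  # no carriers (or all carriers) in both groups: no difference
    z = (p1 - p2) / se
    # Two-sided p from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Toy counts: 30/100 carriers vs. 10/100 carriers.
z, p = two_proportion_z_test(30, 100, 10, 100)  # z ≈ 3.54, p ≈ 4.1e-4
```

For rare variants, 1 − ∏(1 − AF)² is close to 2 × ΣAF. Feeding gnomAD into the test would also require approximating its carrier count as GCF × number of individuals, which is a further assumption.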

Extended Data Fig. 4 Pathogenic or likely pathogenic mutation spectra for LDLR and APOB.

a, P/LP variants identified in LDLR from the HKGP Chinese cohort. b, P/LP variants identified in APOB from the HKGP Chinese cohort. Exons are depicted as blue boxes linked by thin lines (introns); the grey bar under LDLR marks a single intron 6–12 duplication observed in the cohort. Circles denote missense variants and squares denote null variants. Symbol colour indicates the ClinVar classification (red, pathogenic; orange, likely pathogenic; light grey, VUS; dark grey, novel). P/LP, pathogenic or likely pathogenic; VUS, variant of uncertain significance.
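
The visual encoding in this legend (shape by variant class, colour by ClinVar status) can be written as a small mapping. This is a plotting sketch that assumes Sequence Ontology consequence terms of the kind emitted by annotators such as Ensembl VEP. Which terms count as "null" here is a simplifying assumption (nonsense, frameshift, canonical splice-site and start-loss changes, echoing the PVS1 variant types), not the study's rule set; the function name is hypothetical.

```python
from typing import Optional, Tuple

# Consequence terms treated as null variants in this sketch (simplified PVS1-style set).
NULL_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
    "start_lost",
}

# Colour per ClinVar classification, following the legend; None means absent from ClinVar.
COLOUR_BY_CLINVAR = {
    "Pathogenic": "red",
    "Likely pathogenic": "orange",
    "Uncertain significance": "lightgrey",
    None: "darkgrey",
}


def marker_style(consequence: str, clinvar: Optional[str]) -> Tuple[str, str]:
    """Return (shape, colour) for one variant in a mutation-spectrum plot."""
    if consequence == "missense_variant":
        shape = "circle"
    elif consequence in NULL_CONSEQUENCES:
        shape = "square"
    else:
        raise ValueError(f"consequence not drawn in this sketch: {consequence}")
    if clinvar not in COLOUR_BY_CLINVAR:
        raise ValueError(f"ClinVar class not in legend: {clinvar}")
    return shape, COLOUR_BY_CLINVAR[clinvar]


print(marker_style("stop_gained", None))  # ('square', 'darkgrey')
```

Raising on unmapped consequences or ClinVar classes, rather than silently defaulting, keeps unexpected variant types from being drawn with the wrong symbol.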

Extended Data Table 1 Participant demographics

Extended Data Table 2 Determinants of diagnostic yield in the diagnostic cohort

Extended Data Table 3 Cumulative gene carrier frequency and at-risk couple frequency in the recessive disorder-related gene panels
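
Panel-level ("cumulative") frequencies aggregate per-gene values across a gene list. The sketch below assumes one common definition of cGCF, the fraction of individuals with at least one P/LP variant in any panel gene (a sum of per-gene GCFs is another convention, and the study's choice is set out in its Methods). It also assumes a random-mating cumulative ACF that combines per-gene GCF² values under independence across genes. Gene names, carrier IDs and function names are placeholders.

```python
import math
from typing import Dict, Iterable, Set


def cumulative_gcf(carriers_by_gene: Dict[str, Set[str]], panel: Iterable[str],
                   cohort_size: int) -> float:
    """Fraction of individuals carrying at least one P/LP variant in any panel gene."""
    carriers_any: Set[str] = set()
    for gene in panel:
        carriers_any |= carriers_by_gene.get(gene, set())
    return len(carriers_any) / cohort_size


def cumulative_acf_random_mating(gcf_by_gene: Dict[str, float], panel: Iterable[str]) -> float:
    """Probability that an independently formed couple is at risk for at least one panel gene.

    Assumes autosomal recessive genes and independence across genes.
    """
    return 1.0 - math.prod(1.0 - gcf_by_gene[gene] ** 2 for gene in panel)


# Placeholder genes and carrier IDs; person p2 carries variants in both genes.
carriers = {"GENE_A": {"p1", "p2"}, "GENE_B": {"p2", "p3"}}
panel = ["GENE_A", "GENE_B"]
print(cumulative_gcf(carriers, panel, cohort_size=10))  # 0.3
print(cumulative_acf_random_mating({"GENE_A": 0.1, "GENE_B": 0.2}, panel))  # ≈ 0.0496
```

Taking the union of carriers counts an individual who carries variants in several panel genes only once, which is why cGCF here is 0.3 rather than the 0.4 a simple sum would give.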

Supplementary information

Supplementary Information (PDF)

Supplementary Figs. 1–4 and legends for Supplementary Tables 1–14.

Reporting Summary (PDF)

Peer Review File (PDF)

Supplementary Tables (XLSX)

Supplementary Table 1: Full list of probands in the diagnostic cohort, including clinical information and genetic diagnoses.
Supplementary Table 2: Variant details and clinical management of positively diagnosed probands included in the diagnostic cohort.
Supplementary Table 3: Identified P/LP SNVs and indels of dominant and recessive genes in the HKGP Chinese cohort.
Supplementary Table 4: Identified P/LP SVs, CNVs and STRs in dominant and recessive genes in the HKGP Chinese cohort.
Supplementary Table 5: Reclassified P/LP variants in the HKGP Chinese cohort.
Supplementary Table 6: GCF of P/LP variants in dominant disorder-related genes.
Supplementary Table 7: GCF of P/LP variants and CS tier in recessive disorder-related genes.
Supplementary Table 8: Statistical comparison between two cGCFs from different populations and tiering sources.
Supplementary Table 9: Frequencies of altered-function alleles for pharmacogenes.
Supplementary Table 10: Number of actionable metabolomic phenotypes for pharmacogenes per participant (source data for Fig. 4c).
Supplementary Table 11: Frequency of metabolomic phenotypes for pharmacogenes.
Supplementary Table 12: Top 50 most prescribed drugs in Hong Kong with FDA drug labels and pharmacogenetic associations.
Supplementary Table 13: Novel putative protein-disrupting variants in LoF pharmacogenes.
Supplementary Table 14: Novel founder mutations found in the HKGP with shared haplotypes.
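
Supplementary Table 10 counts actionable pharmacogenomic phenotypes per participant. The sketch below shows the general shape of such a count: a simplified, CPIC-style function-based translation of CYP2C19 diplotypes (restricted to *1, *2, *3 and *17), followed by a tally against a set of actionable gene–phenotype pairs. The allele subset, the ACTIONABLE set and the function names are illustrative assumptions, not the study's pipeline.

```python
from typing import Dict, Set, Tuple

# Simplified CYP2C19 allele functions (small subset; assumption for illustration).
CYP2C19_FUNCTION = {"*1": "normal", "*2": "no", "*3": "no", "*17": "increased"}

# Hypothetical set of (gene, phenotype) pairs treated as actionable.
ACTIONABLE: Set[Tuple[str, str]] = {
    ("CYP2C19", "poor metabolizer"),
    ("CYP2C19", "intermediate metabolizer"),
    ("NUDT15", "poor metabolizer"),
    ("NUDT15", "intermediate metabolizer"),
}


def cyp2c19_phenotype(allele_a: str, allele_b: str) -> str:
    """Translate a CYP2C19 diplotype into a metabolizer phenotype."""
    functions = [CYP2C19_FUNCTION[allele_a], CYP2C19_FUNCTION[allele_b]]
    n_no = functions.count("no")
    n_increased = functions.count("increased")
    if n_no == 2:
        return "poor metabolizer"
    if n_no == 1:
        # Includes *2/*17, which CPIC labels as a likely intermediate metabolizer.
        return "intermediate metabolizer"
    if n_increased == 2:
        return "ultrarapid metabolizer"
    if n_increased == 1:
        return "rapid metabolizer"
    return "normal metabolizer"


def count_actionable(phenotypes: Dict[str, str]) -> int:
    """Number of genes whose phenotype falls in the actionable set for one participant."""
    return sum((gene, phenotype) in ACTIONABLE for gene, phenotype in phenotypes.items())


participant = {
    "CYP2C19": cyp2c19_phenotype("*1", "*2"),
    "NUDT15": "normal metabolizer",
}
print(participant["CYP2C19"])         # intermediate metabolizer
print(count_actionable(participant))  # 1
```

Keeping the phenotype translation separate from the actionability set lets the same per-participant calls be re-counted against different drug-specific definitions of "actionable".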

Source data

Source Data Figs. 1–4 and Extended Data Figs. 1–4 (XLSX)

Statistical source data for Figs. 1–4 and Extended Data Figs. 1–4.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Ying, D., Cheung, CL., O, CK. et al. Population-scale genomic medicine with the Hong Kong Genome Project. Nat Med (2026). https://doi.org/10.1038/s41591-026-04410-w

Received: 09 September 2025
Accepted: 16 April 2026
Published: 15 May 2026
Version of record: 15 May 2026
DOI: https://doi.org/10.1038/s41591-026-04410-w