Introduction

In the context of genome-wide association studies (GWAS), imputation refers to the process of predicting genotypes that are not directly assayed in a sample of individuals1. By using linkage disequilibrium (LD) data from a typically extensive reference panel, imputation allows for the inference of untyped genotypes from a limited selection of array SNPs. Thus, imputation increases the number of variants that can be tested for association in a GWAS, improving power to detect loci that contribute to a complex trait and facilitating large-scale meta-analysis, as well as the fine-mapping of underlying causal variants2. The quality of imputation depends on several factors, including: (i) the number of haplotypes, the density of variants, and the size of the reference panel; (ii) the sample size of the GWAS and the properties of the genotyping array; and (iii) the genetic similarity between individuals in the reference panel and those in the GWAS.

In recent years, one of the most widely used reference panels for imputation has been built from haplotypes derived from whole-genome sequencing (WGS) data (3× depth) from the 1000 Genomes Project (1KG)3. The reference panel comprises 2504 individuals from 26 populations of African, Admixed American, East Asian, Non-Finnish European, and South Asian ancestry, resulting in a dataset of over 88 million variants, and has been demonstrated to enable high-quality imputation into these ancestry groups for variants with minor allele frequency (MAF) as low as 0.5%. However, the 1KG dataset mainly focuses on common variants, and rare and low-frequency variants in specific target populations may be underrepresented. This can lead to imputation inaccuracies in populations that are poorly represented in the reference panel4. Two common approaches to reference panel design aim to increase the total sample size from which haplotype and variant information can be derived. One approach involves creating large, publicly available reference panels such as the Trans-Omics for Precision Medicine (TOPMed) panel, which, as of 2024, comprises 133,597 samples5. These panels amalgamate existing WGS datasets, encompassing diverse population groups falling under the broad categories of European, African, Hispanic/Latino, Asian, and Admixed ancestry6. The alternative approach is more targeted: it combines population-specific WGS data with an existing genetically diverse reference panel. This tailored approach aims to optimize reference panels for imputation accuracy within specific population groups, despite an overall smaller sample size relative to panels such as TOPMed.

The integration of additional ancestry- or population-specific WGS data with the 1KG panel has been shown to enhance the imputation quality of population-specific lower-frequency variants. This improvement is achieved by increasing the number of reference haplotypes available and enhancing their genetic similarity with the samples used in GWAS. Examples of such population-specific reference panels have been mostly based on European populations7,8,9,10,11, with few based on East Asian12,13,14, African15, South Asian16, and multi-ancestry17,18 populations. Across these studies, various WGS depths were used to augment existing reference panels, but comparisons between the efficacy of WGS data of different depths, and combinations thereof, were relatively limited. This is potentially a key consideration when designing population-specific reference panels: quantifying how improvements in imputation quality differ between WGS of varying depth could guide panel design and the allocation of resources between the total number of WGS samples and the depth at which each sample is sequenced.

The primary aim of this study is to establish an effective approach to improving imputation quality into Japanese GWAS utilizing WGS data available from Biobank Japan (BBJ)19. With the development of multiple Japanese-specific reference panels, we aim to compare the effects of incorporating high and low depth sequencing data20. Additionally, this study aims to demonstrate the improvements in discovery and statistical power through GWAS into serum uric acid and total cholesterol levels within BBJ, offered by the optimal Japanese population-specific reference panel over the TOPMed reference panel.

Results

Comparative analysis of imputation accuracy between TOPMed and 1KG-WGS enhanced panels

To enhance imputation quality into GWAS conducted in the BBJ Project, we assembled WGS data from a subset of 7517 Japanese individuals from BBJ. Following quality control, we retained 1491 individuals sequenced at 30× depth; 1765 individuals sequenced at 15× depth; and 4216 individuals sequenced at 3× depth (Methods, Fig. S1). We merged the 1KG high-depth panel17 with four different combinations of the Japanese population-specific WGS by reciprocal imputation: 1037 individuals at 30× depth (1KG+1K panel); 3256 individuals at 30× or 15× depth (1KG+3K panel); 4216 individuals at 3× depth (1KG+4K panel); and 7472 individuals at 30×, 15× or 3× depth (1KG+7K panel) (Methods).

Following the construction of the Japanese population-specific reference panels, we evaluated the imputation performance of our panels against that of TOPMed. We employed a leave-out strategy, selecting 200 high-depth WGS samples from the 1KG+3K reference panel for this purpose. We imputed variants on chromosomes 5 and 19 using array data for each panel. By comparing these imputed variants with the true genotypes found in WGS data, we initially assessed the imputation accuracy (aggregated r2), categorizing it by minor allele frequency (MAF) bins, ranging from rare to common variants (Methods). The Japanese WGS-enhanced 1KG panels outperformed TOPMed specifically for variants with a MAF of less than 5% (Table 1, Fig. 1a). The most significant performance gain was observed in the 1KG+7K reference panel, followed by 1KG+3K, 1KG+4K, and 1KG+1K. In contrast, for common variants with a MAF over 5%, the imputation accuracy of the TOPMed reference panel was comparable to that of the population-specific panels. In addition to aggregated r2, we calculated the concordance, recall, precision, and Imputation Quality Score (IQS). The IQS is derived from the ratios of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) in the imputation results. This provides a score that quantifies the imputation quality on a scale from 0 to 1 and is complementary to the r2 (Methods)21. Similarly, we demonstrated that the 1KG+7K panel achieved the highest IQS especially in the lower MAF bins in comparison to TOPMed (Fig. 1b). Overall, the 1KG+7K reference panel exhibited the best performance in our analysis of the leave-out 200 sample set with a mean concordance r2 = 0.919 and 75.8% of variants with r2 > 0.9 (Table 1).

Table 1 Quality of imputation assessment performed using a leave-out 200 Whole Genome Sequencing (WGS) approach on chromosomes 5 and 19
Fig. 1: Whole genome sequencing concordance for chromosomes 5 and 19.

Comparison of the imputation quality of five different imputation panels (TOPMed, 1KG+1K, 1KG+3K, 1KG+4K, 1KG+7K). The x-axis represents the minor allele frequency (MAF) bins, while the y-axis represents (a) r2 and (b) IQS values.

While the 1KG+1K panel produced higher quality imputation for low MAF variants relative to the TOPMed panel, when considering imputation quality across all MAF ranges (in terms of r2 values) the 1KG+1K panel failed to match the overall performance of TOPMed and the three additional population-specific panels. Comparing the reference panels based on the percentage of SNPs with concordance r2 > 0.9 highlights the noticeably lower imputation quality of the TOPMed panel relative to the 1KG+7K, 1KG+4K, and 1KG+3K panels. Imputation up to the 1KG+7K, 1KG+4K and 1KG+3K panels produced similar outputs at this concordance r2 > 0.9 threshold (75.8%, 72.2% and 73.1%, respectively), representing a substantial improvement in imputation quality compared to the 1KG+1K panel (66.4% of SNPs with r2 > 0.9) and the TOPMed panel (69.2% of SNPs with r2 > 0.9, Table 1). Moreover, the 1KG+7K panel exhibited higher average precision, recall, and concordance than the TOPMed panel. These additional metrics strengthen the suggestion that the 1KG+7K panel may provide more accurate imputation for the given data set (Methods, Table 1).

GWAS results for known loci associated with serum uric acid and total cholesterol levels

Throughout our comparisons of imputation quality metrics, the 1KG+7K panel outperformed the competing reference panels and provided a consistently high level of imputation quality. However, it is necessary to assess the impact of this improved imputation quality and the utility provided in the context of GWAS. Thus, we imputed ~180K individuals from BBJ up to both the 1KG+7K and TOPMed reference panels. Subsequently, we conducted GWAS analysis for two traits (serum uric acid and total cholesterol) in BBJ, utilizing subsets for which relevant phenotype data was available (Methods).

Phenotype data for serum uric acid levels were available for 104,174 individuals. Across 27 previously reported loci for uric acid in the Japanese population22, the lead variant at 20 loci was different after imputation up to the TOPMed and the 1KG+7K panels (Table S1). At most loci, the lead SNPs reported in the 1KG+7K output exhibited MAF and r2 metrics similar to those in the TOPMed output, as well as comparable levels of association in terms of P values. There were, however, two notable exceptions. The first exception was the NRXN2-SLC22A12 locus, now defined by SNP rs121907892 (P = 2.4 × 10–1480) with a stronger level of association compared to the previous lead SNP rs57633992 (P = 7.3 × 10–845)21. The second exception was at the LOC101927932 locus, previously defined by SNP rs6026578 (ref. 22). The alternative SNP based on the 1KG+7K output, rs202213319, is a common SNP (MAF = 16%), confidently imputed (r2 = 0.923) and displays a stronger degree of association (P = 8.2 × 10–12) compared to rs6026578 (MAF = 28%, r2 = 0.951, P = 9.7 × 10–10), the previous lead SNP.

For the GWAS of total cholesterol levels the total sample size included 122,407 individuals. Across the previously reported loci for total cholesterol in the Japanese population22, the lead SNP at 21 loci was different after imputation up to the TOPMed and the 1KG+7K panels (Table S2). The HS1BP3 locus, previously defined by SNP rs75352129 (P = 2.0 × 10–39), was an exception, now defined by SNP rs13306206 (P = 8.9 × 10–83) with a stronger level of association22.

Identification of potentially novel loci unique to the 1KG+7K imputation output

We assessed the evidence for additional loci attaining genome-wide significant association (P < 5 × 10–8) with serum uric acid levels. We identified 11 loci for which the lead variant did not map to a previously reported locus in BBJ (Table S3). Additionally, we used the findings of two trans-ancestry meta-analyses of 457,690 individuals23 and 473,241 individuals24, a meta-analysis of 121,745 Japanese individuals25, and a meta-analysis of 60,585 Korean individuals26 to identify loci that have been previously reported (Table S3).

Notably, we found that 5 out of the 11 loci exhibited significant association with serum uric acid levels when imputed up to the 1KG+7K panel, but not when imputed up to the TOPMed panel (Table 2, Figs. 2a, and S2). These loci are SRGAP2-AS1/LINC02798 (rs1156692814, MAF = 0.92%, P = 4.9 × 10–8); SLC22A13 (rs117371763, MAF = 4.83%, P = 1.8 × 10–10); PSORS1C1/PSORS1C2 (rs2233950, MAF = 4.12%, P = 1.4 × 10–8); MAP7 (rs78302547, MAF = 0.71%, P = 1.3 × 10–8); and NRG4 (rs12595289, MAF = 17.5%, P = 5.0 × 10–10). With the exception of NRG4, these signals are represented by lead SNPs that fall into the rare variant category (MAF < 5%). Furthermore, these variants exhibited improved imputation quality, denoted by r2, in comparison to their counterparts in the TOPMed imputation output. Of these loci, it is worth noting that the protein encoded by SLC22A13 is a member of the organic anion transporter family, acts as a urate transporter to regulate urate levels in blood and is an integral membrane protein primarily found in epithelial cells of the proximal tubule of the kidney. The SLC22A13 lead variant from our study, rs117371763, has been implicated in decreased serum uric acid levels in a Japanese cohort. Additionally, other rare allelic variants in this gene have been associated with hyperuricemia and gout27,28,29,30.

Table 2 Lead variants from 1KG+7K mapping outside previously reported loci (Kanai et al., 2018) not significant in TOPMed
Fig. 2: Manhattan plot of genome-wide association studies for serum uric acid and total cholesterol levels.

Manhattan plot of genome-wide association with (a) serum uric acid levels and (b) total cholesterol levels, conducted using 104,174 and 122,407 Japanese individuals from the Biobank Japan Project, respectively. Each point represents a variant passing quality control (r2 > 0.3 and MAF > 0.1%), plotted with its association p-value (on a -log10 scale) as a function of genomic position (NCBI build 37). The genome-wide significance threshold (p < 5 × 10–8) is indicated by the grey horizontal line, and significant associations unique to 1KG+7K are highlighted in red.

In addition to serum uric acid levels, we also assessed total cholesterol levels. We identified four potentially novel loci, the lead variants of which did not map to any previously reported locus in BBJ (Table S4). Upon comparison with the TOPMed based GWAS, we confirmed that one of the four novel loci-defining variants was absent in the TOPMed output, and the 1 Mb region around this locus did not yield any significant associations (Table 2, Figs. 2b, and S3). This significant locus PPIA/H2AZ2 (rs899749693, MAF = 0.23%, P = 2.7 × 10–9) showed an r2 > 0.873, while for TOPMed this variant was filtered out due to its low imputation quality.

Discussion

Improvement in imputation quality with imputation up to population-specific WGS data

The comparison of imputation quality across the TOPMed and population-specific reference panels first assessed the total imputation output of each panel across different MAF ranges through r2 metrics. We conducted benchmarking by performing a comparison analysis between our population-specific panels and the TOPMed panel. To validate the performance of the imputation panels, we used multiple metrics. By employing a leave-out method using 200 random high-depth WGS samples from the 1KG+3K reference panel, we evaluated aggregated r2, concordance rate, recall, precision, and IQS. These benchmarking metrics offer insights into data quality, and each serves a different purpose. In particular, the r2 value measures the proportion of shared variance using dosage genotypes, while IQS evaluates the concordance between actual genotypes and hard-called genotypes, making it a more specialized tool for assessing imputation performance for rare variants31. By considering these metrics, we gained broader insights into the performance of our imputation panels.

The 1KG+7K panel exhibited the highest quality metrics across all measurements of imputation quality. The 1KG+3K panel, using only high depth WGS data exhibited the second highest performance, with the 1KG+4K and 1KG+1K reference panels following in that order. It appears that the total number of sequences in the 1KG+4K panel, using only low depth WGS data (3× depth) was more effective in comparison to the more limited 1KG+1K panel using only high depth (30× depth) WGS data. Additionally, the imputation quality measured by both r2 and IQS of 1KG+4K was comparable to the 1KG+3K across the MAF bins. These findings imply that a high number of low-depth WGS samples could potentially achieve an imputation quality similar to that of a moderate number of high-depth WGS samples.

Next, in our benchmarking process, we examined the MAF ranges. Regardless of the sample size or depth of the WGS used, imputation of rare and low-frequency variants was greatly improved when utilizing the WGS augmented reference panels. Imputation quality for common variants remained largely similar between the WGS augmented panels and the TOPMed panel, with only small increases observed when imputing up to the 1KG+7K panel. Thus, the top-performing panel based on these metrics was the 1KG+7K reference panel, providing the best imputation quality at all MAF ranges, especially at the lowest MAF ranges <5%.

WGS depth and reference panel design

Our study aimed to enhance imputation quality into the Japanese population by supplementing an established and genetically diverse reference panel with WGS data sourced from the target population. The WGS data in question is sourced from a subset of ~7K individuals from BBJ. WGS samples were selected in a manner to capture genetic diversity within the Japanese population, and so, were recruited from seven major regions (North, North-East, East, West, Central, South, and Okinawa) of Japan. Our results demonstrate that this approach, utilizing both high and low-depth WGS data, yields improvements in imputation quality, particularly for rare and low-frequency variants unique to the Japanese population. Augmenting the 1KG high-depth panel with a combination of high-depth WGS data (30× and 15×) provided improvement in both the number of variants imputed and imputation accuracy, especially for rare and low-frequency variants.

The 1KG+7K reference panel was designed to investigate the potential benefits or drawbacks of supplementing high-depth sequencing data with additional low-depth sequencing data. Utilizing 30×, 15×, and 3× depth WGS data, both high and low-depth data were processed separately before incorporation into the 1KG high-depth reference panel. Imputation using the 1KG+7K panel resulted in the highest number of imputed variants meeting the r2 thresholds of 0.3 and 0.9. Notably, performance gains in imputation quality were most pronounced in the lower MAF ranges, supporting improved imputation of rare/low-frequency variants. The panel’s large sample size sourced from the target population played a crucial role in effectively capturing variants with typically low minor allele counts (MAC) and supporting more accurate haplotype estimation.

Furthermore, our study observed comparable imputation quality for variants with MAF > 5% across all reference panels. These findings were anticipated, as common variants with high MAF and consequently high MAC can be adequately captured with smaller sample sizes and are not exclusive to specific population groups. However, the improved imputation of rare/low-frequency variants is significant enough to distinguish the 1KG+7K panel. The primary goal of developing such population-specific reference panels is to enhance the imputation quality for population-specific variants, which often have low MAC in publicly available reference panels such as TOPMed, leading to low-quality imputation.

Previous publications have shared the same general conclusions that the use of population-specific WGS data can facilitate a higher quality imputation, an effect most pronounced at low MAF ranges6,7,8,9,10,11,12,13,14,15,16,17,18. However, these studies assessed the imputation quality achieved when imputing up to a standard publicly available reference panel, a study-specific reference panel, and a combination of a publicly available panel and their own study-specific WGS data. Our results present a comparison between multiple iterations of a population-specific reference panel, allowing for a greater understanding of how differing WGS sample set sizes and sequencing depths affect the improvements in imputation quality observed when supplementing a reference panel such as the 1KG with WGS data. The comparisons across the four population-specific reference panels show that, provided that the data are processed separately, high-depth WGS data can be supplemented with additional low-depth WGS data when designing population-specific reference panels, to maximise the overall sample set size and facilitate higher quality imputation. Furthermore, our results provide some insight into potential resource allocation when designing population-specific reference panels. The use of high-depth WGS data supports imputation up to a greater total number of variants. However, if the primary objective is solely to improve imputation quality, low-depth WGS data appears to be a more cost-effective alternative in situations where resources are limited. This is supported by our benchmarking analysis (Table 1), where we observed comparable imputation quality metrics between the 1KG+3K (high-depth) and 1KG+4K (low-depth) reference panels.

GWAS into serum uric acid and total cholesterol levels following imputation up to the 1KG+7K and TOPMed reference panels

To further assess the capabilities of the 1KG+7K panel and the benefits of improved imputation quality in comparison to the TOPMed panel, we imputed ~180K samples from BBJ up to both reference panels. We selected two traits (serum uric acid and total cholesterol levels) due to the large number of samples for which phenotype data was available and the plethora of known associations previously reported in BBJ23,25,27. Additionally, these traits have clinical relevance in common diseases such as gout (serum uric acid) and cardiovascular disease (total cholesterol level)32,33,34,35.

A three-way comparison between the results of Kanai et al.22, and our own GWAS using imputation outputs derived from the TOPMed and 1KG+7K reference panels provided an insight into the performance of the 1KG+7K panel, concerning the identification of previously reported trait-associated loci. Our results demonstrate that for the majority of known loci across the two traits, the lead SNPs reported in all three sets of results were comparable in terms of signal strength and MAF. These results were as expected as trait-associated loci identified through single variant analysis are typically defined by common variants with relatively high MAF. Therefore, the improved imputation quality at low MAF ranges of the 1KG+7K panel will be somewhat limited for the purpose of identifying novel lead variants in previously reported loci, and we would expect comparable results to those reported when imputing up to the TOPMed panel.

Impact of improved imputation quality of the 1KG+7K panel on novel locus discovery

Our GWAS into serum uric acid and total cholesterol in BBJ identified multiple instances of novel associations. These associations were either unique to the WGS data of the 1KG+7K panel and absent in the TOPMed imputation output, or present in both imputation outputs but achieving genome-wide significance only in the 1KG+7K output. In summary, 5/11 novel loci for serum uric acid and 1/4 novel loci for total cholesterol were unique to the 1KG+7K imputation output. These findings suggest that the 1KG+7K imputation panel can substantially influence the detection of genetic variants associated with a specific trait, particularly those in the rare variant category (MAF < 5%). These results are complementary to those from the benchmarking analysis, indicating that the 1KG+7K panel may provide a more accurate representation of these specific genetic variants. This enhanced capacity to detect novel associations, coupled with the improved imputation quality of loci defined by low-frequency variants and further augmented by additional variants unique to the Japanese WGS data, suggests that imputation using the 1KG+7K panel offers increased support for the discovery of trait-associated loci in BBJ compared to imputation using the TOPMed panel.

The results of this work suggest that although imputing up to the 1KG+7K panel was beneficial in GWAS of a well-studied trait, the greatest advantages may lie in analysing rare and low-frequency variants. Future association analyses focusing on these variants present the best opportunity to demonstrate the benefits of improved imputation quality afforded by the 1KG+7K reference panel. Notably, other studies have utilized our 1KG+7K reference panel, yielding promising results in identifying rare variant associations in the Japanese population for various traits beyond serum uric acid levels, including inguinal hernias36, atopic dermatitis37, risk of fatal myocardial infarction38, knee extension strength39, pelvic organ prolapse40, and systemic lupus erythematosus41,42, as well as for analysing the genetic structure of the Japanese population at a finer scale43. Through this work we have demonstrated that WGS datasets of substantially different depths can be merged in the construction of population-specific reference panels, thereby maximising the overall sample size without compromising the imputation quality achieved. By doing so, we have established a resource to support in-depth analysis of complex traits within the Japanese population.

Methods

Biobank Japan (BBJ) project

All individuals participating in this study were recruited through the BBJ Project, which is a multi-institutional hospital-based registry. Approximately 200,000 patients diagnosed with at least one of 47 common diseases were recruited at 66 hospitals across Japan from 2003 through 2008. DNA, serum and clinical information were collected as previously described1,3. Written informed consent was obtained from all participants as approved by the ethics committees of the RIKEN Center for Integrative Medical Sciences. All ethical regulations relevant to human research participants were followed.

Reference panel preparation

A subset of 7517 Japanese individuals from BBJ was whole genome sequenced at various depths: (i) 1502 individuals at 30× depth using the Illumina HiSeq 2500 Rapid or Illumina HiSeq 2500 V4 platform; (ii) 1786 individuals at 15× depth using the Illumina HiSeq X Five platform; and (iii) 4229 individuals at 3× depth using the Illumina HiSeq 2500 V4 platform. Three sets of sequenced data were prepared: (i) a subset of 1045 individuals (30×); (ii) a joint set of 1502 (30×) and 1786 (15×) individuals; and (iii) a set of 4229 individuals (3×). Each dataset was processed according to the standardised best practice method proposed by GATK v.3.2-2 (ref. 44). The samples were recruited across seven regions of Japan (North, Northeast, East, West, Central, South, and Okinawa) as representative of the Japanese population (Table S5). As briefly explained elsewhere10, we set exclusion criteria to control the quality of genotypes sequenced at high depth (30× and 15×) as follows: (i) depth of coverage (DP) <5; (ii) genotype quality (GQ) <20; or (iii) DP > 60 and GQ < 95. We set these genotypes as missing and excluded variants with call rate <90% before conducting variant quality score recalibration (VQSR). After running VQSR as implemented in GATK, variants located in low-complexity regions (LCR), as defined by mdust software (“hs37d5-LCRs.20140224.bed” downloaded from http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/low_complexity_regions/hs37d5-LCRs.20140224.bed.gz), were additionally excluded in the high-depth (30×) dataset of 1045 individuals. Finally, the missing genotypes were imputed using BEAGLE v.4.1 (ref. 45). For the low-depth (3×) sequence dataset, BAM files obtained by GATK were processed through the best practice variant calling pipeline of GotCloud v.1.17.5 (ref. 46).
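The genotype-level exclusion criteria above can be illustrated with a minimal Python sketch. It only restates the three thresholds and the 90% call-rate cut-off quoted in the text; the function names, the tuple layout and the toy values are hypothetical and are not taken from the GATK-based pipeline itself.

```python
from typing import List, Optional, Tuple

# One call per sample: (genotype, depth DP, genotype quality GQ).
# A genotype of None means the call is already missing.
Call = Tuple[Optional[int], int, int]


def fails_genotype_qc(dp: int, gq: int) -> bool:
    """Exclusion criteria quoted in the text for high-depth (30x/15x) calls."""
    return dp < 5 or gq < 20 or (dp > 60 and gq < 95)


def mask_calls(calls: List[Call]) -> List[Optional[int]]:
    """Set every genotype that fails QC to missing (None)."""
    return [None if gt is None or fails_genotype_qc(dp, gq) else gt
            for gt, dp, gq in calls]


def call_rate(genotypes: List[Optional[int]]) -> float:
    """Fraction of samples with a non-missing genotype."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def keep_variant(calls: List[Call], min_call_rate: float = 0.90) -> bool:
    """Variants with call rate below 90% after masking are excluded."""
    return call_rate(mask_calls(calls)) >= min_call_rate


# Toy variant with ten samples (values invented purely for illustration).
toy = [(0, 30, 99), (1, 4, 99), (2, 25, 15), (1, 70, 90), (0, 70, 99),
       (1, 28, 60), (0, 31, 99), (2, 33, 99), (0, 29, 99), (1, 35, 99)]
masked = mask_calls(toy)
print(masked, call_rate(masked), keep_variant(toy))
```

In this toy example three of the ten calls are masked (one per criterion), so the variant's call rate drops to 70% and it would be excluded before VQSR.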

Samples were filtered out if any of the following six conditions applied: (i) “genomic DNA failure” was found in the sequencing summary; (ii) read quality control errors were found in sequencing index, fraction of uniformity coverage, GC bias, fraction of duplicate reads or low read depth; (iii) duplicates were found by identity-by-descent check; (iv) extreme heterozygosity (>12.5 SD from the mean); (v) low concordance with genotyping array; and/or (vi) high contamination rate (CHIPMIX ≥ 0.25 and FREEMIX ≥ 0.25) as measured by verifyBamID. After filtering, 7472 “clean” samples remained.

We next merged the 1KG high-depth panel with four different combinations of the Japanese WGS data (Fig. S1). We obtained joint-called VCFs based on the hg19 reference from the high-depth 1KGP data made available by Google Genomics (https://console.cloud.google.com/storage/browser/brain-genomics-public/research/cohort/1KGP)47. Multi-allelic sites were removed and general quality control was performed for all WGS sets, which were then phased with SHAPEIT5 v5.1.1 (ref. 48). After pre-phasing, reciprocal imputation was performed between the 1KG high-depth panel and each WGS set with minimac4 v4.1.5 (ref. 49). The final reference panels were generated by merging the reciprocally imputed files for the 1KG high-depth panel and each WGS set. For the 1KG+7K panel, cleaned and phased 3K and 4K files were first merged with BCFTOOLS v1.15.1 (ref. 50) before reciprocally imputing and merging with the 1KG high-depth file using the IMMerge tool. Lastly, we imputed the BBJ data using the TOPMed Imputation Server r2 according to the default parameters (https://imputation.biodatacatalyst.nhlbi.nih.gov).

Comparison of imputation quality between reference panels

To perform the benchmarking, firstly, for the leave-out method, we used the SNP array genotype file from 200 random samples and conducted imputation using the 1KG high-depth reference panel, as well as the WGS-enhanced reference panels (1KG+1K, 1KG+3K, 1KG+4K, 1KG+7K) using minimac4 v4.1.5. It is important to note that before imputing the 200 random samples, we removed them from each panel and phased the updated panels to ensure an unbiased analysis.

Regarding the TOPMed data, the files used for this analysis had already been imputed using 180,882 BBJ samples. Therefore, we extracted the 200 samples that were found in this dataset, converted the coordinates to hg19, and utilized the genotypes from these 200 WGS samples for the subsequent concordance analysis.

After imputation, we filtered the variants based on an r2 threshold of >0.3, focusing on variants that were shared between the genotype files imputed up to the different reference panels. These shared variants were retained for further analyses. To assess the concordance between the imputed genotypes and the true genotypes generated by WGS, we performed a concordance analysis using the GLIMPSE2 software. GLIMPSE2 performs concordance analysis by taking two sets of genetic data as input, in our case, the imputed WGS-enhanced reference panels and TOPMed, and compares them against the true genotypes from the 200 WGS samples. The software matches variants based on genomic coordinates and compares genotypes. It then calculates concordance metrics for each pair of variants by determining the frequency of each possible two-locus genotype in the sample population. These frequencies are used to compute the observed and expected frequencies of each allele, which in turn are used to calculate the r2 value. After running GLIMPSE2 concordance, we categorized the variants into bins based on their MAF to visualize the concordance rate across the spectrum of rare to common variants.
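As an illustration of the binned accuracy measure described above, the following Python sketch computes an aggregated r2 per MAF bin as the squared Pearson correlation between true genotypes and imputed dosages, pooled over all variants in a bin. This is a simplified stand-in and not GLIMPSE2's implementation; the bin edges, function names and toy values are assumptions made for the example.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical MAF bin edges (as fractions); the study's exact bins are not restated here.
BIN_EDGES = [0.0, 0.001, 0.005, 0.01, 0.05, 0.5]


def pearson_r2(truth: Sequence[float], dosage: Sequence[float]) -> float:
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    n = len(truth)
    mt, md = sum(truth) / n, sum(dosage) / n
    stt = sum((t - mt) ** 2 for t in truth)
    sdd = sum((d - md) ** 2 for d in dosage)
    std = sum((t - mt) * (d - md) for t, d in zip(truth, dosage))
    if stt == 0 or sdd == 0:
        return float("nan")  # undefined when either vector is constant
    return std * std / (stt * sdd)


def maf_bin(maf: float) -> int:
    """Index of the bin [BIN_EDGES[i], BIN_EDGES[i+1]); MAF = 0.5 goes in the last bin."""
    return min(bisect_right(BIN_EDGES, maf) - 1, len(BIN_EDGES) - 2)


# One variant: (MAF, true genotypes per sample, imputed dosages per sample).
Variant = Tuple[float, Sequence[float], Sequence[float]]


def aggregated_r2(variants: List[Variant]) -> Dict[int, float]:
    """Pool genotype/dosage pairs of all variants in a bin, then compute one r2 per bin."""
    pooled_truth = defaultdict(list)
    pooled_dosage = defaultdict(list)
    for maf, truth, dosage in variants:
        b = maf_bin(maf)
        pooled_truth[b].extend(truth)
        pooled_dosage[b].extend(dosage)
    return {b: pearson_r2(pooled_truth[b], pooled_dosage[b])
            for b in sorted(pooled_truth)}


# Toy data invented for illustration only.
toy_variants = [
    (0.03, [0, 1, 2], [0.0, 1.0, 1.0]),
    (0.30, [0, 1, 2, 0], [0.0, 1.0, 2.0, 0.0]),
]
r2_by_bin = aggregated_r2(toy_variants)
print(r2_by_bin)
```

Pooling within a bin, rather than averaging per-variant r2 values, keeps rare variants with few carriers from producing undefined or very noisy per-variant estimates.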

The total output for each reference panel was assessed by examining the relationship between mean MAF and mean r2 concordance values across the MAF bins to capture imputation quality across the MAF spectrum. Following this, we compared the total number of variants falling in the r2 categories of r2 > 0.3, r2 > 0.7, and r2 > 0.9 for each panel’s imputation output. We also calculated the concordance, precision, recall and IQS. Among these additional imputation quality metrics, concordance assesses the similarity between the imputed and actual data; precision is the proportion of correctly imputed positive observations out of all predicted positives; and recall, also known as sensitivity, is the proportion of correctly identified positive results out of all actual positive results21. IQS is calculated using the ratios of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) from imputation results, providing a score that reflects the quality of the imputation on a scale from 0 to 1 (ref. 31).
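The hard-call metrics listed above can be sketched in Python as follows. Two assumptions are made for this example and are not confirmed by the text: a "positive" is taken to be any genotype carrying at least one alternate allele, and IQS is computed in its commonly used chance-corrected form, IQS = (Po − Pc) / (1 − Pc), where Po is the observed genotype agreement and Pc the agreement expected by chance. Function and variable names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def imputation_metrics(truth: Sequence[int], imputed: Sequence[int]) -> Dict[str, float]:
    """Concordance, precision, recall and IQS for hard-called genotypes coded 0/1/2."""
    n = len(truth)

    # Observed agreement: fraction of samples whose imputed genotype matches the truth.
    po = sum(t == i for t, i in zip(truth, imputed)) / n

    # Chance agreement from the marginal genotype frequencies of both call sets.
    t_counts, i_counts = Counter(truth), Counter(imputed)
    pc = sum(t_counts[g] * i_counts[g] for g in (0, 1, 2)) / (n * n)
    iqs = (po - pc) / (1 - pc) if pc < 1 else float("nan")

    # Assumption: "positive" means carrying at least one alternate allele.
    tp = sum(t > 0 and i > 0 for t, i in zip(truth, imputed))
    fp = sum(t == 0 and i > 0 for t, i in zip(truth, imputed))
    fn = sum(t > 0 and i == 0 for t, i in zip(truth, imputed))

    return {
        "concordance": po,
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "recall": tp / (tp + fn) if tp + fn else float("nan"),
        "iqs": iqs,
    }


# Toy genotypes for eight samples at one variant (invented for illustration).
truth = [0, 0, 0, 0, 1, 1, 2, 0]
imputed = [0, 0, 0, 1, 1, 0, 2, 0]
metrics = imputation_metrics(truth, imputed)
print(metrics)
```

Here six of eight genotypes agree (concordance 0.75), but because most samples are homozygous reference, much of that agreement is expected by chance and the IQS is lower (9/17, about 0.53). This is why a chance-corrected score is informative for rare variants.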

Genome-wide association study of serum uric acid and total cholesterol levels

We extracted 181,927 individuals from BBJ who were genotyped using either (i) the Illumina HumanOmniExpressExome BeadChip or (ii) a combination of Illumina HumanOmniExpress and HumanExome arrays and imputed them up to the 1KG+7K panel. Quality control was performed by removing individuals who: withdrew consent; had call rates <98%; discordant genotypes; gender-mismatch; closely related samples; and/or non-East Asian ancestry. Any samples overlapping with those in the reference panels were also removed. Quality control on variants excluded those with call rate <99%; number of heterozygotes <5; extreme deviation from Hardy–Weinberg equilibrium (exact P < 1 × 10–6); palindromic variants; and/or non-autosomal variants.

From the 181,927 BBJ samples, we extracted individuals with phenotype data recorded for serum uric acid and total cholesterol levels. After imputation, variants with r2 < 0.3 were excluded from downstream association analyses. Serum uric acid and total cholesterol levels, measured by routine laboratory examination, were retrieved from medical records, and were adjusted for age, sex, ten principal components of genetic ancestry and disease status of the 47 common diseases recorded in BBJ. The resulting residuals were then normalized to Z-scores. Each variant passing quality control was tested for association (additive genotype dosage) with Z-scores in a linear regression model using PLINK v2 (ref. 51).
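The two-stage procedure described above (covariate adjustment, normalization of residuals, then a per-variant additive test) can be sketched in plain Python. This is an illustrative sketch only and does not reproduce PLINK's implementation: it assumes that "normalized to Z-scores" means standardizing residuals to mean 0 and standard deviation 1, it uses two covariates instead of the full covariate set, and all names and toy values are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(X: List[List[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients via the normal equations (X includes the intercept)."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def residual_z_scores(pheno: Sequence[float], covars: Sequence[Sequence[float]]) -> List[float]:
    """Regress the phenotype on covariates, then standardize the residuals."""
    X = [[1.0] + list(c) for c in covars]
    beta = ols(X, pheno)
    resid = [y - sum(b * x for b, x in zip(beta, row)) for row, y in zip(X, pheno)]
    mu, sd = mean(resid), stdev(resid)
    return [(r - mu) / sd for r in resid]


def dosage_association(z: Sequence[float], dosage: Sequence[float]) -> Tuple[float, float, float]:
    """Additive model z ~ dosage: returns (slope, standard error, t statistic)."""
    b0, b1 = ols([[1.0, d] for d in dosage], z)
    rss = sum((zi - b0 - b1 * d) ** 2 for zi, d in zip(z, dosage))
    sxx = sum((d - mean(dosage)) ** 2 for d in dosage)
    se = math.sqrt(rss / (len(z) - 2) / sxx)
    return b1, se, b1 / se


# Toy data for eight individuals (invented; covariates are age and sex only).
ages = [40, 50, 60, 70, 45, 55, 65, 75]
sexes = [0, 1, 0, 1, 1, 0, 1, 0]
dosages = [0.0, 1.0, 2.0, 0.1, 1.9, 0.0, 1.0, 0.2]
noise = [0.01, -0.02, 0.005, 0.015, -0.01, 0.0, 0.02, -0.02]
pheno = [5 + 0.02 * a + 0.3 * s + 0.5 * d + e
         for a, s, d, e in zip(ages, sexes, dosages, noise)]

z = residual_z_scores(pheno, list(zip(ages, sexes)))
slope, se, t_stat = dosage_association(z, dosages)
print(slope, se, t_stat)
```

In practice the P value would be obtained from the t statistic with n − 2 degrees of freedom; that step is omitted here to keep the sketch free of external dependencies.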

We first considered 27 and 32 previously reported loci for serum uric acid and total cholesterol in the Japanese population respectively22. We compared the lead variants identified after imputation up to the TOPMed panel and the merged 1KG+7K panel. We then considered lead variants attaining genome-wide significance after imputation up to the merged 1KG+7K panel that did not map to the previously reported loci in the Japanese population. We also excluded variants with MAF < 0.1%. We compared the lead variants at these loci with those previously reported in GWAS of serum uric acid and total cholesterol and identified the variants that were not significant when imputing up to TOPMed panel but became significant when imputing up to 1KG+7K panel.
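The selection logic in this paragraph can be sketched as a small Python filter. The text does not state how a lead variant is judged to "map to" a previously reported locus, so the ±500 kb window used below is an assumption made only for illustration, as are the class names, identifiers and toy records.

```python
from dataclasses import dataclass
from typing import List, Optional

GENOME_WIDE_P = 5e-8   # genome-wide significance threshold used in the study
MIN_MAF = 0.001        # variants with MAF < 0.1% are excluded
WINDOW_BP = 500_000    # assumption: locus window around a known lead variant


@dataclass
class Lead:
    rsid: str
    chrom: int
    pos: int
    maf: float
    p_1kg7k: float
    p_topmed: Optional[float]  # None if the variant is absent or filtered in TOPMed


@dataclass
class KnownLocus:
    chrom: int
    pos: int


def outside_known(lead: Lead, known: List[KnownLocus]) -> bool:
    """True if the lead variant is not within the window of any known locus."""
    return all(k.chrom != lead.chrom or abs(k.pos - lead.pos) > WINDOW_BP
               for k in known)


def candidate_novel(leads: List[Lead], known: List[KnownLocus]) -> List[Lead]:
    """Genome-wide significant 1KG+7K leads, MAF >= 0.1%, outside known loci."""
    return [l for l in leads
            if l.p_1kg7k < GENOME_WIDE_P and l.maf >= MIN_MAF
            and outside_known(l, known)]


def unique_to_1kg7k(leads: List[Lead]) -> List[Lead]:
    """Leads that are not genome-wide significant (or not present) in TOPMed."""
    return [l for l in leads
            if l.p_topmed is None or l.p_topmed >= GENOME_WIDE_P]


# Toy records with placeholder identifiers and positions (invented for illustration).
known = [KnownLocus(chrom=4, pos=10_000_000)]
leads = [
    Lead("var_a", 4, 10_200_000, 0.20, 1e-30, 2e-30),   # inside a known locus
    Lead("var_b", 7, 5_000_000, 0.007, 1e-9, None),     # novel, absent in TOPMed
    Lead("var_c", 9, 8_000_000, 0.15, 3e-10, 4e-9),     # novel, significant in both
    Lead("var_d", 2, 1_000_000, 0.0005, 1e-12, None),   # fails the MAF filter
]
novel = candidate_novel(leads, known)
only_7k = unique_to_1kg7k(novel)
print([l.rsid for l in novel], [l.rsid for l in only_7k])
```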

Statistics and reproducibility

To evaluate imputation accuracy, we performed a leave-out analysis using 200 high-depth WGS samples selected from the combined 1KG+3K reference panel. Imputation accuracy metrics, including aggregated r2, concordance rate, recall, precision, and Imputation Quality Score (IQS), were calculated across MAF bins. GWAS for serum uric acid levels and total cholesterol levels were conducted using 104,174 and 122,407 samples, respectively, from the BBJ Project. Both phenotypes were adjusted for age, sex, ten principal components of genetic ancestry, and disease status of the 47 common diseases recorded in BBJ. The residuals were normalized to Z-scores. Association analyses were performed under an additive model using linear regression of the normalized residuals on genotype dosage, utilizing PLINK version 2.0. Genome-wide significance was defined as P < 5.0 × 10–8. All analyses were conducted using default parameters unless otherwise specified, and summary statistics have been deposited in the GWAS Catalog.