Population-specific reference panel improves imputation quality for genome-wide association studies conducted on the Japanese population

Flanagan, Jack; Liu, Xiaoxi; Ortega-Reyes, David; Tomizuka, Kohei; Matoba, Nana; Akiyama, Masato; Koido, Masaru; Ishigaki, Kazuyoshi; Ashikawa, Kyota; Takata, Sadaaki; Shi, MingYang; Aoi, Tomomi; Momozawa, Yukihide; Ito, Kaoru; Murakami, Yoshinori; Matsuda, Koichi; Kamatani, Yoichiro; Morris, Andrew P.; Horikoshi, Momoko; Terao, Chikashi

doi:10.1038/s42003-024-07338-4

Open access
Published: 19 December 2024


Communications Biology volume 7, Article number: 1665 (2024)


Abstract

To improve imputation quality for genome-wide association studies (GWAS) conducted on the Japanese population, we developed and evaluated four Japanese population-specific reference panels. These panels were constructed by augmenting the 1000 Genomes Project (1KG) panel with Japanese whole genome sequencing (WGS) data from individuals enrolled through the Biobank Japan (BBJ) project, with sample sizes ranging from 1K to 7K individuals and sequencing depths ranging from 3× to 30×. Among these panels, an augmented reference panel comprising 7472 WGS samples of mixed depth (1KG+7K) exhibited the greatest improvement in imputation quality relative to the Trans-Omics for Precision Medicine (TOPMed) reference panel. These improvements were observed primarily for rare variants with a minor allele frequency (MAF) <5%. To demonstrate the benefits of improved imputation quality in association analyses of complex traits, we conducted GWAS for serum uric acid and total cholesterol levels following imputation up to the 1KG+7K panel. The analysis revealed several loci reaching genome-wide significance (P < 5 × 10^–8) in the 1KG+7K imputation output that remained undetected when the same sample set was imputed up to the TOPMed reference panel. In summary, the 1KG+7K panel offers clear advantages for the discovery of trait-associated loci, particularly those driven by low-frequency association signals.


Introduction

In the context of genome-wide association studies (GWAS), imputation refers to the process of predicting genotypes that are not directly assayed in a sample of individuals¹. By exploiting the linkage disequilibrium (LD) patterns captured in a typically large reference panel, imputation allows untyped genotypes to be inferred from a limited set of array SNPs. Imputation therefore increases the number of variants that can be tested for association in a GWAS, improving power to detect loci that contribute to a complex trait and facilitating large-scale meta-analysis, as well as the fine-mapping of underlying causal variants². The quality of imputation depends on several factors, including: (i) the number of haplotypes, the density of variants, and the size of the reference panel; (ii) the sample size of the GWAS and the properties of the genotyping array; and (iii) the genetic similarity of individuals in the reference panel and the GWAS.
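As a toy illustration of the idea, the Python sketch below imputes one phased haplotype by copying alleles from the reference haplotype that best matches it at the typed sites. It is a deliberate simplification under stated assumptions (0/1 alleles, untyped sites marked `None`, a hypothetical helper name and made-up data). Imputation tools such as minimac4, used later in this study, instead model each target haplotype as a mosaic of reference haplotypes with a hidden Markov model, which lets them follow LD across recombination breakpoints and tolerate mismatches.

```python
from typing import List, Optional


def impute_haplotype(target: List[Optional[int]],
                     reference: List[List[int]]) -> List[int]:
    """Toy nearest-haplotype imputation (hypothetical helper, not minimac4).

    target: alleles (0/1) at the panel's sites, None where the array did not type the site.
    reference: reference haplotypes, each with one allele per site.
    """
    typed = [i for i, allele in enumerate(target) if allele is not None]

    def mismatches(hap: List[int]) -> int:
        # Number of typed sites where this reference haplotype disagrees with the target
        return sum(hap[i] != target[i] for i in typed)

    best = min(reference, key=mismatches)
    # Keep the typed alleles; copy untyped ones from the best-matching haplotype
    return [best[i] if allele is None else allele for i, allele in enumerate(target)]


reference_panel = [[0, 0, 1, 1, 0],
                   [1, 1, 0, 0, 1],
                   [0, 1, 1, 0, 0]]
print(impute_haplotype([1, None, 0, None, 1], reference_panel))  # [1, 1, 0, 0, 1]
```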

In recent years, one of the most widely used reference panels for imputation has been built from haplotypes derived from WGS data (3× depth) generated by the 1000 Genomes Project (1KG)³. The reference panel comprises 2504 individuals from 26 populations of African, Admixed American, East Asian, European, and South Asian ancestry, yielding a dataset of over 88 million variants, and has been shown to enable high-quality imputation in these ancestry groups for variants with minor allele frequency (MAF) as low as 0.5%. However, the 1KG dataset mainly captures common variants, and rare and low-frequency variants specific to a target population may be underrepresented. This can lead to imputation inaccuracies in populations that are poorly represented in the reference panel⁴. Two common approaches to reference panel design aim to increase the total sample size from which haplotype and variant information can be derived. The first involves creating large, publicly available reference panels such as the Trans-Omics for Precision Medicine (TOPMed) panel, which, as of 2024, comprises 133,597 samples⁵. These panels amalgamate existing WGS datasets, encompassing diverse population groups falling under the broad categories of European, African, Hispanic/Latino, Asian, and Admixed ancestry⁶. The second adopts a more targeted strategy, combining population-specific WGS data with an existing, genetically diverse reference panel. This tailored approach aims to optimize imputation accuracy within specific population groups despite a smaller overall sample size relative to panels such as TOPMed.

The integration of additional ancestry- or population-specific WGS data with the 1KG panel has been shown to enhance the imputation quality of population-specific lower-frequency variants. This improvement is achieved by increasing the number of available reference haplotypes and their genetic similarity to the GWAS samples. Such population-specific reference panels have mostly been based on European populations^7,8,9,10,11, with fewer based on East Asian^12,13,14, African¹⁵, South Asian¹⁶, and multi-ancestry^17,18 populations. Across these studies, WGS data of various depths were used to augment existing reference panels, but comparisons of the efficacy of WGS data of different depths, and of combinations thereof, were relatively limited. This is a key consideration when designing population-specific reference panels: quantifying how much imputation quality improves with WGS of different depths could guide panel design and the allocation of resources between the total number of WGS samples and the depth at which each sample is sequenced.

The primary aim of this study is to establish an effective approach to improving imputation quality in Japanese GWAS using WGS data available from Biobank Japan (BBJ)¹⁹. By developing multiple Japanese-specific reference panels, we aim to compare the effects of incorporating high- and low-depth sequencing data²⁰. Additionally, through GWAS of serum uric acid and total cholesterol levels within BBJ, this study aims to demonstrate the gains in locus discovery and statistical power offered by the optimal Japanese population-specific reference panel over the TOPMed reference panel.

Results

Comparative analysis of imputation accuracy between TOPMed and 1KG-WGS enhanced panels

To enhance imputation quality for GWAS conducted in the BBJ Project, we assembled WGS data from a subset of 7517 Japanese individuals from BBJ. Following quality control, we retained 1491 individuals sequenced at 30× depth, 1765 individuals sequenced at 15× depth, and 4216 individuals sequenced at 3× depth (Methods, Fig. S1). We merged the 1KG high-depth panel¹⁷ with four different combinations of the Japanese population-specific WGS data by reciprocal imputation: 1037 individuals at 30× depth (1KG+1K panel); 3256 individuals at 30× or 15× depth (1KG+3K panel); 4216 individuals at 3× depth (1KG+4K panel); and 7472 individuals at 30×, 15× or 3× depth (1KG+7K panel) (Methods).

Following the construction of the Japanese population-specific reference panels, we evaluated their imputation performance against that of TOPMed. We employed a leave-out strategy, withholding 200 high-depth WGS samples from the 1KG+3K reference panel for this purpose. We imputed variants on chromosomes 5 and 19 from array data using each panel. By comparing these imputed variants with the true genotypes from the WGS data, we first assessed imputation accuracy (aggregated r²) across MAF bins ranging from rare to common variants (Methods). The Japanese WGS-enhanced 1KG panels outperformed TOPMed specifically for variants with a MAF of less than 5% (Table 1, Fig. 1a). The largest performance gain was observed for the 1KG+7K reference panel, followed by 1KG+3K, 1KG+4K, and 1KG+1K. In contrast, for common variants with a MAF over 5%, the imputation accuracy of the TOPMed reference panel was comparable to that of the population-specific panels. In addition to aggregated r², we calculated concordance, recall, precision, and the Imputation Quality Score (IQS). The IQS is derived from the ratios of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) in the imputation results, providing a score that quantifies imputation quality on a scale from 0 to 1 and complements r² (Methods)²¹. Similarly, the 1KG+7K panel achieved the highest IQS, especially in the lower MAF bins, in comparison to TOPMed (Fig. 1b). Overall, the 1KG+7K reference panel exhibited the best performance in our analysis of the 200 left-out samples, with a mean concordance r² = 0.919 and 75.8% of variants with r² > 0.9 (Table 1).

Table 1 Quality of imputation assessment performed using a leave-out 200 Whole Genome Sequencing (WGS) approach on chromosomes 5 and 19


**Fig. 1: Whole genome sequencing concordance for chromosomes 5 and 19.**

Although the 1KG+1K panel produced higher-quality imputation for low-MAF variants than the TOPMed panel, when imputation quality is considered across all MAF ranges (in terms of r² values), it failed to match the overall performance of TOPMed and the three other population-specific panels. Comparing the reference panels by the percentage of SNPs with concordance r² > 0.9 highlights the noticeably lower imputation quality of the TOPMed panel relative to the 1KG+7K, 1KG+4K, and 1KG+3K panels. Imputation up to the 1KG+7K, 1KG+4K and 1KG+3K panels produced similar outputs at this threshold (75.8%, 72.2% and 73.1% of SNPs, respectively), representing a substantial improvement over the 1KG+1K (66.4% of SNPs with r² > 0.9) and TOPMed (69.2% of SNPs with r² > 0.9) panels (Table 1). Moreover, the 1KG+7K panel exhibited higher average precision, recall, and concordance than the TOPMed panel. These additional metrics strengthen the suggestion that the 1KG+7K panel may provide more accurate imputation for this data set (Methods, Table 1).
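The headline comparison here, the percentage of SNPs whose concordance r² exceeds a threshold, is simple to reproduce once per-variant r² values are available. A minimal Python sketch, assuming those per-variant r² values have already been computed (for example, from a concordance tool's per-variant output); the helper name and the example values are hypothetical:

```python
from typing import Dict, Sequence


def percent_above(r2_values: Sequence[float],
                  thresholds: Sequence[float] = (0.3, 0.7, 0.9)) -> Dict[float, float]:
    """Percentage of variants whose per-variant r² strictly exceeds each threshold."""
    if not r2_values:
        raise ValueError("no variants supplied")
    n = len(r2_values)
    return {t: 100.0 * sum(r2 > t for r2 in r2_values) / n for t in thresholds}


print(percent_above([0.95, 0.92, 0.85, 0.5, 0.2]))  # {0.3: 80.0, 0.7: 60.0, 0.9: 40.0}
```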

GWAS results for known loci associated with serum uric acid and total cholesterol levels

Throughout our comparisons of imputation quality metrics, the 1KG+7K panel outperformed the competing reference panels and provided a consistently high level of imputation quality. However, it is necessary to assess the impact of this improved imputation quality in the context of GWAS. We therefore imputed ~180K individuals from BBJ up to both the 1KG+7K and TOPMed reference panels. We then conducted GWAS of two traits (serum uric acid and total cholesterol levels) in BBJ, using the subsets of individuals for whom relevant phenotype data were available (Methods).

Phenotype data for serum uric acid levels were available for 104,174 individuals. Across 27 loci previously reported for uric acid in the Japanese population²², the lead variant at 20 loci was different after imputation up to the TOPMed and the 1KG+7K panels (Table S1). As with the lead SNPs in the TOPMed output, most lead SNPs in the 1KG+7K output exhibited similar MAF and r² metrics, as well as comparable levels of association in terms of P values. There were, however, two notable exceptions. The first was the NRXN2-SLC22A12 locus, now defined by SNP rs121907892 (P = 2.4 × 10^–1480), which showed a stronger association than the previous lead SNP rs57633992 (P = 7.3 × 10^–845)²¹. The second was the LOC101927932 locus, previously defined by SNP rs6026578²². The alternative lead SNP in the 1KG+7K output, rs202213319, is a common SNP (MAF = 16%) that was confidently imputed (r² = 0.923) and displays a stronger association (P = 8.2 × 10^–12) than the previous lead SNP rs6026578 (MAF = 28%, r² = 0.951, P = 9.7 × 10^–10).

For the GWAS of total cholesterol levels, the sample size was 122,407 individuals. Across the loci previously reported for total cholesterol in the Japanese population²², the lead SNP at 21 loci was different after imputation up to the TOPMed and the 1KG+7K panels (Table S2). The HS1BP3 locus, previously defined by SNP rs75352129 (P = 2.0 × 10^–39), was an exception: it is now defined by SNP rs13306206, which shows a stronger association (P = 8.9 × 10^–83)²².

Identification of potentially novel loci unique to the 1KG+7K imputation output

We assessed the evidence for additional loci attaining genome-wide significant association (P < 5 × 10^–8) with serum uric acid levels. We identified 11 loci for which the lead variant did not map to a previously reported locus in BBJ (Table S3). Additionally, we used the findings of two trans-ancestry meta-analyses of 457,690 individuals²³ and 473,241 individuals²⁴, a meta-analysis of 121,745 Japanese individuals²⁵, and a meta-analysis of 60,585 Korean individuals²⁶ to identify loci that had been previously reported (Table S3).
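Deciding whether a lead variant maps to a previously reported locus requires a locus definition, which this section does not spell out. The Python sketch below assumes one common convention: greedy distance-based clumping of genome-wide significant variants with a ±500 kb window (echoing the 1 Mb region used when checking the TOPMed output around the cholesterol locus), followed by a window check against known lead positions. The variant IDs, positions, window size and helper names are all illustrative assumptions, not the study's exact procedure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Variant:
    vid: str    # variant identifier (illustrative)
    chrom: str
    pos: int    # base-pair position
    p: float    # association P value


def define_loci(variants: List[Variant], p_threshold: float = 5e-8,
                window: int = 500_000) -> List[Variant]:
    """Greedy distance-based clumping: return one lead variant per locus."""
    # Consider only genome-wide significant variants, most significant first
    remaining = sorted((v for v in variants if v.p < p_threshold), key=lambda v: v.p)
    leads: List[Variant] = []
    while remaining:
        lead = remaining.pop(0)
        leads.append(lead)
        # Assign every significant variant within +/- window of this lead to its locus
        remaining = [v for v in remaining
                     if v.chrom != lead.chrom or abs(v.pos - lead.pos) > window]
    return leads


def is_novel(lead: Variant, known: List[Tuple[str, int]], window: int = 500_000) -> bool:
    """True if no previously reported lead (chrom, pos) lies within +/- window."""
    return all(chrom != lead.chrom or abs(pos - lead.pos) > window
               for chrom, pos in known)


hits = [Variant("v1", "1", 1_000_000, 1e-10), Variant("v2", "1", 1_200_000, 1e-9),
        Variant("v3", "1", 3_000_000, 2e-8), Variant("v4", "2", 500_000, 0.01)]
known_loci = [("1", 1_100_000)]
leads = define_loci(hits)
print([(v.vid, is_novel(v, known_loci)) for v in leads])  # [('v1', False), ('v3', True)]
```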

Notably, 5 of the 11 loci showed genome-wide significant association with serum uric acid levels when imputed up to the 1KG+7K panel, but not when imputed up to the TOPMed panel (Table 2, Figs. 2a and S2). These loci are SRGAP2-AS1/LINC02798 (rs1156692814, MAF = 0.92%, P = 4.9 × 10^–8); SLC22A13 (rs117371763, MAF = 4.83%, P = 1.8 × 10^–10); PSORS1C1/PSORS1C2 (rs2233950, MAF = 4.12%, P = 1.4 × 10^–8); MAP7 (rs78302547, MAF = 0.71%, P = 1.3 × 10^–8); and NRG4 (rs12595289, MAF = 17.5%, P = 5.0 × 10^–10). With the exception of NRG4, these signals are represented by lead SNPs in the rare variant category (MAF < 5%). Furthermore, these variants exhibited higher imputation quality, as measured by r², than their counterparts in the TOPMed imputation output. Among these loci, SLC22A13 is noteworthy: the protein it encodes is a member of the organic anion transporter family, acts as a urate transporter that regulates blood urate levels, and is an integral membrane protein found primarily in epithelial cells of the kidney proximal tubule. The SLC22A13 lead variant from our study, rs117371763, has been implicated in decreased serum uric acid levels in a Japanese cohort, and other rare allelic variants in this gene have been associated with hyperuricemia and gout^27,28,29,30.

Table 2 Lead variants from 1KG+7K mapping outside previously reported loci (Kanai et al., 2018) not significant in TOPMed


**Fig. 2: Manhattan plot of genome-wide association studies for serum uric acid and total cholesterol levels.**

In addition to serum uric acid levels, we also assessed total cholesterol levels. We identified four potentially novel loci whose lead variants did not map to any previously reported locus in BBJ (Table S4). On comparison with the TOPMed-based GWAS, we confirmed that one of the four novel locus-defining variants was absent from the TOPMed output, and that the 1 Mb region around this locus did not yield any significant associations (Table 2, Figs. 2b and S3). The lead variant at this locus, PPIA/H2AZ2 (rs899749693, MAF = 0.23%, P = 2.7 × 10^–9), was imputed with r² > 0.873 using the 1KG+7K panel, whereas in the TOPMed output it was filtered out owing to low imputation quality.

Discussion

Improvement in imputation quality with imputation up to population-specific WGS data

The comparison of imputation quality across the TOPMed and population-specific reference panels first assessed the total imputation output of each panel across different MAF ranges using r² metrics. To benchmark our population-specific panels against the TOPMed panel, we used multiple metrics. Using a leave-out approach with 200 random high-depth WGS samples from the 1KG+3K reference panel, we evaluated aggregated r², concordance rate, recall, precision, and IQS. These metrics offer complementary insights into data quality. In particular, r² measures the proportion of variance shared between imputed dosages and true genotypes, whereas IQS evaluates the concordance between true genotypes and hard-called imputed genotypes after correcting for the agreement expected by chance, making it a more specialized tool for assessing imputation performance for rare variants³¹. Considering these metrics together gave us a broader view of the performance of our imputation panels.

The 1KG+7K panel exhibited the highest values across all measures of imputation quality. The 1KG+3K panel, using only high-depth WGS data, showed the second-highest performance, followed by the 1KG+4K and 1KG+1K reference panels. It appears that the larger number of samples in the 1KG+4K panel, which uses only low-depth (3×) WGS data, was more effective than the smaller 1KG+1K panel, which uses only high-depth (30×) WGS data. Additionally, the imputation quality of 1KG+4K, measured by both r² and IQS, was comparable to that of 1KG+3K across MAF bins. These findings imply that a large number of low-depth WGS samples could achieve imputation quality similar to that of a moderate number of high-depth WGS samples.

Next, we examined performance across MAF ranges. Regardless of the sample size or depth of the WGS data used, imputation of rare and low-frequency variants was greatly improved with the WGS-augmented reference panels. Imputation quality for common variants remained largely similar between the WGS-augmented panels and the TOPMed panel, with only small increases observed when imputing up to the 1KG+7K panel. Thus, on these metrics the top-performing panel was the 1KG+7K reference panel, which provided the best imputation quality at all MAF ranges, especially at MAF <5%.

WGS depth and reference panel design

Our study aimed to enhance imputation quality in the Japanese population by supplementing an established, genetically diverse reference panel with WGS data from the target population. These WGS data come from a subset of ~7K individuals from BBJ, who were selected to capture genetic diversity within the Japanese population and were therefore recruited from seven major regions of Japan (North, North-East, East, West, Central, South, and Okinawa). Our results demonstrate that this approach, using both high- and low-depth WGS data, improves imputation quality, particularly for rare and low-frequency variants unique to the Japanese population. Augmenting the 1KG high-depth panel with a combination of high-depth WGS data (30× and 15×) improved both the number of variants imputed and imputation accuracy, especially for rare and low-frequency variants.

The 1KG+7K reference panel was designed to investigate the potential benefits or drawbacks of supplementing high-depth sequencing data with additional low-depth sequencing data. The high-depth (30× and 15×) and low-depth (3×) WGS data were processed separately before incorporation into the 1KG high-depth reference panel. Imputation using the 1KG+7K panel resulted in the highest number of imputed variants meeting the r² thresholds of 0.3 and 0.9. Notably, gains in imputation quality were most pronounced in the lower MAF ranges, supporting improved imputation of rare and low-frequency variants. The panel's large sample size from the target population played a crucial role in capturing variants with typically low minor allele counts (MAC) and in supporting more accurate haplotype estimation.

Furthermore, we observed comparable imputation quality for variants with MAF > 5% across all reference panels. These findings were anticipated, as common variants, with high MAF and consequently high MAC, can be adequately captured with smaller sample sizes and are not exclusive to specific population groups. However, the improvement in imputation of rare and low-frequency variants is large enough to distinguish the 1KG+7K panel. The primary goal of developing such population-specific reference panels is to enhance the imputation quality of population-specific variants, which often have low MAC in publicly available reference panels such as TOPMed and are consequently imputed poorly.
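The role of minor allele count can be made concrete with simple arithmetic. Under the simplifying assumption that reference haplotypes are a random sample from a population in which the variant has frequency p, a panel of N individuals (2N haplotypes) carries on average 2Np copies of the minor allele, and the variant is absent entirely with probability (1 − p)^{2N}. The Python sketch below counts only the Japanese WGS samples of the 1KG+1K (1037) and 1KG+7K (7472) panels and ignores the 1KG haplotypes, in which a Japanese-specific variant may not occur at all; the helper names are hypothetical.

```python
def expected_mac(n_individuals: int, maf: float) -> float:
    """Expected minor allele count among 2N haplotypes at minor allele frequency `maf`."""
    return 2 * n_individuals * maf


def prob_absent(n_individuals: int, maf: float) -> float:
    """Probability that none of the 2N haplotypes carries the minor allele (binomial sampling)."""
    return (1 - maf) ** (2 * n_individuals)


for n in (1037, 7472):  # Japanese WGS samples in the 1KG+1K and 1KG+7K panels
    print(n, round(expected_mac(n, 0.005), 1), f"{prob_absent(n, 0.001):.1e}")
# 1037 10.4 1.3e-01
# 7472 74.7 3.2e-07
```

In words: at MAF = 0.5% the larger panel carries roughly seven times as many copies of the minor allele, and a 0.1% variant has about a one-in-eight chance of being missed entirely by the 1037-sample subset but is almost certain to be present among the 7472 samples.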

Previous publications have reached the same general conclusion: population-specific WGS data can facilitate higher-quality imputation, with the effect most pronounced at low MAF ranges^6,7,8,9,10,11,12,13,14,15,16,17,18. However, these studies assessed the imputation quality achieved when imputing up to a standard publicly available reference panel, a study-specific reference panel, and a combination of a publicly available panel and their own study-specific WGS data. Our results instead compare multiple versions of a population-specific reference panel, giving a clearer understanding of how WGS sample size and sequencing depth affect the improvement in imputation quality when a reference panel such as 1KG is supplemented with WGS data. The comparisons across the four population-specific reference panels show that, provided the data are processed separately, high-depth WGS data can be supplemented with additional low-depth WGS data when designing population-specific reference panels, maximising the overall sample size and facilitating higher-quality imputation. Furthermore, our results provide some insight into resource allocation when designing population-specific reference panels. High-depth WGS data support imputation of a greater total number of variants. However, if the primary objective is solely to improve imputation quality, low-depth WGS data appear to be a more cost-effective alternative when resources are limited. This is supported by our benchmarking analysis (Table 1), in which we observed comparable imputation quality metrics between the 1KG+3K (high-depth) and 1KG+4K (low-depth) reference panels.

GWAS into serum uric acid and total cholesterol levels following imputation up to the 1KG+7K and TOPMed reference panels

To further assess the capabilities of the 1KG+7K panel and the benefits of improved imputation quality in comparison to the TOPMed panel, we imputed ~180K samples from BBJ up to both reference panels. We selected two traits (serum uric acid and total cholesterol levels) due to the large number of samples for which phenotype data was available and the plethora of known associations previously reported in BBJ^23,25,27. Additionally, these traits have clinical relevance in common diseases such as gout (serum uric acid) and cardiovascular disease (total cholesterol level)^32,33,34,35.

A three-way comparison between the results of Kanai et al.²² and our own GWAS using imputation outputs derived from the TOPMed and 1KG+7K reference panels provided insight into the performance of the 1KG+7K panel in identifying previously reported trait-associated loci. Our results demonstrate that for most known loci across the two traits, the lead SNPs reported in all three sets of results were comparable in terms of signal strength and MAF. These results were expected, as trait-associated loci identified through single-variant analysis are typically defined by common variants with relatively high MAF. Therefore, the improved imputation quality of the 1KG+7K panel at low MAF ranges offers limited benefit for identifying new lead variants at previously reported loci, and results comparable to those obtained by imputing up to the TOPMed panel would be expected.

Impact of improved imputation quality of the 1KG+7K panel on novel locus discovery

Our GWAS of serum uric acid and total cholesterol levels in BBJ identified multiple novel associations. These associations either involved variants present only in the 1KG+7K imputation output (derived from its Japanese WGS data) and absent from the TOPMed output, or involved variants present in both outputs that reached genome-wide significance only in the 1KG+7K output. In total, 5/11 novel loci for serum uric acid and 1/4 novel loci for total cholesterol were unique to the 1KG+7K imputation output. These findings suggest that imputation with the 1KG+7K panel can substantially affect the detection of trait-associated variants, particularly those in the rare variant category (MAF < 5%). These results are complementary to those of the benchmarking analysis, indicating that the 1KG+7K panel may represent these variants more accurately. This enhanced capacity to detect novel associations, together with the improved imputation quality of loci defined by low-frequency variants and the additional variants unique to the Japanese WGS data, suggests that imputation using the 1KG+7K panel better supports the discovery of trait-associated loci in BBJ than imputation using the TOPMed panel.

The results of this work suggest that although imputing up to the 1KG+7K panel was beneficial in GWAS of well-studied traits, its greatest advantages may lie in the analysis of rare and low-frequency variants. Future association analyses focusing on these variants present the best opportunity to demonstrate the benefits of the improved imputation quality afforded by the 1KG+7K reference panel. Notably, other studies have used our 1KG+7K reference panel with promising results in identifying rare variant associations in the Japanese population for traits beyond serum uric acid levels, including inguinal hernias³⁶, atopic dermatitis³⁷, risk of fatal myocardial infarction³⁸, knee extension strength³⁹, pelvic organ prolapse⁴⁰, and systemic lupus erythematosus^41,42, as well as for analysing the genetic structure of the Japanese population at a finer scale⁴³. Through this work, we have demonstrated that WGS datasets of substantially different depths can be merged when constructing population-specific reference panels, maximising the overall sample size without compromising imputation quality. In doing so, we have established a resource to support in-depth analysis of complex traits within the Japanese population.

Methods

Biobank Japan (BBJ) project

All individuals in this study were enrolled through the BBJ Project, a multi-institutional hospital-based registry. Approximately 200,000 patients diagnosed with at least one of 47 common diseases were recruited at 66 hospitals across Japan from 2003 through 2008. DNA, serum and clinical information were collected as previously described^1,3. Written informed consent was obtained from all participants as approved by the ethics committees of the RIKEN Center for Integrative Medical Sciences. All ethical regulations relevant to human research participants were followed.

Reference panel preparation

A subset of 7517 Japanese individuals from BBJ underwent whole genome sequencing at various depths: (i) 1502 individuals at 30× depth on the Illumina HiSeq 2500 Rapid or Illumina HiSeq 2500 V4 platform; (ii) 1786 individuals at 15× depth on the Illumina HiSeq X Five platform; and (iii) 4229 individuals at 3× depth on the Illumina HiSeq 2500 V4. Three sets of sequence data were prepared: (i) a subset of 1045 individuals (30×); (ii) a joint set of 1502 (30×) and 1786 (15×) individuals; and (iii) a set of 4229 individuals (3×). Each dataset was processed according to the standardised best-practice method proposed for GATK v.3.2-2⁴⁴. The samples were recruited across seven regions of Japan (North, Northeast, East, West, Central, South, and Okinawa) to be representative of the Japanese population (Table S5). As briefly explained elsewhere¹⁰, we applied the following exclusion criteria to control the quality of genotypes sequenced at high depth (30× and 15×): (i) depth of coverage (DP) <5; (ii) genotype quality (GQ) <20; or (iii) DP > 60 and GQ < 95. Genotypes meeting any of these criteria were set to missing, and variants with call rate <90% were excluded before variant quality score recalibration (VQSR). After running VQSR as implemented in GATK, variants located in low-complexity regions (LCR), as defined by the mdust software (“hs37d5-LCRs.20140224.bed” downloaded from http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/low_complexity_regions/hs37d5-LCRs.20140224.bed.gz), were additionally excluded from the high-depth (30×) dataset of 1045 individuals. Finally, the missing genotypes were imputed using BEAGLE v.4.1⁴⁵. For the low-depth (3×) dataset, BAM files obtained with GATK were processed through the best-practice variant calling pipeline of GotCloud v.1.17.5⁴⁶.
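The genotype-level rules above (mask calls with DP < 5, GQ < 20, or DP > 60 together with GQ < 95, then drop variants whose call rate falls below 90%) can be sketched in Python as follows. The tuple representation of a call and the helper names are hypothetical; in practice these rules are applied to VCF records with standard tools, and this sketch only restates the logic.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of one genotype call: (alt-allele count 0/1/2, DP, GQ)
Call = Tuple[int, int, int]


def mask_call(call: Call) -> Optional[int]:
    """Return the genotype, or None (missing) if it fails the DP/GQ exclusion rules."""
    gt, dp, gq = call
    if dp < 5 or gq < 20 or (dp > 60 and gq < 95):
        return None
    return gt


def qc_variant(calls: List[Call],
               min_call_rate: float = 0.90) -> Optional[List[Optional[int]]]:
    """Mask failing calls; drop the variant (return None) if its call rate is below the minimum."""
    masked = [mask_call(c) for c in calls]
    call_rate = sum(g is not None for g in masked) / len(masked)
    return masked if call_rate >= min_call_rate else None


# Two of four calls are masked (DP < 5; DP > 60 with GQ < 95), so call rate 0.5 drops the variant
print(qc_variant([(0, 30, 99), (1, 4, 99), (2, 70, 90), (1, 70, 99)]))  # None
```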

Samples were excluded if any of the following six conditions applied: (i) “genomic DNA failure” was reported in the sequencing summary; (ii) read quality control errors were found in the sequencing index, fraction of uniform coverage, GC bias, fraction of duplicate reads, or low read depth; (iii) the sample was a duplicate identified by an identity-by-descent check; (iv) extreme heterozygosity (>12.5 SD from the mean); (v) low concordance with the genotyping array; and/or (vi) high contamination rate (CHIPMIX ≥ 0.25 and FREEMIX ≥ 0.25) as measured by VerifyBamID. After filtering, 7472 “clean” samples remained.
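Criteria (iv) and (vi) can be sketched in Python as below. The text does not state exactly which heterozygosity statistic was used, so this sketch assumes a per-sample heterozygosity rate, with the mean and SD taken across all samples; the contamination rule follows the "and" as written. Helper names and data are hypothetical. Note that with a 12.5 SD cutoff a single outlier inflates the SD it is measured against, so the rule can only fire in reasonably large cohorts; the example therefore uses 200 synthetic samples.

```python
from statistics import mean, stdev
from typing import Dict, List


def flag_samples(het_rate: Dict[str, float],
                 freemix: Dict[str, float],
                 chipmix: Dict[str, float],
                 sd_cutoff: float = 12.5,
                 contamination_cutoff: float = 0.25) -> List[str]:
    """Return IDs of samples failing criterion (iv) or (vi) (hypothetical helper)."""
    rates = list(het_rate.values())
    mu, sd = mean(rates), stdev(rates)
    flagged = []
    for sample, rate in het_rate.items():
        # (iv) heterozygosity more than sd_cutoff SDs from the cohort mean
        extreme_het = abs(rate - mu) > sd_cutoff * sd
        # (vi) contamination: both estimates at or above the cutoff, as stated in the text
        contaminated = (chipmix[sample] >= contamination_cutoff
                        and freemix[sample] >= contamination_cutoff)
        if extreme_het or contaminated:
            flagged.append(sample)
    return flagged


het = {f"s{i}": (0.299 if i % 2 == 0 else 0.301) for i in range(199)}
het["outlier"] = 0.9
free = {s: 0.01 for s in het}
chip = {s: 0.01 for s in het}
free["s0"], chip["s0"] = 0.3, 0.3   # both contamination estimates high: flagged
free["s1"] = 0.3                     # only one estimate high: not flagged
print(flag_samples(het, free, chip))  # ['s0', 'outlier']
```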

We next merged the 1KG high-depth panel with four different combinations of the Japanese WGS data (Fig. S1). We obtained joint-called VCFs based on the hg19 reference for the high-depth 1KGP data made available by Google Genomics (https://console.cloud.google.com/storage/browser/brain-genomics-public/research/cohort/1KGP)⁴⁷. Multi-allelic sites were removed and general quality control was performed for all WGS sets, which were then phased with SHAPEIT5 v5.1.1⁴⁸. After pre-phasing, reciprocal imputation was performed between the 1KG high-depth panel and each WGS set with minimac4 v4.1.5⁴⁹, so that each set was imputed at the sites present only in the other. The final reference panels were generated by merging the reciprocally imputed 1KG high-depth and WGS files. For the 1KG+7K panel, the cleaned and phased 3K and 4K files were first merged with BCFTOOLS v1.15.1⁵⁰ before reciprocal imputation and merging with the 1KG high-depth file using the IMMerge tool. Lastly, we imputed the BBJ data using the TOPMed Imputation Server r2 with default parameters (https://imputation.biodatacatalyst.nhlbi.nih.gov).
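Reciprocal imputation can be pictured as follows: each panel is imputed at the sites it lacks, using the other panel as the reference, after which both panels share one site list and their haplotypes can simply be concatenated. The Python sketch below illustrates only this bookkeeping. It substitutes a toy nearest-haplotype copy for minimac4's model, and the site positions, 0/1 haplotype encoding and helper names are hypothetical; in the study, minimac4 and, for the 1KG+7K panel, the IMMerge tool performed these steps.

```python
from typing import Dict, List

# Hypothetical panel layout: site position -> alleles (0/1), one entry per haplotype
Panel = Dict[int, List[int]]


def haplotypes_at(panel: Panel, sites: List[int]) -> List[List[int]]:
    """Return each haplotype's alleles restricted to `sites`."""
    n_hap = len(next(iter(panel.values())))
    return [[panel[s][h] for s in sites] for h in range(n_hap)]


def fill_missing_sites(target: Panel, reference: Panel) -> Panel:
    """Impute target haplotypes at reference-only sites (toy nearest-haplotype copy)."""
    shared = sorted(set(target) & set(reference))
    missing = sorted(set(reference) - set(target))
    tgt, ref = haplotypes_at(target, shared), haplotypes_at(reference, shared)
    filled: Panel = {s: list(alleles) for s, alleles in target.items()}
    for s in missing:
        filled[s] = []
    for t_hap in tgt:
        # Reference haplotype with the fewest mismatches at the shared sites
        best = min(range(len(ref)),
                   key=lambda r: sum(a != b for a, b in zip(t_hap, ref[r])))
        for s in missing:
            filled[s].append(reference[s][best])
    return filled


def reciprocal_merge(panel_a: Panel, panel_b: Panel) -> Panel:
    """Impute each panel at the other's private sites, then concatenate haplotypes per site."""
    a_full = fill_missing_sites(panel_a, panel_b)
    b_full = fill_missing_sites(panel_b, panel_a)
    return {s: a_full[s] + b_full[s] for s in sorted(a_full)}


kg = {100: [0, 1], 200: [0, 1]}          # toy "1KG" panel: 2 haplotypes
jpn = {200: [1, 0, 1], 300: [1, 0, 1]}   # toy "Japanese WGS" panel: 3 haplotypes
print(reciprocal_merge(kg, jpn))
# {100: [0, 1, 1, 0, 1], 200: [0, 1, 1, 0, 1], 300: [0, 1, 1, 0, 1]}
```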

Comparison of imputation quality between reference panels

For the leave-out benchmarking, we took the SNP array genotypes of 200 random samples and imputed them with minimac4 v4.1.5 using the 1KG high-depth reference panel as well as the WGS-enhanced reference panels (1KG+1K, 1KG+3K, 1KG+4K, 1KG+7K). Before imputing the 200 samples, we removed them from each panel and re-phased the updated panels to ensure an unbiased analysis.

For TOPMed, the files used for this analysis had already been imputed for 180,882 BBJ samples. We therefore extracted the 200 samples from this dataset, converted the coordinates to hg19, and used the genotypes of the corresponding 200 WGS samples for the subsequent concordance analysis.

After imputation, we retained variants with an imputation r² > 0.3 and restricted further analyses to variants shared between the genotype files imputed up to the different reference panels. To assess the concordance between the imputed genotypes and the true genotypes generated by WGS, we used the GLIMPSE2 concordance tool. It takes the imputed data (in our case, the output imputed up to each WGS-enhanced reference panel and up to TOPMed) and the true genotypes of the 200 WGS samples as input, matches variants by genomic coordinates and compares the imputed dosages with the true genotypes. Its principal metric is the aggregated r², the squared Pearson correlation between imputed dosages and true genotypes pooled across all variants within a frequency bin. After running GLIMPSE2 concordance, we grouped the variants into MAF bins to visualize the concordance rate across the spectrum from rare to common variants.
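The aggregated r² per MAF bin can be reproduced in a few lines: dosages and true genotypes are pooled across all variants in a bin, and one squared correlation is computed per bin. This is a sketch under stated assumptions: MAF is estimated here from the true genotypes, the bin edges are hypothetical, and GLIMPSE2 itself may take allele frequencies from a separate annotation and differ in other details.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy * sxy / (sxx * syy)


def maf(genotypes):
    """MAF from hard genotypes coded as alternative-allele counts (0/1/2)."""
    af = sum(genotypes) / (2 * len(genotypes))
    return min(af, 1 - af)


def aggregated_r2_by_bin(variants, bin_edges, info_min=0.3):
    """
    variants: iterable of (true_genotypes, imputed_dosages, info_r2) per variant.
    Returns {(lo, hi): aggregated r2} for MAF bins lo < MAF <= hi.
    """
    bins = list(zip(bin_edges[:-1], bin_edges[1:]))
    pooled = {b: ([], []) for b in bins}
    for truth, dosage, info in variants:
        if info <= info_min:  # keep only variants with imputation r2 > 0.3
            continue
        m = maf(truth)
        for lo, hi in bins:
            if lo < m <= hi:
                pooled[(lo, hi)][0].extend(dosage)
                pooled[(lo, hi)][1].extend(truth)
                break
    return {b: pearson_r2(d, t) for b, (d, t) in pooled.items() if d}
```

Pooling before correlating (rather than averaging per-variant r²) keeps rare variants, whose per-variant r² is very noisy, from dominating a bin.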

We assessed the overall output of each reference panel by examining the relationship between mean MAF and mean r² concordance across MAF bins, capturing imputation quality across the MAF spectrum. We then compared the number of variants falling in the r² categories r² > 0.3, r² > 0.7 and r² > 0.9 in each panel’s imputation output. We also calculated concordance, precision, recall and IQS. Of these additional metrics, concordance measures the agreement between imputed and true genotypes; precision is the proportion of predicted positives that are correctly imputed; and recall, also known as sensitivity, is the proportion of actual positives that are correctly identified²¹. IQS is calculated from the counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) in the imputation results; it corrects the observed agreement for agreement expected by chance and yields a score, generally between 0 and 1, that reflects imputation quality³¹.
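The genotype-level metrics can be sketched from a cross-tabulation of true versus imputed hard calls. In this illustration, IQS is computed in its chance-corrected agreement form over the 3 × 3 genotype table, and a “positive” is assumed to mean a genotype carrying at least one alternative allele; that definition, and the helper names `genotype_metrics` and `count_by_r2`, are assumptions and may not match the accuracy calculator used in the study.

```python
from collections import Counter

GENOTYPES = (0, 1, 2)  # alternative-allele counts


def genotype_metrics(true_gt, imputed_gt):
    """Concordance, precision, recall and IQS from hard-called genotypes."""
    n = len(true_gt)
    pairs = Counter(zip(true_gt, imputed_gt))
    po = sum(pairs[(g, g)] for g in GENOTYPES) / n  # observed agreement = concordance
    true_marg, imp_marg = Counter(true_gt), Counter(imputed_gt)
    pc = sum(true_marg[g] * imp_marg[g] for g in GENOTYPES) / (n * n)  # chance agreement
    iqs = (po - pc) / (1 - pc) if pc < 1 else float("nan")
    # Assumption: "positive" = genotype carrying >= 1 alternative allele.
    tp = sum(1 for t, i in zip(true_gt, imputed_gt) if t > 0 and i > 0)
    fp = sum(1 for t, i in zip(true_gt, imputed_gt) if t == 0 and i > 0)
    fn = sum(1 for t, i in zip(true_gt, imputed_gt) if t > 0 and i == 0)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return {"concordance": po, "precision": precision, "recall": recall, "iqs": iqs}


def count_by_r2(r2_values, thresholds=(0.3, 0.7, 0.9)):
    """Number of variants above each r2 threshold."""
    return {t: sum(1 for r in r2_values if r > t) for t in thresholds}
```

For true genotypes `[0, 0, 1, 2, 1, 0]` imputed as `[0, 1, 1, 2, 0, 0]`, concordance is 4/6, whereas IQS is 10/22 ≈ 0.45, because much of the raw agreement on the common homozygous genotype would be expected by chance.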

Genome-wide association study of serum uric acid and total cholesterol levels

We extracted 181,927 BBJ individuals who had been genotyped with either (i) the Illumina HumanOmniExpressExome BeadChip or (ii) a combination of the Illumina HumanOmniExpress and HumanExome arrays, and imputed them up to the 1KG+7K panel. Sample quality control removed individuals who withdrew consent, had call rates <98%, had discordant genotypes, showed a sex mismatch, were closely related to another sample, and/or were of non-East Asian ancestry. Samples overlapping with those in the reference panels were also removed. Variant quality control excluded variants with a call rate <99%, fewer than five heterozygotes, extreme deviation from Hardy–Weinberg equilibrium (exact P < 1 × 10^–6), palindromic alleles, and/or a non-autosomal location.
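The variant-level exclusions can be expressed as a single predicate. The Hardy–Weinberg P value below is computed with the exact test algorithm of Wigginton et al. (2005), which is a reasonable stand-in for “exact P” but is an assumption about the implementation actually used; the input layout (per-variant call rate and genotype counts) and the function names are hypothetical.

```python
AUTOSOMES = {str(c) for c in range(1, 23)}
PALINDROMIC_PAIRS = {frozenset("AT"), frozenset("CG")}


def hwe_exact_p(n_het, n_hom1, n_hom2):
    """Exact test for Hardy-Weinberg equilibrium (Wigginton et al. 2005 algorithm)."""
    n_homr, n_homc = sorted((n_hom1, n_hom2))
    n = n_het + n_homr + n_homc
    if n == 0:
        return 1.0
    rare = 2 * n_homr + n_het  # copies of the rarer allele
    probs = [0.0] * (rare + 1)
    mid = rare * (2 * n - rare) // (2 * n)  # start near the expected het count
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0
    het, homr = mid, (rare - mid) // 2
    homc = n - het - homr
    while het >= 2:  # recurse towards fewer heterozygotes
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (homr + 1) * (homc + 1))
        het, homr, homc = het - 2, homr + 1, homc + 1
    het, homr = mid, (rare - mid) // 2
    homc = n - het - homr
    while het <= rare - 2:  # recurse towards more heterozygotes
        probs[het + 2] = probs[het] * 4.0 * homr * homc / ((het + 2.0) * (het + 1.0))
        het, homr, homc = het + 2, homr - 1, homc - 1
    target = probs[n_het]
    # Sum all configurations no more likely than the observed one (small tolerance).
    p = sum(x for x in probs if x <= target * (1 + 1e-8)) / sum(probs)
    return min(1.0, p)


def is_palindromic(ref, alt):
    """A/T and C/G SNPs are strand-ambiguous."""
    return frozenset((ref.upper(), alt.upper())) in PALINDROMIC_PAIRS


def passes_variant_qc(chrom, ref, alt, call_rate, n_het, n_hom_ref, n_hom_alt):
    """Apply the variant-level exclusion rules listed in the text."""
    chrom = chrom[3:] if chrom.startswith("chr") else chrom
    return (
        chrom in AUTOSOMES
        and call_rate >= 0.99
        and n_het >= 5
        and not is_palindromic(ref, alt)
        and hwe_exact_p(n_het, n_hom_ref, n_hom_alt) >= 1e-6
    )
```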

From the 181,927 BBJ samples, we extracted, for each trait, the individuals with recorded serum uric acid or total cholesterol levels. After imputation, variants with r² < 0.3 were excluded from downstream association analyses. Serum uric acid and total cholesterol levels, measured by routine laboratory examination, were retrieved from medical records and adjusted for age, sex, ten principal components of genetic ancestry and disease status for the 47 common diseases recorded in BBJ. The resulting residuals were then normalized to Z-scores. Each variant passing quality control was tested for association with the Z-scores under an additive model (genotype dosage) by linear regression in PLINK v2⁵¹.
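The phenotype model can be sketched in NumPy: residualize the trait on the covariates, standardize the residuals to Z-scores, and regress the Z-scores on dosage. This is not the PLINK v2 implementation; it assumes that “normalized to Z-scores” means simple standardization, and it uses a normal approximation for the P value (PLINK reports t-based P values, which are practically identical at BBJ sample sizes). The simulated covariates in the usage example are hypothetical.

```python
import math

import numpy as np


def residual_zscores(phenotype, covariates):
    """Regress the phenotype on covariates (plus intercept); return standardized residuals."""
    X = np.column_stack([np.ones(len(phenotype)), covariates])
    beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    resid = phenotype - X @ beta
    return (resid - resid.mean()) / resid.std(ddof=1)


def additive_association(dosage, z):
    """Linear regression of Z-scores on genotype dosage (additive model).

    Returns (beta, se, p); p uses a normal approximation to the t statistic,
    which is reasonable only for large samples.
    """
    x = dosage - dosage.mean()
    y = z - z.mean()
    sxx = float(x @ x)
    beta = float(x @ y) / sxx
    resid = y - beta * x
    se = math.sqrt(float(resid @ resid) / (len(z) - 2) / sxx)
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p
```

A usage example with simulated data (age and sex only, standing in for the full covariate set):

```python
rng = np.random.default_rng(0)
n = 5000
age = rng.normal(60, 10, n)
sex = rng.integers(0, 2, n).astype(float)
dosage = rng.binomial(2, 0.3, n).astype(float)
trait = 0.05 * age + 0.2 * sex + 0.3 * dosage + rng.normal(0, 1, n)
z = residual_zscores(trait, np.column_stack([age, sex]))
beta, se, p = additive_association(dosage, z)
```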

We first considered the 27 and 32 previously reported loci for serum uric acid and total cholesterol, respectively, in the Japanese population²², and compared the lead variants identified after imputation up to the TOPMed panel and up to the merged 1KG+7K panel. We then considered lead variants that attained genome-wide significance after imputation up to the merged 1KG+7K panel but did not map to loci previously reported in the Japanese population, excluding variants with MAF < 0.1%. We compared the lead variants at these loci with those reported in previous GWAS of serum uric acid and total cholesterol, and identified the variants that were not significant after imputation up to the TOPMed panel but became significant after imputation up to the 1KG+7K panel.
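The final comparison step reduces to a filter over lead variants. The sketch assumes lead variants have already been defined (clumping is not shown), that previously reported loci are represented as hypothetical `(chrom, start, end)` windows, and that a variant missing from the TOPMed output counts as not significant there; all three are simplifications not stated in the text.

```python
GWS_P = 5e-8     # genome-wide significance threshold
MIN_MAF = 0.001  # exclude variants with MAF < 0.1%


def in_known_locus(chrom, pos, known_loci):
    """known_loci: hypothetical list of (chrom, start, end) windows for reported loci."""
    return any(c == chrom and start <= pos <= end for c, start, end in known_loci)


def gained_lead_variants(leads_1kg7k, p_topmed, known_loci):
    """
    leads_1kg7k: (chrom, pos, p, maf) lead variants after 1KG+7K imputation.
    p_topmed: {(chrom, pos): p} for the same variants after TOPMed imputation.
    Returns positions significant with 1KG+7K but not with TOPMed.
    """
    gained = []
    for chrom, pos, p, maf in leads_1kg7k:
        if p >= GWS_P or maf < MIN_MAF or in_known_locus(chrom, pos, known_loci):
            continue
        # Assumption: a variant absent from the TOPMed output counts as not significant.
        if p_topmed.get((chrom, pos), 1.0) >= GWS_P:
            gained.append((chrom, pos))
    return gained
```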

Statistics and reproducibility

To evaluate imputation accuracy, we performed a leave-out analysis using 200 high-depth WGS samples selected from the combined 1KG+3K reference panel. Imputation accuracy metrics, including aggregated r², concordance rate, recall, precision and the imputation quality score (IQS), were calculated across MAF bins. GWAS of serum uric acid and total cholesterol levels were conducted in 104,174 and 122,407 samples, respectively, from the BBJ Project. Both phenotypes were adjusted for age, sex, ten principal components of genetic ancestry and disease status for the 47 common diseases recorded in BBJ. The residuals were normalized to Z-scores. Association analyses were performed under an additive model using linear regression of the normalized residuals on genotype dosage in PLINK version 2.0. Genome-wide significance was defined as P < 5.0 × 10^–8. All analyses were conducted using default parameters unless otherwise specified, and summary statistics have been deposited in the GWAS Catalog.

Data availability

The final imputation panel is available through the National Bioscience Database Center (NBDC) under accession ID JGAS000738. We also provide GWAS summary statistics for both uric acid and total cholesterol, from analyses of genetic data imputed up to the 1KG+7K and TOPMed reference panels. Full GWAS results are available via the NHGRI-EBI Catalog of human genome-wide association studies under accession IDs GCST90444427, GCST90444428, GCST90444429 and GCST90444430 (https://www.ebi.ac.uk/gwas/studies/GCST90444427). The source data for Fig. 1 can be found in Table 1 of this manuscript.

References

Marchini, J. & Howie, B. Genotype imputation for genome-wide association studies. Nat. Rev. Genet. 11, 499–511 (2010).
Schaid, D. J., Chen, W. & Larson, N. B. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet. 19, 491–504 (2018).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Zhang, Z., Xiao, X., Zhou, W., Zhu, D. & Amos, C. I. False positive findings during genome-wide association studies with imputation: influence of allele frequency and imputation accuracy. Hum. Mol. Genet. 31, 146–155 (2021).
Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).
Shi, M. et al. Genotype imputation accuracy and the quality metrics of the minor ancestry in multi-ancestry reference panels. Brief. Bioinform. 25, bbad509 (2023).
Deelen, P. et al. Improved imputation quality of low-frequency and rare variants in European samples using the ‘Genome of The Netherlands’. Eur. J. Hum. Genet. 22, 1321–1326 (2014).
Pistis, G. et al. Rare variant genotype imputation with thousands of study-specific whole-genome sequences: implications for cost-effective study designs. Eur. J. Hum. Genet. 23, 975–983 (2015).
Mitt, M. et al. Improved imputation accuracy of rare and low-frequency variants using population-specific high-coverage WGS-based imputation reference panel. Eur. J. Hum. Genet. 25, 869–876 (2017).
McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).
Huang, J. et al. Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel. Nat. Commun. 6, 8111 (2015).
Li, L. et al. The ChinaMAP reference panel for the accurate genotype imputation in Chinese populations. Cell Res. 31, 1308–1310 (2021).
Yoo, S. K. et al. NARD: whole-genome reference panel of 1779 Northeast Asians improves imputation accuracy of rare and low-frequency variants. Genome Med. 11, 64 (2019).
Juang, J. J. et al. Rare variants discovery by extensive whole-genome sequencing of the Han Chinese population in Taiwan: Applications to cardiovascular medicine. J. Adv. Res. 30, 147–158 (2021).
O’Connell, J. et al. A population-specific reference panel for improved genotype imputation in African Americans. Commun. Biol. 4, 1269 (2021).
Ahmad, M. et al. Inclusion of population-specific reference panel from India to the 1000 genomes phase 3 panel improves imputation accuracy. Sci. Rep. 7, 6733 (2017).
Hwang, M. Y., Choi, N. H., Won, H. H., Kim, B. J. & Kim, Y. J. Analyzing the Korean reference genome with meta-imputation increased the imputation accuracy and spectrum of rare variants in the Korean population. Front. Genet. 13, 1008646 (2022).
Byrska-Bishop, M. et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 185, 3426–3440.e19 (2022).
Nagai, A. et al. Overview of the BioBank Japan Project: study design and profile. J. Epidemiol. 27, S2–S8 (2017).
Rubinacci, S., Hofmeister, R. J., Sousa da Mota, B. & Delaneau, O. Imputation of low-coverage sequencing data from 150,119 UK Biobank genomes. Nat. Genet. 55, 1088–1090 (2023).
Torkamani Lab. Imputation accuracy calculator. https://github.com/TorkamaniLab/imputation_accuracy_calculator.
Kanai, M. et al. Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nat. Genet. 50, 390–400 (2018).
Tin, A. et al. Target genes, variants, tissues and transcriptional pathways influencing human serum urate levels. Nat. Genet. 51, 1459–1474 (2019).
Sakaue, S. et al. A cross-population atlas of genetic associations for 220 human phenotypes. Nat. Genet. 53, 1415–1424 (2021).
Nakatochi, M. et al. Genome-wide meta-analysis identifies multiple novel loci associated with serum uric acid levels in Japanese individuals. Commun. Biol. 2, 115 (2019).
Park, J. S., Kim, Y. & Kang, J. Genome-wide meta-analysis revealed several genetic loci associated with serum uric acid levels in Korean population: an analysis of Korea Biobank data. J. Hum. Genet. 67, 231–237 (2022).
Higashino, T. et al. Dysfunctional missense variant of OAT10/SLC22A13 decreases gout risk and serum uric acid levels. Ann. Rheum. Dis. 79, 164–166 (2020).
Shinoda, Y., Yamashiro, T., Hosooka, A., Yasujima, T. & Yuasa, H. Functional characterization of human organic anion transporter 10 (OAT10/SLC22A13) as an orotate transporter. Drug Metab. Pharmacokinet. 43, 100443 (2022).
Toyoda, Y. et al. OAT10/SLC22A13 acts as a renal urate re-absorber: clinico-genetic and functional analyses with pharmacological impacts. Front. Pharmacol. 13, 842717 (2022).
Vávra, J. et al. Examining the association of rare allelic variants in urate transporters. Dis. Markers 2024, 5930566 (2024).
Ramnarine, S. et al. When does choice of accuracy measure alter imputation accuracy assessments? PLoS ONE 10, e0137601 (2015).
Nakayama, A. et al. Subtype-specific gout susceptibility loci and enrichment of selection pressure on. Ann. Rheum. Dis. 79, 657–665 (2020).
Rahimi-Sakak, F., Maroofi, M., Rahmani, J., Bellissimo, N. & Hekmatdoost, A. Serum uric acid and risk of cardiovascular mortality: a systematic review and dose-response meta-analysis of cohort studies of over a million participants. BMC Cardiovasc. Disord. 19, 218 (2019).
Peng, T. C. et al. Relationship between hyperuricemia and lipid profiles in US adults. Biomed Res. Int. 2015, 127596 (2015).
Jung, E., Kong, S. Y., Ro, Y. S., Ryu, H. H. & Shin, S. D. Serum cholesterol levels and risk of cardiovascular death: a systematic review and a dose-response meta-analysis of prospective cohort studies. Int. J. Environ. Res. Public Health 19, 8272 (2022).
Hikino, K. et al. Susceptibility loci and polygenic architecture highlight population specific and common genetic features in inguinal hernias: genetics in inguinal hernias. eBioMedicine 70, 103532 (2021).
Tanaka, N. et al. Eight novel susceptibility loci and putative causal variants in atopic dermatitis. J. Allergy Clin. Immunol. 148, 1293–1306.40 (2021).
Suetsugu, H. et al. Novel susceptibility loci for steroid-associated osteonecrosis of the femoral head in systemic lupus erythematosus. Hum. Mol. Genet. 31, 1082–1095 (2022).
Hikino, K. et al. RNF213 variants, vasospastic angina, and risk of fatal myocardial infarction. JAMA Cardiol. 9, 723–731 (2024).
Ito, S. et al. A genome-wide association study identifies a locus associated with knee extension strength in older Japanese individuals. Commun. Biol. 7, 513 (2024).
Matsunami, M. et al. Genome-wide association studies for pelvic organ prolapse in the Japanese population. Commun. Biol. 7, 1188 (2024).
Yin, X. et al. Biological insights into systemic lupus erythematosus through an immune cell-specific transcriptome-wide association study. Ann. Rheum. Dis. 81, 1273–1280 (2022).
Liu, X. et al. Decoding triancestral origins, archaic introgression, and natural selection in the Japanese population by whole-genome sequencing. Sci. Adv. 10, eadi8419 (2024).
Van der Auwera, G. A. & O'Connor, B. D. Genomics in the Cloud: Using Docker, GATK, and WDL in Terra 1st edn (O'Reilly Media, 2020).
Browning, B. L. & Browning, S. R. Genotype imputation with millions of reference samples. Am. J. Hum. Genet. 98, 116–126 (2016).
Jun, G., Wing, M. K., Abecasis, G. R. & Kang, H. M. An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data. Genome Res. 25, 918–925 (2015).
Yun, T. et al. Accurate, scalable cohort variant calls using DeepVariant and GLnexus. Bioinformatics 36, 5582–5589 (2021).
Hofmeister, R. J., Ribeiro, D. M., Rubinacci, S. & Delaneau, O. Accurate rare variant phasing of whole-genome and whole-exome sequencing data in the UK Biobank. Nat. Genet. 55, 1243–1249 (2023).
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. GigaScience 10, giab008 (2021).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).


Acknowledgements

We thank Ms. Ayumi Ogawa, Ms. Mitsuyo Yamaguchi, Ms. Naoko Miyagawa, Mr. Akio Wada and Mr. Hiroaki Sasaki (RIKEN IMS) for technical support. We thank Dr. Masatoshi Matsunami (Ryukyu University) for advice. We would like to thank all participants of the BBJ Project. D.O.R. would like to thank the Junior Research Associate (JRA) Program of RIKEN. This work was funded by JSPS KAKENHI Grant Number JP20H00462; the Japan Agency for Medical Research and Development (AMED) (21tm0424220, 22ek0109555, 21ck0106642, 23ek0410114, 23tm0424225 and JP18km0605001); the Brain Pool Plus (BP+, Brain Pool+) Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (2020H1D3A2A03100666); and the BK-21 FOUR program through the NRF under the Ministry of Education.

Author information

These authors contributed equally: Jack Flanagan, Xiaoxi Liu, David Ortega-Reyes.
These authors jointly supervised this work: Andrew P. Morris, Momoko Horikoshi, Chikashi Terao.

Authors and Affiliations

Laboratory for Genomics of Diabetes and Metabolism, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
Jack Flanagan & Momoko Horikoshi
Department of Biostatistics, University of Liverpool, Liverpool, UK
Jack Flanagan & Andrew P. Morris
Graduate School of Data Science, Seoul National University, Seoul, Republic of Korea
Jack Flanagan
Laboratory for Statistical and Translational Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
Xiaoxi Liu, David Ortega-Reyes, Kohei Tomizuka, Nana Matoba, Masato Akiyama, Masaru Koido, Kazuyoshi Ishigaki & Chikashi Terao
Laboratory for DNA Data Analysis, National Institute of Genetics, Shizuoka, Japan
David Ortega-Reyes
Department of Genetics, School of Life Science, The Graduate University for Advanced Studies, SOKENDAI, Kanagawa, Japan
David Ortega-Reyes
Department of Genetics, UNC Neuroscience Center, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Nana Matoba
Department of Ophthalmology, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
Masato Akiyama
Laboratory of Complex Trait Genomics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
Masaru Koido, MingYang Shi & Yoichiro Kamatani
Laboratory for Genotyping Development, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
Kyota Ashikawa, Sadaaki Takata, Tomomi Aoi & Yukihide Momozawa
Laboratory for Cardiovascular Genomics and Informatics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
Kaoru Ito
Division of Molecular Pathology, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
Yoshinori Murakami
Laboratory of Clinical Genome Sequencing, Department of Computational Biology and Medical Science, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
Koichi Matsuda
Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, Division of Musculoskeletal and Dermatological Sciences, The University of Manchester, Manchester, UK
Andrew P. Morris
Clinical Research Center, Shizuoka General Hospital, Shizuoka, Japan
Chikashi Terao
Department of Applied Genetics, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka, Japan
Chikashi Terao
Laboratory of Genome Technology, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
Koichi Matsuda
Laboratory of Clinical Genome Sequencing, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
Koichi Matsuda & Yoichiro Kamatani
Division of Genetics, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
Yuji Yamanashi
Division of Clinical Genome Research, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
Yoichi Furukawa
Division of Molecular Pathology, IMSUT Hospital, Department of Internal Medicine, Project Division of Genomic Medicine and Disease Prevention, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
Takayuki Morisaki
Department of Cancer Biology, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
Yoshinori Murakami
Laboratory of Complex Trait Genomics, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
Yoichiro Kamatani
Department of Public Policy, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
Kaori Muto & Akiko Nagai
Department of Urology, Iwate Medical University, Iwate, Japan
Wataru Obara
Department of Internal Medicine and Rheumatology, Juntendo University Graduate School of Medicine, Tokyo, Japan
Ken Yamaji
Department of Respiratory Medicine, Juntendo University Graduate School of Medicine, Tokyo, Japan
Kazuhisa Takahashi
Division of Pharmacology, Department of Biomedical Science, Nihon University School of Medicine, Tokyo, Japan
Satoshi Asai
Division of Genomic Epidemiology and Clinical Trials, Clinical Trials Research Center, Nihon University School of Medicine, Tokyo, Japan
Satoshi Asai & Yasuo Takahashi
Tokushukai Group, Tokyo, Japan
Takao Suzuki & Nobuaki Sinozaki
Department of Hematology, Nippon Medical School, Tokyo, Japan
Hiroki Yamaguchi
Department of Bioregulation, Nippon Medical School, Kawasaki, Japan
Shiro Minami
Tokyo Metropolitan Geriatric Hospital and Institute of Gerontology, Tokyo, Japan
Shigeo Murayama
Fukujuji Hospital, Japan Anti-Tuberculosis Association, Tokyo, Japan
Kozo Yoshimori
The Cancer Institute Hospital of the Japanese Foundation for Cancer Research, Tokyo, Japan
Satoshi Nagayama
Center for Clinical Research and Advanced Medicine, Shiga University of Medical Science, Shiga, Japan
Daisuke Obata
Department of General Thoracic Surgery, Osaka International Cancer Institute, Osaka, Japan
Masahiko Higashiyama
Iizuka Hospital, Fukuoka, Japan
Akihide Masumoto
National Hospital Organization Osaka National Hospital, Osaka, Japan
Yukihiro Koretsune


Consortia

The Biobank Japan Project

Koichi Matsuda, Yuji Yamanashi, Yoichi Furukawa, Takayuki Morisaki, Yoshinori Murakami, Yoichiro Kamatani, Kaori Muto, Akiko Nagai, Wataru Obara, Ken Yamaji, Kazuhisa Takahashi, Satoshi Asai, Yasuo Takahashi, Takao Suzuki, Nobuaki Sinozaki, Hiroki Yamaguchi, Shiro Minami, Shigeo Murayama, Kozo Yoshimori, Satoshi Nagayama, Daisuke Obata, Masahiko Higashiyama, Akihide Masumoto & Yukihiro Koretsune

Contributions

M.H. and C.T. conceived and designed the study. J.F., X.L., D.O.R., K.T., M.K. and M.S. performed formal analysis and interpreted the data. K.A., S.T., T.A. and Y.Momozawa conducted the WGS. N.M., M.A. and K.Ishigaki prepared and curated the WGS datasets. The Biobank Japan Project provided the sample resources. J.F., D.O.R., X.L. and M.H. drafted the manuscript with the help of the other authors. K.Ito, K.M., Y.Murakami, Y.K. and C.T. obtained the resources and funding. A.P.M., M.H. and C.T. supervised the study. All authors approved the final version of the manuscript.

Corresponding authors

Correspondence to Momoko Horikoshi or Chikashi Terao.

Ethics declarations

Competing interests

The authors declare no competing interests. Kaoru Ito is an Editorial Board Member for Communications Biology, but was not involved in the editorial review of, or the decision to publish, this article.

Peer review

Peer review information

Communications Biology thanks Simone Rubinacci and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: George Inglis and Johannes Stortz.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Figs.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Flanagan, J., Liu, X., Ortega-Reyes, D. et al. Population-specific reference panel improves imputation quality for genome-wide association studies conducted on the Japanese population. Commun Biol 7, 1665 (2024). https://doi.org/10.1038/s42003-024-07338-4


Received: 22 July 2023
Accepted: 02 December 2024
Published: 19 December 2024
DOI: https://doi.org/10.1038/s42003-024-07338-4
