Medium-coverage DNA sequencing in the design of the genetic association study

Xu, Chao; Zhang, Ruiyuan; Shen, Hui; Deng, Hong-Wen

doi:10.1038/s41431-020-0656-2

Article
Published: 26 May 2020

European Journal of Human Genetics volume 28, pages 1459–1466 (2020)

Abstract

DNA sequencing is a widely used tool in genetic association studies. Although next-generation sequencing has dramatically reduced cost and increased efficiency, sequencing cost remains a major concern in sequencing-based studies. The choice of sequencing depth and sample size largely determines a study's final investment and performance. Many studies have sought a cost-effective sequencing depth that achieves a given sequencing accuracy at minimal cost. The strategies studied previously fall into two groups: (1) single-stage designs, which sequence all samples at either high (>~30×) or low (<~10×) depth; and (2) two-stage designs, which sequence an affordable number of individuals at high coverage and then a large sample at low coverage. However, few studies have examined the performance of medium-coverage (10–30×) sequencing for genetic association studies, a range in which the optimal sequencing depth may lie. In this study, using a published simulation framework, we comprehensively compared medium-coverage sequencing (MCS) with single-stage and two-stage high/low-coverage sequencing in terms of power and type I error for both variant discovery and association testing. We found that, for a given sequencing effort, MCS yielded comparable discovery power and better type I error control than the best (highest-power) scenarios among the high- and low-coverage single-stage and two-stage designs. However, MCS did not perform as well as the other designs in association power, especially for rare variants and when the sequencing investment was limited.
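The trade-off between depth and sample size under a fixed sequencing effort can be illustrated with a minimal sketch. It assumes, purely for illustration, that sequencing effort is measured as samples × fold coverage; the abstract does not state the cost model used in the study, and the function names, the budget of 3000 units, and the example depths are hypothetical. Only the depth thresholds (low <~10×, medium 10–30×, high >~30×) come from the text above.

```python
def classify_depth(depth: float) -> str:
    """Label a per-sample depth using the abstract's approximate thresholds."""
    if depth < 10:
        return "low"
    if depth <= 30:
        return "medium"
    return "high"


def single_stage_samples(total_effort: float, depth: float) -> int:
    """Samples affordable when every sample is sequenced at the same depth.

    Assumption: effort is counted in sample-by-fold-coverage units.
    """
    return int(total_effort // depth)


def two_stage_low_samples(total_effort: float, n_high: int,
                          high_depth: float, low_depth: float) -> int:
    """Samples affordable at low depth after n_high samples at high depth."""
    remaining = total_effort - n_high * high_depth
    if remaining < 0:
        raise ValueError("the high-coverage stage alone exceeds the total effort")
    return int(remaining // low_depth)


if __name__ == "__main__":
    effort = 3000  # hypothetical budget, in sample x fold-coverage units
    for depth in (4, 15, 40):
        n = single_stage_samples(effort, depth)
        print(f"{depth}x ({classify_depth(depth)}): {n} samples")
    n_low = two_stage_low_samples(effort, n_high=50, high_depth=40, low_depth=4)
    print(f"two-stage: 50 samples at 40x plus {n_low} samples at 4x")
```

Under this simplified accounting, a medium-coverage design sits between the two extremes in sample size, which is consistent with the abstract's observation that its association power can lag when the investment is limited: fewer samples are sequenced than in a low-coverage design with the same effort.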

Acknowledgements

The work was partially supported by grants from the National Institutes of Health (R01 AR059781, R01 MH104680, R01 AR069055, U19 AG055373, and P20GM109036), Edward G. Schlieder Endowment, and startup funds from Tulane University. This research was supported in part using high performance computing (HPC) resources and services provided by Technology Services at Tulane University, New Orleans, LA.

Author information

Authors and Affiliations

Center for Bioinformatics and Genomics, Department of Biostatistics and Data Science, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, 70112, USA
Chao Xu, Ruiyuan Zhang, Hui Shen & Hong-Wen Deng
Department of Biostatistics and Epidemiology, The University of Oklahoma Health Sciences Center, Oklahoma City, OK, 73104, USA
Chao Xu
School of Basic Medical Science, Central South University, 410013, Changsha, China
Hong-Wen Deng

Corresponding author

Correspondence to Hong-Wen Deng.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Xu, C., Zhang, R., Shen, H. et al. Medium-coverage DNA sequencing in the design of the genetic association study. Eur J Hum Genet 28, 1459–1466 (2020). https://doi.org/10.1038/s41431-020-0656-2

Received: 20 December 2019
Revised: 29 April 2020
Accepted: 12 May 2020
Published: 26 May 2020
Version of record: 26 May 2020
Issue date: October 2020
DOI: https://doi.org/10.1038/s41431-020-0656-2

This article is cited by

Genome-wide selective sweep analysis in high-altitude Changthangi goats reveals candidate genes for pashmina fiber production
- Ram Parsad
- Sonika Ahlawat
- Rekha Sharma
Mammalian Genome (2025)
