Abstract
High-throughput sequencing of fetal DNA is a promising and increasingly common method for discovering all (or all coding) genetic variants in a fetus, either as part of prenatal screening or diagnosis, or for the genetic diagnosis of spontaneous abortions. In many cases, the fetal sample (chorionic villi, amniotic fluid, or abortive tissue) can be contaminated with maternal cells, yielding a mixture of fetal and maternal DNA. This maternal cell contamination (MCC) undermines the assumption, made by traditional variant callers, that each allele at a heterozygous site is covered, on average, by 50% of the reads, and can therefore lead to erroneous genotype calls. We present a panel of three methods for reducing genotyping error in the presence of MCC. All methods take as input the output of GATK HaplotypeCaller run on the sequencing data for the (contaminated) fetal sample and both of its parents, and additionally rely on the MCC fraction (which itself is readily estimated from the high-throughput sequencing data). The first method uses a Bayesian probabilistic model to correct the fetal genotype calls produced by the MCC-unaware HaplotypeCaller. The other two methods “learn” the genotype-correction model from examples. We use simulated contaminated fetal data to train and test the models. On the test sets, we show that all three methods substantially improve accuracy compared with the original MCC-unaware HaplotypeCaller calls. We then apply the best-performing method to three chorionic villus samples from spontaneously terminated pregnancies.
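The abstract does not give the equations of the Bayesian model, so the following Python sketch is only an illustration of why MCC shifts genotype calls and how parental genotypes plus a known MCC fraction can correct them; it is not the authors' model. It assumes a biallelic site, genotypes coded as the number of ALT alleles, a maternal DNA fraction c equal to the maternal fraction of reads, parental genotypes taken as known, a symmetric per-read error rate, and a binomial read-count likelihood. All function names and the small de novo term are hypothetical. Under these assumptions, a fetal genotype g_f and maternal genotype g_m give an expected ALT-read fraction of (1 - c)·g_f/2 + c·g_m/2 before sequencing error, so a heterozygous fetal site sits at 50% only when the mother is also heterozygous.

```python
import math

# Genotypes at a biallelic site, coded as the number of ALT alleles (0/0, 0/1, 1/1).
GENOTYPES = (0, 1, 2)


def expected_alt_fraction(g_fetus, g_mother, c, err=0.01):
    """Expected ALT-read fraction when a fraction c of the DNA is maternal."""
    p = (1.0 - c) * g_fetus / 2.0 + c * g_mother / 2.0
    # Symmetric per-read error: a true ALT base is read as REF with probability err, and vice versa.
    return p * (1.0 - err) + (1.0 - p) * err


def mendelian_prior(g_mother, g_father, denovo=1e-4):
    """Prior on the fetal genotype from parental genotypes (Mendelian transmission)."""
    a, b = g_mother / 2.0, g_father / 2.0  # probability that each parent transmits ALT
    mendel = [(1 - a) * (1 - b), a * (1 - b) + (1 - a) * b, a * b]
    # Hypothetical small uniform term so non-Mendelian (e.g. de novo) genotypes stay possible.
    return [(1 - denovo) * m + denovo / 3.0 for m in mendel]


def binom_pmf(k, n, p):
    """Binomial probability of k ALT reads out of n reads."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def fetal_genotype_posterior(alt_reads, depth, g_mother, g_father, c, err=0.01):
    """Posterior over fetal genotypes 0, 1, 2 given read counts and the MCC fraction c."""
    prior = mendelian_prior(g_mother, g_father)
    joint = [
        prior[g] * binom_pmf(alt_reads, depth, expected_alt_fraction(g, g_mother, c, err))
        for g in GENOTYPES
    ]
    total = sum(joint)
    return [x / total for x in joint]


if __name__ == "__main__":
    # 32 ALT reads out of 40; mother 0/1, father 1/1 (parental genotypes assumed known).
    for c in (0.0, 0.4):
        post = fetal_genotype_posterior(32, 40, g_mother=1, g_father=2, c=c)
        best = max(GENOTYPES, key=lambda g: post[g])
        print(f"c={c}: posterior={[round(x, 4) for x in post]}, best genotype={best}")
```

In this example, with 32 ALT reads out of 40, the sketch favors a heterozygous fetus when contamination is ignored (c = 0) but a homozygous ALT fetus when c = 0.4, because the heterozygous mother's REF alleles account for the REF reads. A real pipeline would also have to handle uncertain parental genotypes, multiallelic sites, and reference bias, which this sketch leaves out.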
Funding
This work was funded by the Skoltech Biomedical Initiative grant to GAB and DY.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Nabieva, E., Sharma, S.M., Kapushev, Y. et al. Accurate fetal variant calling in the presence of maternal cell contamination. Eur J Hum Genet 28, 1615–1623 (2020). https://doi.org/10.1038/s41431-020-0697-6
DOI: https://doi.org/10.1038/s41431-020-0697-6