Table 1 CNV calling tools included in the study.

From: Benchmarking germline CNV calling tools from exome sequencing data

| Tool | Algorithm detail | Features (specifics) | Year |
| --- | --- | --- | --- |
| CANOES | Negative binomial distribution, regression-based normalization (GC-content), HMM | At least 15 samples, average targets 6, distance between targets 70 kb, average rate of CNV occurrence in the exome 10⁻⁸ | 2014 |
| CLAMMS | GC-content and average depth normalization, custom reference set using kNN, mixture model, HMM | 0.3 < GC < 0.7, mappability > 0.75 | 2015 |
| cn.MOPS | GC-content and sample normalization, mixture of Poissons model and Bayes approach | At least 6 samples; minimum segments 5 | 2012 |
| CNVkit | In-target and off-target regions, bias (GC-content, repeat-masked fraction, target density) correction using rolling median, CBS | Exclude poorly mappable regions | 2016 |
| CODEX | Log-linear decomposition-based normalization, Poisson likelihood-based segmentation | 0.2 < GC < 0.8; target length > 20 bp, median target coverage > 20×, mappability > 0.9 | 2015 |
| CoNIFER | Singular value decomposition normalization, ±1.5 SVD-ZRPKM threshold | At least 50 samples; probes with median RPKM across samples > 1, samples with a standard deviation of SVD-ZRPKM < 0.5 | 2012 |
| CONTRA | Base-level log-ratios, GC-content and library-size correction, region significance called from a normal distribution, CBS for large variations | Include regions at least 10 bp long with coverage > 10 | 2012 |
| DeAnnCNV | Web server, GC normalization, HMM of log read-count ratio | CNV evidence threshold > 80 | 2015 |
| EXCAVATOR2 | In-target and off-target regions, 3-step normalization (GC-content, mappability, region length), segmentation with shifting level model, FastCall algorithm | Read mapq > 1; min number of targets in CNV 4 | 2016 |
| exomeCopy | Negative binomial distribution, HMM using background read depth and positional covariates (GC-content, length) | mapq > 1, minimum overlap to include a read in a region 1 bp, median value for background, transition probability to CNV 1e-4; transition probability to normal state 0.05 | 2011 |
| ExomeDepth | Beta-binomial distribution, optimized reference set, HMM | Read mapq > 20, max distance between target border and the middle of paired read to include the read in a region 300 bp; transition probability to CNV 0.0001; expected CNV length 50 kb | 2012 |
| ExonDel | Deletions in exome or genes of interest, GC-content median correction, calling by comparing to median depth within the gene | Read mapq > 20, base quality > 20, min percent of covered bp for each exon 0.1, max number of exons in CNV 9 | 2014 |
| FishingCNV | PCA of RPKM, CBS on the test sample, comparing segment coverage against control set distribution | Read mapq > 15; base quality 10, RPKM > 3; FDR-adjusted p-value 0.05 | 2013 |
| HMZDelFinder | Deletions only, exon and sample filtering, call region with RPKM < 0.65 as deletion, AOH filtering based on VCF, prioritization based on Z-score | Mean RPKM > 7 across samples, deletion frequency < 0.5%; exclude the 2% of samples with the highest number of deletions | 2017 |
| PatternCNV | Log2-transformed RPKM standardization, average and variability pattern training from control samples, smoothed bins within exon | Bin size 10; mapq > 20 | 2014 |
| XHMM | Gaussian distribution, PCA normalization, HMM | At least 50 samples, 0.1 < GC < 0.9, 10 bp < target < 10 kbp, mean coverage > 10× across all samples, average targets 6, distance between targets 70 kb, average rate of CNV occurrence in the exome 10⁻⁸ | 2012 |
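
Several of the tools in Table 1 drop targets whose GC content or mappability falls outside a fixed window. The Python sketch below shows how such a filter behaves, using only the GC and mappability thresholds listed in the table for CLAMMS, CODEX, and XHMM. It is an illustration, not any tool's own interface: the `Target` class, the `FILTERS` dictionary, and the functions `passes` and `kept_targets` are hypothetical names, the example targets are made up, and the other filters listed for these tools (target length, coverage, sample count) are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Target:
    """One capture target (hypothetical structure for this sketch)."""
    name: str
    gc: float            # GC fraction, 0 to 1
    mappability: float   # mappability score, 0 to 1


# (GC lower bound, GC upper bound, minimum mappability or None),
# copied from the "Features (specifics)" column of Table 1.
FILTERS: dict[str, Tuple[float, float, Optional[float]]] = {
    "CLAMMS": (0.3, 0.7, 0.75),
    "CODEX": (0.2, 0.8, 0.9),
    "XHMM": (0.1, 0.9, None),  # no mappability threshold listed in the table
}


def passes(target: Target, tool: str) -> bool:
    """Return True if the target lies inside the tool's GC/mappability window."""
    gc_low, gc_high, min_map = FILTERS[tool]
    if not (gc_low < target.gc < gc_high):
        return False
    return min_map is None or target.mappability > min_map


def kept_targets(targets: List[Target], tool: str) -> List[str]:
    """Names of the targets that survive the tool's filter, in input order."""
    return [t.name for t in targets if passes(t, tool)]


# Made-up targets chosen to fall on different sides of the thresholds.
targets = [
    Target("exon_A", gc=0.45, mappability=0.95),
    Target("exon_B", gc=0.25, mappability=0.80),
    Target("exon_C", gc=0.85, mappability=1.00),
]

for tool in FILTERS:
    print(tool, kept_targets(targets, tool))
```

With these example targets, the narrower windows of CLAMMS and CODEX keep fewer targets than the wider XHMM window, which is one reason the tools can end up evaluating different sets of exons on the same capture design.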