Table 1 Overview of possible contamination types, their consequences and suitable filtering options. PC-AF: pool-complement allelic fraction.

From: Sample-Index Misassignment Impacts Tumour Exome Sequencing

| Contamination type | Cause (the type of co-multiplexed samples) | Possible somatic variant calling artefacts | Prevalence of given contamination type in affected datasets | Suitable post-sequencing filtering options |
| --- | --- | --- | --- | --- |
| a) Contaminant germline variants in a tumour sample | Any samples from other individuals | False positive somatic variants in the form of germline variation from other individuals | The most likely contamination type to occur; contamination targets are expected to be more affected in copy number loss regions* | A variant filter based on an appropriate germline variant database or a relevant panel of normal samples; a filter based on PC-AF values (if a more discriminative solution is necessary) |
| b) Contaminant somatic variants in a tumour sample | Other tumour samples | False positive “recurrent” somatic variants in the form of somatic variation from other tumour samples – whether from other individual(s) or the same individual | Expected to be relevant in tumour sample pools enriched** for specific somatic variants; contamination targets are expected to be more affected in copy number loss regions* | A filter based on PC-AF values (non-discriminative filtering might lead to false negatives of high importance) |
| c) Contaminant germline variants in a control sample | Any samples from other individuals | False negatives/missed somatic variant calls – only concerning somatic variants that also occur as germline variants | Dependent on the occurrence of important variants as both germline and somatic in a given project’s setting | Review of calls not classified as somatic, adjustment of the variant caller parameters |
| d) Contaminant somatic variants in a control sample | Any tumour samples | False negatives/missed somatic variant calls – concerning all somatic variants | Elevated relevancy when matched samples are co-multiplexed; prevalence dependent on the enrichment** of potential contaminant variants in a given sample pool; consequences dependent on variant caller’s tendency to reject a somatic variant candidate due to evidence of its presence in the matched control | Review of calls not classified as somatic, adjustment of the variant caller parameters |

*Copy number loss regions of high-purity tumour samples will be especially affected.

**The enrichment increases with the given variant’s recurrence, as well as with the purity of the tumour samples that carry the variant.
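
The PC-AF filter listed for contamination types a) and b) can be illustrated with a minimal sketch. The table gives only the name of the quantity, so the code below rests on assumptions: it takes PC-AF to be the variant-allele read fraction pooled over every other sample sequenced in the same multiplexing pool, and the names `AlleleCounts`, `pool_complement_af`, `flag_possible_contaminant` and the cut-off `max_pc_af` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlleleCounts:
    """Read counts for one sample at one variant position."""
    alt: int    # reads supporting the variant allele
    total: int  # all reads covering the position


def pool_complement_af(target: str, pool: dict[str, AlleleCounts]) -> float:
    """Assumed PC-AF: variant-allele fraction over all pool samples except `target`."""
    others = [counts for name, counts in pool.items() if name != target]
    alt = sum(c.alt for c in others)
    total = sum(c.total for c in others)
    # Without coverage in the pool complement there is no evidence of a contaminant source.
    return alt / total if total else 0.0


def flag_possible_contaminant(target: str, pool: dict[str, AlleleCounts],
                              max_pc_af: float = 0.05) -> bool:
    """Flag a call when the rest of the pool carries the variant above the cut-off.

    `max_pc_af` is a hypothetical threshold, not a value taken from the table.
    """
    return pool_complement_af(target, pool) > max_pc_af


# Illustrative counts at a single position for three co-multiplexed samples.
pool = {
    "tumour_1": AlleleCounts(alt=3, total=100),
    "tumour_2": AlleleCounts(alt=40, total=100),
    "normal_2": AlleleCounts(alt=0, total=100),
}
print(pool_complement_af("tumour_1", pool))         # 40 / 200 = 0.2
print(flag_possible_contaminant("tumour_1", pool))  # True
```

Under these assumptions, the low-fraction call in `tumour_1` is flagged because the rest of the pool carries the variant at a high fraction, whereas the call in `tumour_2` is not. Any real threshold would have to be tuned per project, since the table warns that non-discriminative filtering might lead to false negatives of high importance.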