Table 1 Distribution of variant type and clinical significance in training and test sets.

From: CNVoyant a machine learning framework for accurate and explainable copy number variant classification

| Variant type | Set | Benign | VUS | Pathogenic | Total |
|---|---|---|---|---|---|
| Deletions | Train | 13,134 (48.3%) | 7191 (26.4%) | 6886 (25.3%) | 27,211 |
| Deletions | Test | 1097 (9.9%) | 3785 (34.2%) | 6183 (55.9%) | 11,065 |
| Duplications | Train | 11,145 (44.6%) | 10,792 (43.2%) | 3028 (12.2%) | 24,965 |
| Duplications | Test | 1554 (14.8%) | 5595 (53.2%) | 3360 (32.0%) | 10,509 |
| Total | Train | 24,279 (46.5%) | 17,983 (34.5%) | 9914 (19.0%) | 52,176 |
| Total | Test | 2651 (12.3%) | 9380 (43.5%) | 9543 (44.2%) | 21,574 |

  1. A total of 52,176 CNVs were included in the ClinVar training set, and 21,574 CNVs were included in the DECIPHER test set. The training set favored benign variants (46.5%), with pathogenic variants forming the smallest class (19.0%). This trend was reversed in the test set, which heavily favored VUS (43.5%) and pathogenic (44.2%) CNVs over benign ones (12.3%). The distribution of clinical significance classes was broadly similar between duplications and deletions, except that pathogenic variants made up a larger share of deletions.
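
The row totals and class percentages in the table can be recomputed from the raw counts. The snippet below is a minimal sketch of that arithmetic, not part of the CNVoyant code base: the function name `class_shares` and the `counts` dictionary are hypothetical names introduced here for illustration, and the counts are copied from the table. Percentages are rounded to one decimal place, so a recomputed value may differ from the printed one in the last digit.

```python
# Counts copied from Table 1, in the order (benign, VUS, pathogenic).
# The dictionary layout is an illustrative assumption, not the authors' format.
counts = {
    ("Deletions", "Train"): (13134, 7191, 6886),
    ("Deletions", "Test"): (1097, 3785, 6183),
    ("Duplications", "Train"): (11145, 10792, 3028),
    ("Duplications", "Test"): (1554, 5595, 3360),
}


def class_shares(benign, vus, pathogenic):
    """Return the row total and each class's share of it, in percent."""
    total = benign + vus + pathogenic
    shares = tuple(round(100 * n / total, 1) for n in (benign, vus, pathogenic))
    return total, shares


def combined(split):
    """Sum deletions and duplications for one split ("Train" or "Test")."""
    rows = [row for (_, s), row in counts.items() if s == split]
    return tuple(sum(column) for column in zip(*rows))


if __name__ == "__main__":
    for (variant_type, split), row in counts.items():
        total, shares = class_shares(*row)
        print(variant_type, split, total, shares)
    for split in ("Train", "Test"):
        total, shares = class_shares(*combined(split))
        print("Total", split, total, shares)
```

Summing the deletion and duplication rows per split reproduces the "Total" rows, which is a quick consistency check on the reconstructed table.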