Table 1 Datasets collected and purpose.

From: A machine learning approach based on ACMG/AMP guidelines for genomic variant classification and prioritization

 

|  | Dataset name | Purpose | # of variants |
| --- | --- | --- | --- |
| Model building | Clinvitae training | Training | 8496 |
| Model building | Clinvitae probability threshold tuning (PTT) | Tuning the probability threshold for classification | 4247 |
| Model validation | Clinvitae test | Comparison between different ML methods and the pathogenicity score in ref. 19 | 1415 |
| Model validation | Clinvitae Validation | Testing classification of the selected ML method, in comparison with the pathogenicity score and the Bayesian score | 161,744 |
| Model validation | ICR639 | Testing classification and prioritization of the selected ML method on a real dataset, in comparison with the pathogenicity score, the Bayesian score, CADD and VVP | 18,046 |