Table 1 Datasets collected and their purposes.
 | Dataset name | Purpose | # of variants |
---|---|---|---|
Model building | Clinvitae training | Training | 8496 |
 | Clinvitae probability threshold tuning (PTT) | Tuning the probability threshold for classification | 4247 |
Model validation | Clinvitae test | Comparison between different ML methods and the pathogenicity score in ref. 19 | 1415 |
 | Clinvitae Validation | Testing classification of the selected ML method, in comparison with the pathogenicity score and the Bayesian score | 161,744 |
 | ICR639 | Testing classification and prioritization of the selected ML method on a real dataset, in comparison with the pathogenicity score, the Bayesian score, CADD and VVP | 18,046 |