Figure 4
From: Full-scale network analysis reveals properties of the FV protein structure organization

The FV-Class AI framework. (A) The first and most critical step in using predictive ML algorithms is preparing a clean, highly informative dataset. We used structural characteristics of the FV structure [15] (PDB: 7KVE), centrality measures derived from the FV residue network, and a score quantifying the evolutionary conservation of each residue. The dataset contained 63 unique instances representing single-point mutations found in FV-deficient patients and ~1250 residues for which no FV deficiency has been reported (Supplementary Table S4). After careful curation and standardization, the dataset was used as input to multiple ML classifiers. (B) We performed a comprehensive parameter optimization to find the best settings for FV-Class; the resulting AUC values indicate favorable learning prospects for all algorithms (Methods). In particular, the Support Vector Machine (SVM) obtained the highest AUC. (C–D) Each dot is a residue of the FV protein, and the boxplots depict the number of ML classifiers that predicted a loss of function if that residue is mutated. Residues in red are predicted to be safe to substitute; those in magenta are the most likely to impair FV’s functions when mutated (in general, the most conserved residues, buried in the core of the structure). Abbreviations: DT: Decision trees [32]; KNN: K-nearest neighbors [36]; RF: Random forest [36]; XGB: XGBoost [34].
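The per-residue quantity plotted in panels C–D, the number of classifiers predicting a loss of function, can be illustrated with a minimal sketch. The residue labels (`res_A`, `res_B`, `res_C`), the prediction values, and the helper `count_votes` are hypothetical and chosen only for illustration; they are not taken from the paper's data or code.

```python
from collections import Counter

# Hypothetical predictions from the five classifiers named in the caption.
# 1 = predicted loss of function if the residue is mutated, 0 = predicted tolerated.
# Residue labels and values are invented for illustration only.
predictions = {
    "SVM": {"res_A": 1, "res_B": 0, "res_C": 1},
    "DT":  {"res_A": 1, "res_B": 0, "res_C": 0},
    "KNN": {"res_A": 1, "res_B": 0, "res_C": 1},
    "RF":  {"res_A": 1, "res_B": 0, "res_C": 1},
    "XGB": {"res_A": 1, "res_B": 0, "res_C": 0},
}


def count_votes(predictions):
    """Return, for each residue, how many classifiers predicted loss of function."""
    votes = Counter()
    for per_residue in predictions.values():
        for residue, label in per_residue.items():
            votes[residue] += label  # adding 0 still registers the residue
    return dict(votes)


votes = count_votes(predictions)
for residue, n in sorted(votes.items()):
    print(f"{residue}: {n} of {len(predictions)} classifiers predict loss of function")
```

In this toy input, a residue with a count of 0 would correspond to one predicted safe to substitute by every classifier, and a count of 5 to one that all classifiers flag as likely to impair function.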