Fig. 1: TCRD knowledge graphs concept and overview of the meta-path-XGBoost algorithm, MPxgb(AD), and workflow.

Centered around the knowledge tree, this concept was essential in selecting data types (Table 1) for the ML algorithms used to impute AD associations for potential proteins/genes. a. Transformation of the knowledge graph into an ML-ready dataset and training of the model. An example metapath, {Target — (member of) → PPI (protein–protein interaction network) ← (member of) — Protein — (associated with) → Disease}, summarizes multiple metapaths for PPI data. b. Evidence weighting by degree-weighted path count (DWPC). c, d. Five-fold cross-validation (CV) and test-set performance were used to compare a weighted method (left; AUC-ROC = 0.91/0.93, five-fold CV/test set) with a balanced method (right; AUC-ROC = 0.98/0.62, five-fold CV/test set) and to select the best-performing model. e. Feature importance prediction for the AKNA-AD association.
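The evidence weighting in panel b can be illustrated with a minimal sketch of DWPC as it is commonly defined: each path that follows the metapath contributes the product, over its edges, of the two endpoint degrees raised to a negative damping exponent, and the contributions are summed. The toy graph, the node names (`T1`, `PPI_A`, `P1`, `P2`, `AD`), the function name `dwpc`, and the damping value of 0.4 are assumptions made for illustration and are not taken from the figure or from the MPxgb(AD) implementation.

```python
from collections import defaultdict


def dwpc(edges_by_type, metapath, source, target, damping=0.4):
    """Degree-weighted path count between source and target (illustrative).

    edges_by_type: dict mapping an edge-type name to a list of (u, v) pairs;
                   edges are traversed in either direction.
    metapath:      list of edge-type names to follow, in order.
    damping:       exponent that down-weights high-degree nodes (assumed 0.4).
    """
    # Build one undirected adjacency map per edge type.
    adjacency = {}
    for edge_type, pairs in edges_by_type.items():
        neighbors = defaultdict(set)
        for u, v in pairs:
            neighbors[u].add(v)
            neighbors[v].add(u)
        adjacency[edge_type] = neighbors

    total = 0.0
    # Depth-first walk; each stack entry carries the running path weight.
    stack = [(source, 0, 1.0, (source,))]
    while stack:
        node, depth, weight, visited = stack.pop()
        if depth == len(metapath):
            if node == target:
                total += weight
            continue
        neighbors = adjacency[metapath[depth]]
        for nxt in neighbors.get(node, ()):
            if nxt in visited:  # skip paths that revisit a node
                continue
            # Degrees are counted per edge type, for both ends of the edge.
            edge_weight = (len(neighbors[node]) ** -damping) * (
                len(neighbors[nxt]) ** -damping
            )
            stack.append((nxt, depth + 1, weight * edge_weight, visited + (nxt,)))
    return total


# Hypothetical toy graph following the caption's example metapath:
# Target -(member of)-> PPI <-(member of)- Protein -(associated with)-> Disease
edges = {
    "member_of": [("T1", "PPI_A"), ("P1", "PPI_A"), ("P2", "PPI_A")],
    "associated_with": [("P1", "AD"), ("P2", "AD")],
}
metapath = ["member_of", "member_of", "associated_with"]

score = dwpc(edges, metapath, "T1", "AD")
path_count = dwpc(edges, metapath, "T1", "AD", damping=0.0)
```

With the damping exponent set to 0, every path has weight 1 and the result reduces to a plain path count; larger exponents shrink the contribution of paths that pass through highly connected nodes such as large PPI networks.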