Table 2 Machine learning classifiers’ and ensemble models’ performances

From: A Machine Learning-Driven Virtual Biopsy System For Kidney Transplant Patients

| Models | Hand and Till’s multi-AUC: Arteriosclerosis (cv Banff score) | Hand and Till’s multi-AUC: Arteriolar hyalinosis (ah Banff score) | Hand and Till’s multi-AUC: Interstitial fibrosis tubular atrophy (IFTA Banff score) | Mean absolute error: Glomerulosclerosis in percentage |
| --- | --- | --- | --- | --- |
| Random Forest | 0.836 | 0.774 | 0.830 | 5.807 |
| Gradient Boosting Machine | 0.807 | 0.750 | 0.805 | 6.486 |
| Extreme Gradient Boosting Tree | 0.830 | 0.767 | 0.827 | 5.768 |
| Linear Discriminant Analysis^a | 0.761 | 0.703 | 0.750 | -^a |
| Model Averaged Neural Network | 0.777 | 0.720 | 0.757 | 6.573 |
| Multinomial Logistic Regression^a | 0.763 | 0.706 | 0.753 | -^a |
| Ensemble Model | 0.833 | 0.773 | 0.830 | 5.999 |

  1. The following models were used for the ordinal scores (multiclass classification): random forest, gradient boosting machine, extreme gradient boosting tree, linear discriminant analysis, model averaged neural network, and multinomial logistic regression. The following models were used for the percentage of glomerulosclerosis (regression): random forest, gradient boosting machine, extreme gradient boosting tree, and model averaged neural network. Finally, we created ensemble models: for the ordinal day-zero lesion scores, we averaged the predicted probabilities of the six models; for the percentage of glomerulosclerosis, we combined the four regression models by linear regression. For the ordinal day-zero lesion scores, model performance was assessed by Hand and Till’s multiclass area under the curve (multi-AUC). For the percentage of glomerulosclerosis, model performance was assessed by mean absolute error (MAE). The ensemble models were selected as the virtual biopsy system. Model performance was assessed with 10-fold cross-validation repeated three times (30 resamples).
  2. AUC: area under the curve (higher is better). MAE: mean absolute error (lower is better).
  3. ^a Linear discriminant analysis and multinomial logistic regression are classification methods, not regression methods, so no MAE is reported for them.
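
The notes above describe an ensemble that averages class probabilities across models, scored by Hand and Till’s multi-AUC, with MAE used for the regression target. The following is a minimal pure-Python sketch of those three calculations, not the authors’ implementation (the table does not state which software was used). The function names and the toy data are hypothetical and chosen only for illustration; class labels are assumed to be integers that index the probability columns.

```python
from itertools import combinations


def average_probabilities(model_probs):
    """Average class probabilities across models.

    model_probs[m][n][k] is the probability that model m assigns
    sample n to class k. Returns one averaged vector per sample.
    """
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    return [
        [sum(model_probs[m][n][k] for m in range(n_models)) / n_models
         for k in range(n_classes)]
        for n in range(n_samples)
    ]


def pairwise_auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as one half, as in the usual rank-based AUC.
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hand_till_multi_auc(y_true, probs):
    """Hand and Till's M: mean over class pairs of [A(i|j) + A(j|i)] / 2.

    Assumes labels in y_true are integer column indices into probs.
    """
    classes = sorted(set(y_true))
    total = 0.0
    for i, j in combinations(classes, 2):
        # A(i|j): separate class i from class j using P(class i).
        a_ij = pairwise_auc(
            [p[i] for y, p in zip(y_true, probs) if y == i],
            [p[i] for y, p in zip(y_true, probs) if y == j],
        )
        # A(j|i): separate class j from class i using P(class j).
        a_ji = pairwise_auc(
            [p[j] for y, p in zip(y_true, probs) if y == j],
            [p[j] for y, p in zip(y_true, probs) if y == i],
        )
        total += (a_ij + a_ji) / 2
    c = len(classes)
    return 2 * total / (c * (c - 1))


def mean_absolute_error(y_true, y_pred):
    """Mean of absolute differences between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data, made up for illustration only (two models, three classes).
y_true = [0, 0, 1, 1, 2, 2]
model_a = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.6, 0.2],
           [0.3, 0.5, 0.2], [0.1, 0.2, 0.7], [0.2, 0.3, 0.5]]
model_b = [[0.5, 0.3, 0.2], [0.8, 0.1, 0.1], [0.4, 0.4, 0.2],
           [0.1, 0.7, 0.2], [0.1, 0.3, 0.6], [0.1, 0.1, 0.8]]

ensemble = average_probabilities([model_a, model_b])
ensemble_auc = hand_till_multi_auc(y_true, ensemble)
```

In a cross-validated setting such as the one described above, these metrics would be computed on each held-out fold and then summarized across the 30 resamples. The regression ensemble (a linear regression over the four models’ predictions) is not sketched here.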