Fig. 4: Changes in predictive performance across machine learning classifiers when applied to each model’s textual representations.

From: The foundational capabilities of large language models in predicting postoperative risks using clinical notes

Changes in predictive performance across distinct machine learning classifiers compared to the default eXtreme Gradient Boosting (XGBoost) used in our experiments, when applied to the textual representations of foundation models across all outcomes (i.e., model_i − XGBoost_i for the i-th outcome). These classifiers include logistic regression, random forest, and the outcome-specific, fully connected feed-forward auxiliary neural network integrated into the foundation model. The figure panels illustrate the changes for the following outcomes: a 30-day mortality, b Acute Kidney Injury (AKI), c Pulmonary Embolism (PE), d Pneumonia, e Deep Vein Thrombosis (DVT), and f Delirium. The bar graphs represent the means, while the error bars indicate the standard errors across 5-fold cross-validation. Our results reveal that no single classifier consistently outperformed the others across all outcomes and metrics. Interestingly, the logistic regression classifier performed slightly better in some cases, suggesting that well-tuned language models can generate precise contextual representations that work effectively with a simple classifier.
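A minimal sketch (not the authors' code) of how the per-fold deltas against the XGBoost baseline shown in this figure could be computed: train each classifier on the same frozen text representations with identical 5-fold splits, score every fold, and report the mean and standard error of model_i − XGBoost_i. The embedding matrix, labels, and classifier settings below are illustrative placeholders, not the study's actual data or hyperparameters.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

def fold_aucs(clf, X, y, n_splits=5, seed=0):
    """AUROC for one classifier on each cross-validation fold (same splits for all models)."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        clf.fit(X[train_idx], y[train_idx])
        prob = clf.predict_proba(X[test_idx])[:, 1]
        scores.append(roc_auc_score(y[test_idx], prob))
    return np.array(scores)

# Stand-ins for one outcome: rows are patients, columns are frozen LLM text-embedding
# dimensions; y is a binary postoperative outcome label. Replace with real data.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))
y = rng.integers(0, 2, size=500)

baseline = fold_aucs(XGBClassifier(eval_metric="logloss"), X, y)
for name, clf in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("random forest", RandomForestClassifier(n_estimators=300)),
]:
    delta = fold_aucs(clf, X, y) - baseline  # model_i - XGBoost_i, per fold
    mean = delta.mean()
    sem = delta.std(ddof=1) / np.sqrt(len(delta))  # standard error across folds
    print(f"{name}: delta AUROC = {mean:+.3f} +/- {sem:.3f} (SE over 5 folds)")
```

Because every classifier is evaluated on the same fold splits, the per-fold differences isolate the effect of the classifier choice on top of the fixed textual representations, which is what the bars and error bars in each panel summarize.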