Fig. 7: Pathology-informed analysis of failure mechanisms.
From: Generating dermatopathology reports from gigapixel whole slide images with HistoGPT

a Like all deep learning algorithms, HistoGPT learns the most distinctive features of each class in order to discriminate between classes reliably. However, for diseases that were rarely seen during training (e.g., psoriasis), the learned features do not generalize well to unseen cases and may be confused with features of related diseases (e.g., eczema). b Even when a disease was seen often enough during training, a tissue sample may show tissue composition, color dynamics, or other variations that were not encountered during training. For example, we found an image of erythrocytes that resembled images of eosinophils seen during training, which activated eosinophil-related concepts in the neural network. c Similarly, a case of Clark’s level II melanoma (top) mimicked the Bowenoid growth pattern of squamous cell carcinoma (bottom) and was predicted as squamous cell carcinoma. d In another case, grade 3 acute graft-versus-host disease (GVHD, top) mimicked actinic keratosis (bottom), and HistoGPT diagnosed the latter. e When the whole slide image contains components of different diseases, HistoGPT tends to predict the most likely diagnosis (the class seen most often during training) rather than the most significant one. This happened in a case of a melanocytic nevus that also showed patterns of seborrheic keratosis. Source data are provided as a Source Data file.