Fig. 3: HistoGPT generates human-level pathology reports of skin diseases. | Nature Communications

From: Generating dermatopathology reports from gigapixel whole slide images with HistoGPT

a Our internal Munich dataset is a real-world medical cohort of 15,129 whole slide images from 6705 patients with 167 skin diseases from the Department of Dermatology at the Technical University of Munich. It includes malignant cases such as basal cell carcinoma (BCC, n = 870) and squamous cell carcinoma (SCC, n = 297); precursor lesions such as actinic keratosis (AK, n = 396); and benign cases such as benign melanocytic nevus (BMN, n = 770) and seborrheic keratosis (SK, n = 412). We divided the dataset into a training set and a test set using a stratified 75/25 split at the patient level.

b With years of experience, pathologists can often make a diagnosis at first glance. Instead of writing a pathology report themselves, they can use HistoGPT in “Expert Guidance” mode, giving the model the correct diagnosis and letting it complete the report.

c We evaluated the performance of the model using four semantics-based machine learning metrics: (i) we matched critical medical terms extracted from the original text with those in the generated text using a dermatology dictionary; (ii) we used the same technique but with ScispaCy, a scientific named entity recognition tool, as the keyword extractor; (iii) we compared the semantic meaning of the original and generated reports by measuring the cosine similarity of their text embeddings generated by the biomedical language model BioBERT; (iv) we used the same technique but with the general-purpose large language model GPT-3-ADA for text embedding.

d HistoGPT models (blue) surpassed BioGPT-1B (yellow) and GPT-4V (red) on the two text accuracy metrics, Dictionary and ScispaCy, as well as on the two text similarity metrics, BioBERT and GPT-3-ADA (see Methods for details).

e Two independent, external, board-certified dermatopathologists (P1 and P2) evaluated 100 original vs. expert-guided generated reports along with the corresponding whole slide image in a randomized, blinded study.

Source data are provided as a Source Data file.
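The two metric families in panel c can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiny `DERM_TERMS` vocabulary, the function names, and the use of the Jaccard index to score keyword overlap are all assumptions made for illustration (the caption only says terms were "matched"). In practice, the embedding vectors would come from a model such as BioBERT or GPT-3-ADA; here plain lists of floats stand in for them.

```python
import math
import re

# Hypothetical mini-vocabulary standing in for a real dermatology dictionary.
DERM_TERMS = {
    "basal cell carcinoma",
    "actinic keratosis",
    "epidermis",
    "dermis",
    "hyperkeratosis",
    "palisading",
}


def extract_terms(text, vocabulary):
    """Return the vocabulary terms that occur as whole words in the text."""
    lowered = text.lower()
    return {
        term
        for term in vocabulary
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def keyword_overlap(reference, generated, vocabulary):
    """Jaccard index of the term sets of two reports (assumed scoring rule).

    Returns 0.0 when neither report contains any vocabulary term.
    """
    ref_terms = extract_terms(reference, vocabulary)
    gen_terms = extract_terms(generated, vocabulary)
    union = ref_terms | gen_terms
    if not union:
        return 0.0
    return len(ref_terms & gen_terms) / len(union)


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Made-up example reports, not taken from the dataset.
reference_report = "Basal cell carcinoma with peripheral palisading in the dermis."
generated_report = (
    "Nests of basal cell carcinoma extend into the dermis; "
    "hyperkeratosis of the epidermis."
)
overlap = keyword_overlap(reference_report, generated_report, DERM_TERMS)
```

The whole-word match keeps "dermis" from being counted inside "epidermis", which matters when dictionary terms nest inside one another. Swapping the dictionary lookup for a named entity recognizer changes only `extract_terms`; the scoring step stays the same.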
