Fig. 3: External Validation on the Full External Dataset at Native Label Prevalence.
From: Detecting stigmatizing language in clinical notes with large language models for addiction care

Bootstrapped (n = 1000) performance on the complete external dataset (288,130 clinical notes), in which stigmatizing language was present in 2072 notes (0.72%). Results reflect model performance in a real-world, highly imbalanced setting, without artificial rebalancing of the label distribution.
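
As a rough illustration of what bootstrapped evaluation at native prevalence can look like, the sketch below resamples notes with replacement and reports percentile intervals for precision and recall. It is not the authors' code: the function names (`precision_recall`, `bootstrap_ci`), the percentile-interval method, the 95% level, and the small synthetic labels and predictions are all assumptions made for illustration. Only the idea of bootstrapping without rebalancing the labels comes from the caption.

```python
import random


def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, with 1 = stigmatizing."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap intervals for precision and recall.

    Notes are resampled with replacement from the full set, so each
    resample keeps roughly the native label prevalence (no rebalancing).
    """
    rng = random.Random(seed)
    n = len(y_true)
    precisions, recalls = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        p, r = precision_recall([y_true[i] for i in idx],
                                [y_pred[i] for i in idx])
        precisions.append(p)
        recalls.append(r)

    def interval(values):
        ordered = sorted(values)
        lo = ordered[int(len(ordered) * alpha / 2)]
        hi = ordered[min(len(ordered) - 1, int(len(ordered) * (1 - alpha / 2)))]
        return lo, hi

    return interval(precisions), interval(recalls)


# Synthetic, hypothetical data: 2000 notes, 14 positives (0.7% prevalence).
# The imagined classifier finds 10 of the 14 positives and raises 10 false alarms.
y_true = [1] * 14 + [0] * 1986
y_pred = [1] * 10 + [0] * 4 + [1] * 10 + [0] * 1976

point_precision, point_recall = precision_recall(y_true, y_pred)
precision_ci, recall_ci = bootstrap_ci(y_true, y_pred, n_boot=200)
print("precision", point_precision, precision_ci)
print("recall", point_recall, recall_ci)
```

With so few positive notes in each resample, the intervals in a toy example like this tend to be wide, which is the practical consequence of evaluating at a prevalence below 1%.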