Table 5 Error analysis results

From: Detecting stigmatizing language in clinical notes with large language models for addiction care

Approach                      | Unreadable | Decision Switch | Too Sensitive or Unrelated to Substance Use | Valid | Total False Positives
Supervised Fine-Tuning (SFT)  | 11         | 45              | 27                                          | 10    | 93
In-context                    | 11         | 1               | 152                                         | 22    | 186

Results of the error analysis of false positives. Unreadable: the model did not coherently explain its reasoning when prompted. Decision switch: the model disagreed with its own original answer to the prompt. Too sensitive or unrelated to substance use: the model considered the note stigmatizing in a context unrelated to substance use. Valid: the model identified stigmatizing language regarding substance use.
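Because the two approaches produce very different numbers of false positives (93 vs. 186), comparing raw counts can mislead. The short sketch below, which is not from the original study, converts the table's counts into each category's share of an approach's false positives and checks that the categories sum to the reported totals. The names `COUNTS`, `category_shares`, and `check_totals` are hypothetical and exist only for this illustration; the counts are transcribed directly from the table.

```python
# Counts transcribed from Table 5: false positives per error category.
# Helper names below are hypothetical, for illustration only.
CATEGORIES = ["Unreadable", "Decision switch", "Too sensitive or unrelated", "Valid"]
COUNTS = {
    "SFT": [11, 45, 27, 10],
    "In-context": [11, 1, 152, 22],
}
REPORTED_TOTALS = {"SFT": 93, "In-context": 186}


def category_shares(counts):
    """Return each category's fraction of one approach's false positives."""
    total = sum(counts)
    return [c / total for c in counts]


def check_totals():
    """Report whether the category counts add up to the reported totals."""
    return {name: sum(c) == REPORTED_TOTALS[name] for name, c in COUNTS.items()}


if __name__ == "__main__":
    for name, counts in COUNTS.items():
        shares = category_shares(counts)
        row = ", ".join(f"{cat}: {s:.1%}" for cat, s in zip(CATEGORIES, shares))
        print(f"{name} (n={sum(counts)}): {row}")
    print(check_totals())
```

Normalized this way, decision switches account for roughly 48% of SFT false positives, while the "too sensitive or unrelated" category accounts for roughly 82% of in-context false positives.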