Table 5: Clinician validation of LLM-generated diagnoses: inter-rater agreement. Union and intersection accuracies are reported jointly per clinician pair (Clinicians 1–2 and 3–4).

| Clinician | Inter-Rater Accuracy [%], Claude 3.5 Sonnet | Inter-Rater Accuracy [%], RAG-assisted LLM | Union Accuracy [%], Claude 3.5 Sonnet | Union Accuracy [%], RAG-assisted LLM | Intersection Accuracy [%], Claude 3.5 Sonnet | Intersection Accuracy [%], RAG-assisted LLM |
|---|---|---|---|---|---|---|
| Clinician 1 | 90.43 | 90.11 | 95.22 | 95.37 | 75.44 | 73.84 |
| Clinician 2 | 80.22 | 79.11 | | | | |
| Clinician 3 | 76.51 | 77.78 | 96.03 | 94.44 | 66.03 | 65.87 |
| Clinician 4 | 85.56 | 82.52 | | | | |
| Average | 83.18 | 82.38 | 95.62 | 94.91 | 70.74 | 69.86 |
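As a sanity check on the table above, the "Average" row can be recomputed from the per-clinician values. This is a minimal sketch, not part of the original analysis; the variable names are illustrative.

```python
# Per-clinician inter-rater accuracy values from Table 5.
claude_accuracy = [90.43, 80.22, 76.51, 85.56]   # Claude 3.5 Sonnet
rag_accuracy = [90.11, 79.11, 77.78, 82.52]      # RAG-assisted LLM

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# These reproduce the reported averages of 83.18 and 82.38.
print(round(mean(claude_accuracy), 2))
print(round(mean(rag_accuracy), 2))
```

The union and intersection averages can be checked the same way by averaging the two pairwise values in each of those columns.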