Fig. 1: Summary of the confusion matrices for the four types of large language models in detecting seven subtypes of reporting errors. | npj Digital Medicine

From: The use of large language models in detecting Chinese ultrasound report errors

The confusion matrices show the performance of GPT-3.5, GPT-4, GPT-4o, and Claude 3.5 Sonnet in detecting each error type across 400 reports. Class labels: 0 = Error-free; 1 = Item omission; 2 = Contradictory conclusion; 3 = Descriptive error; 4 = Content repetition; 5 = Spelling error; 6 = Other error.
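
For readers unfamiliar with the format, the following is a minimal sketch of how a seven-class confusion matrix of this kind can be tabulated. The label codes come from the caption; the function names, the row/column orientation (rows as true labels, columns as predicted labels), and the toy label lists are assumptions made for illustration and are not data or code from the study.

```python
# Label codes as defined in the figure caption.
LABELS = {
    0: "Error-free",
    1: "Item omission",
    2: "Contradictory conclusion",
    3: "Descriptive error",
    4: "Content repetition",
    5: "Spelling error",
    6: "Other error",
}


def confusion_matrix(y_true, y_pred, n_classes=len(LABELS)):
    """Count (true, predicted) pairs.

    Assumed orientation: rows are true labels, columns are predicted labels.
    """
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly.

    Returns None for a class with no true examples.
    """
    recalls = {}
    for k, row in enumerate(matrix):
        total = sum(row)
        recalls[k] = row[k] / total if total else None
    return recalls


# Toy, made-up labels for illustration only (NOT data from the study).
toy_true = [0, 0, 1, 2, 3, 5, 5, 6]
toy_pred = [0, 1, 1, 2, 0, 5, 3, 6]

toy_matrix = confusion_matrix(toy_true, toy_pred)
toy_recall = per_class_recall(toy_matrix)

for code, name in LABELS.items():
    print(code, name, toy_matrix[code], toy_recall[code])
```

Under this orientation, diagonal entries count reports whose error type was identified correctly, and off-diagonal entries show which error types a model confused with one another.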
