Table 3 Number of correctly provided AO codes and proportion of consistently correct answers.

From: Performance of ChatGPT, human radiologists, and context-aware ChatGPT in identifying AO codes from radiology reports

| AO Code | In n cases | Measure | Human 1 | Human 2 | GPT 3.5 | GPT 4 | FraCChat 3.5 | FraCChat 4 |
|---|---|---|---|---|---|---|---|---|
| Full AO | 100 | Correct | 95 | 95 | 5 | 7 | 57 | 83 |
| | | Consistent correct answers (%) | | | 3% | 2%* | 48%*** | 71%*** |
| Location | 100 | Correct | 98 | 98 | 52 | 61 | 91 | 97 |
| | | Consistent correct answers (%) | | | 42%*** | 56%*** | 88%*** | 96%*** |
| Part of Bone | 100 | Correct | 99 | 98 | 59 | 68 | 96 | 99 |
| | | Consistent correct answers (%) | | | 46%*** | 63%*** | 93%*** | 99%*** |
| Type | 100 | Correct | 98 | 98 | 22 | 34 | 88 | 97 |
| | | Consistent correct answers (%) | | | 13% | 20% | 83%*** | 96%*** |
| Group | 73 | Correct | 70 | 70 | 11 | 12 | 55 | 66 |
| | | Consistent correct answers (%) | | | 8% | 11%* | 67%*** | 82%*** |
| Subgroup | 56 | Correct | 52 | 54 | 3 | 1 | 24 | 44 |
| | | Consistent correct answers (%) | | | 2% | 0% | 36%* | 59%*** |

  1. A one-sided chi-squared test was used to assess whether the proportion of consistently provided correct answers was higher than the proportion of inconsistently provided correct answers. Significance is marked with * for p < 0.05 and *** for p < 0.001.
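
The test named in the footnote can be illustrated with a minimal Python sketch. It rests on assumptions that the table does not confirm: the null hypothesis is taken to be an even 50/50 split between consistent and inconsistent correct answers, the test is a goodness-of-fit test with one degree of freedom, and the counts in the usage lines are hypothetical rather than taken from the study. The function names `one_sided_chi2_p` and `significance_marker` are illustrative only.

```python
import math


def one_sided_chi2_p(consistent: int, inconsistent: int) -> float:
    """One-sided p-value that `consistent` exceeds `inconsistent`.

    Assumption: a 50/50 null split, which the source does not state.
    """
    total = consistent + inconsistent
    if total == 0:
        raise ValueError("need at least one observation")
    expected = total / 2
    chi2 = ((consistent - expected) ** 2 + (inconsistent - expected) ** 2) / expected
    # With 1 degree of freedom, chi2 is the square of a standard normal z,
    # so the two-sided tail probability is erfc(sqrt(chi2 / 2)).
    two_sided = math.erfc(math.sqrt(chi2 / 2))
    # Halve the tail only when the observed direction matches the hypothesis.
    return two_sided / 2 if consistent > inconsistent else 1 - two_sided / 2


def significance_marker(p: float) -> str:
    """Map a p-value to the two markers used in the table footnote."""
    if p < 0.001:
        return "***"
    if p < 0.05:
        return "*"
    return ""


# Hypothetical counts, not data from the study.
p_value = one_sided_chi2_p(consistent=40, inconsistent=10)
print(significance_marker(p_value))
```

Halving the two-sided tail is what makes the test one-sided: a result in the unexpected direction (fewer consistent than inconsistent answers) gets a p-value above 0.5 and therefore no marker.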