Table 1 Accuracy in primary consultation

From: Enhancing diagnostic capability with multi-agents conversational large language models

Single model

| Base model | Number of agents | Most likely diagnosis accuracy | Possible diagnosis accuracy | Further diagnostic tests helpful rate |
| --- | --- | --- | --- | --- |
| GPT-3.5 | NA | 16.23% | 27.92% | 47.68% |
| GPT-4 | NA | 19.65% | 34.55% | 58.17% |

Multi-agent conversation framework

| Base model | Number of agents | Most likely diagnosis accuracy | Possible diagnosis accuracy | Further diagnostic tests helpful rate |
| --- | --- | --- | --- | --- |
| GPT-3.5 | 2 | 23.18% | 36.09% | 73.84% |
| GPT-3.5 | 3 | 24.17% | 35.43% | 79.14% |
| GPT-3.5 | 4 | 24.28% | 36.64% | 77.59% |
| GPT-3.5 | 5 | 22.85% | 36.09% | 79.47% |
| GPT-4 | 2 | 31.13% | 45.03% | 73.51% |
| GPT-4 | 3 | 32.45% | 46.36% | 76.82% |
| GPT-4 | 4 | 34.11% | 48.12% | 78.26% |
| GPT-4 | 5 | 31.79% | 46.36% | 81.46% |

Subgroup analysis: exclude supervisor agent

| Base model | Number of agents | Most likely diagnosis accuracy | Possible diagnosis accuracy | Further diagnostic tests helpful rate |
| --- | --- | --- | --- | --- |
| GPT-3.5 | 4 | 24.50% | 36.20% | 74.28% |
| GPT-4 | 4 | 32.67% | 45.47% | 78.04% |

Subgroup analysis: assign doctor agents with different specialties dynamically

| Base model | Number of agents | Most likely diagnosis accuracy | Possible diagnosis accuracy | Further diagnostic tests helpful rate |
| --- | --- | --- | --- | --- |
| GPT-3.5 | 4 | 24.84% | 36.64% | 78.03% |
| GPT-4 | 4 | 34.32% | 48.23% | 80.02% |
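
To compare the single-model rows of Table 1 with the 4-agent framework rows, the following is a minimal sketch that computes the percentage-point difference for each metric. The figures are copied from the table; the dictionary layout and the names `SINGLE`, `MULTI_4_AGENTS`, `METRICS`, and `gains` are illustrative assumptions, not part of the source.

```python
# Values copied from Table 1 (percent). The structure and names are hypothetical.
# Tuple order: most likely diagnosis, possible diagnosis, further tests helpful.
SINGLE = {
    "GPT-3.5": (16.23, 27.92, 47.68),
    "GPT-4": (19.65, 34.55, 58.17),
}
MULTI_4_AGENTS = {
    "GPT-3.5": (24.28, 36.64, 77.59),
    "GPT-4": (34.11, 48.12, 78.26),
}
METRICS = ("most likely diagnosis", "possible diagnosis", "further tests helpful")


def gains(single, multi):
    """Return percentage-point differences (multi minus single) per metric."""
    return tuple(round(m - s, 2) for s, m in zip(single, multi))


if __name__ == "__main__":
    for model in SINGLE:
        deltas = gains(SINGLE[model], MULTI_4_AGENTS[model])
        for metric, delta in zip(METRICS, deltas):
            print(f"{model:8s} {metric:24s} {delta:+.2f} pp")
```

For GPT-4, for example, the most likely diagnosis accuracy difference is 34.11 - 19.65 = 14.46 percentage points. These are absolute differences between table rows only; they say nothing about statistical significance.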