Table 1 Accuracy in primary consultation
From: Enhancing diagnostic capability with multi-agents conversational large language models

Single model

Base model | Number of agents | Most likely diagnosis accuracy | Possible diagnosis accuracy | Further diagnostic tests helpful rate |
|---|---|---|---|---|
GPT-3.5 | NA | 16.23% | 27.92% | 47.68% |
GPT-4 | NA | 19.65% | 34.55% | 58.17% |

Multi-agent conversation framework

Base model | Number of agents | Most likely diagnosis accuracy | Possible diagnosis accuracy | Further diagnostic tests helpful rate |
|---|---|---|---|---|
GPT-3.5 | 2 | 23.18% | 36.09% | 73.84% |
GPT-3.5 | 3 | 24.17% | 35.43% | 79.14% |
GPT-3.5 | 4 | 24.28% | 36.64% | 77.59% |
GPT-3.5 | 5 | 22.85% | 36.09% | 79.47% |
GPT-4 | 2 | 31.13% | 45.03% | 73.51% |
GPT-4 | 3 | 32.45% | 46.36% | 76.82% |
GPT-4 | 4 | 34.11% | 48.12% | 78.26% |
GPT-4 | 5 | 31.79% | 46.36% | 81.46% |

Subgroup analysis: exclude supervisor agent

Base model | Number of agents | Most likely diagnosis accuracy | Possible diagnosis accuracy | Further diagnostic tests helpful rate |
|---|---|---|---|---|
GPT-3.5 | 4 | 24.50% | 36.20% | 74.28% |
GPT-4 | 4 | 32.67% | 45.47% | 78.04% |

Subgroup analysis: assign doctor agents with different specialties dynamically

Base model | Number of agents | Most likely diagnosis accuracy | Possible diagnosis accuracy | Further diagnostic tests helpful rate |
|---|---|---|---|---|
GPT-3.5 | 4 | 24.84% | 36.64% | 78.03% |
GPT-4 | 4 | 34.32% | 48.23% | 80.02% |