Table 3 Performance comparison of different large language models (r: Pearson correlation; ρ: Spearman correlation).
Method | RG65 r | RG65 ρ | MC30 r | MC30 ρ | YP130 r | YP130 ρ | MED38 r | MED38 ρ |
|---|---|---|---|---|---|---|---|---|
SBERT + Bloom-7B1 | 0.7947 | 0.7724 | 0.7931 | 0.7922 | 0.7251 | 0.6973 | 0.7189 | 0.7203 |
SBERT + Qwen-7B-Chat-Int4 | 0.8188 | 0.8057 | 0.8056 | 0.8184 | 0.7373 | 0.7000 | 0.7381 | 0.7354 |
SBERT + Qwen-7B-Chat | 0.8374 | 0.8089 | 0.8297 | 0.8429 | 0.7599 | 0.7259 | 0.7421 | 0.7428 |
SBERT + Deepseek-7B | 0.8457 | 0.8225 | 0.8300 | 0.8405 | 0.7648 | 0.7325 | 0.7478 | 0.7582 |
SBERT + ChatGPT-3.5-turbo | 0.8594 | 0.8294 | 0.8403 | 0.8273 | 0.7792 | 0.7307 | 0.7511 | 0.7736 |
SBERT + ChatGPT-4 | 0.8723 | 0.8469 | 0.8444 | 0.8340 | 0.7820 | 0.7321 | 0.7642 | 0.7839 |