Table 2 Zero-shot, few-shot (5) and chain-of-thought prediction performance with … Data are precision, recall, F1-score and accuracy.
From: Automated RECIST tumor response classification through prompt-guided large language models
| Class | Precision | Recall | F1-score | n |
|---|---|---|---|---|
| BL | 1 | 0.87 | 0.93 | 23 |
| CR | 0.053 | 1 | 0.1 | 1 |
| PD | 1 | 0.8 | 0.889 | 10 |
| PR | 0.643 | 0.551 | 0.593 | 49 |
| SD | 0.714 | 0.588 | 0.645 | 34 |
| Accuracy |  |  | 0.65 [0.564 – 0.735] | 117 |
| Macro avg | 0.682 | 0.762 | 0.632 [0.555 – 0.700] | 117 |
| Micro avg | 0.759 | 0.65 | 0.696 [0.564 – 0.735] | 117 |
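The summary rows can be checked against the per-class rows. The Python sketch below uses only the standard definitions: the macro average is the unweighted mean over the five classes, the support-weighted average weights each class by its n, and the micro average pools counts across classes. The per-class true positives (recall × n) and predicted counts (true positives ÷ precision) are inferred from the rounded table values and are not reported in the table. The names `PER_CLASS` and `summarise` are hypothetical.

```python
"""Recompute the summary rows of Table 2 from its per-class rows.

The inputs are the rounded values printed in the table, so recomputed
averages may differ from the table in the third decimal place.
"""

# Per-class (precision, recall, F1, support), copied from Table 2.
PER_CLASS = {
    "BL": (1.0, 0.87, 0.93, 23),
    "CR": (0.053, 1.0, 0.1, 1),
    "PD": (1.0, 0.8, 0.889, 10),
    "PR": (0.643, 0.551, 0.593, 49),
    "SD": (0.714, 0.588, 0.645, 34),
}


def summarise(per_class):
    rows = list(per_class.values())
    n = sum(support for _, _, _, support in rows)

    # True positives per class: recall * support, rounded back to whole cases.
    tp = [round(recall * support) for _, recall, _, support in rows]
    # Predicted cases per class: TP / precision, rounded to whole cases.
    # Assumes every class has non-zero precision, which holds for this table.
    predicted = sum(
        round(t / precision) for t, (precision, _, _, _) in zip(tp, rows)
    )
    correct = sum(tp)
    accuracy = correct / n

    k = len(rows)
    # Macro average: unweighted mean of precision, recall and F1 over classes.
    macro = [sum(row[i] for row in rows) / k for i in range(3)]
    # Support-weighted average: each class weighted by its n.
    weighted = [sum(row[i] * row[3] for row in rows) / n for i in range(3)]

    # Micro average: pool TP, predicted cases and true cases over all classes.
    micro_p = correct / predicted
    micro_r = correct / n
    micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)

    return {
        "n": n,
        "correct": correct,
        "predicted": predicted,
        "accuracy": accuracy,
        "macro": macro,
        "weighted": weighted,
        "micro": [micro_p, micro_r, micro_f1],
    }


if __name__ == "__main__":
    summary = summarise(PER_CLASS)
    for key, value in summary.items():
        if isinstance(value, list):
            value = [round(v, 3) for v in value]
        elif isinstance(value, float):
            value = round(value, 3)
        print(f"{key:>9}: {value}")
```

With these inputs the reconstructed predictions also total 117. Because every case is assigned to one of the five classes, the micro-averaged precision, recall and F1 all equal accuracy (about 0.650). The precision (0.759) and F1 (0.696) reported in the Micro avg row appear instead to match the support-weighted average, which evaluates to about 0.759 and 0.695 from the rounded inputs.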