Table 1. Zero-shot, few-shot (5), and chain-of-thought prediction performance. Data are precision, recall, F1-score, and accuracy.
From: Automated RECIST tumor response classification through prompt-guided large language models
| Class | Precision | Recall | F1-score | n |
|---|---|---|---|---|
| BL | 1 | 0.844 | 0.915 | 32 |
| CR | 0.357 | 0.833 | 0.5 | 6 |
| PD | 0.588 | 0.909 | 0.714 | 11 |
| PR | 0.708 | 0.793 | 0.748 | 58 |
| SD | 0.737 | 0.4 | 0.519 | 35 |
| Accuracy |  |  | 0.718 [0.641 – 0.789] | 142 |
| Macro avg | 0.678 | 0.756 | 0.679 [0.581 – 0.760] | 142 |
| Micro avg | 0.757 | 0.718 | 0.716 [0.641 – 0.789] | 142 |
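
The summary rows can be cross-checked against the per-class rows. The following is a minimal sketch, not the authors' evaluation code. It assumes that the macro average is the unweighted mean over the five classes, that a support-weighted mean uses the n column as weights, and that per-class correct counts can be recovered as recall × n rounded to the nearest integer. The names `rows`, `macro_avg`, and `weighted_avg` are illustrative. Under these assumptions, the values in the row labelled "Micro avg" coincide with the support-weighted mean of the per-class values.

```python
# Per-class rows copied from Table 1: (precision, recall, f1, n).
rows = {
    "BL": (1.0, 0.844, 0.915, 32),
    "CR": (0.357, 0.833, 0.5, 6),
    "PD": (0.588, 0.909, 0.714, 11),
    "PR": (0.708, 0.793, 0.748, 58),
    "SD": (0.737, 0.4, 0.519, 35),
}

PRECISION, RECALL, F1, N = range(4)


def macro_avg(rows, idx):
    """Unweighted mean of one metric over all classes."""
    return sum(r[idx] for r in rows.values()) / len(rows)


def weighted_avg(rows, idx):
    """Mean of one metric weighted by class support n."""
    total = sum(r[N] for r in rows.values())
    return sum(r[idx] * r[N] for r in rows.values()) / total


total_n = sum(r[N] for r in rows.values())

# Assumption: correct predictions per class = recall * n, rounded,
# because the table reports recall to three decimals only.
correct = sum(round(r[RECALL] * r[N]) for r in rows.values())
accuracy = correct / total_n

print(f"n = {total_n}, accuracy = {accuracy:.3f}")
for name, fn in (("macro", macro_avg), ("weighted", weighted_avg)):
    p, r, f = (fn(rows, i) for i in (PRECISION, RECALL, F1))
    print(f"{name:>8}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Because the inputs are already rounded to three decimals, the recomputed values should be compared with the table using a small tolerance rather than exact equality.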